bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Issue while compiling signalbackup-tools under Fedora Live 30 (Hardware) #4

Closed elbrutalo closed 4 years ago

elbrutalo commented 5 years ago

Hello, when compiling under Fedora Live 30 I'm getting the following error:


LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: fatal error: execvp: No such file or directory
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

I did the following steps on a Fedora30 on hardware:

$ git clone https://github.com/bepaald/signalbackup-tools.git
$ sudo dnf install gcc-g++ cryptopp-devel sqlite-devel
$ cd signalbackup-tools && chmod +x BUILDSCRIPT.sh
$ sh BUILDSCRIPT.sh

Can anyone help? My linux skills are limited.

bepaald commented 5 years ago

Thanks. Is this running in a VM? I can reproduce this in VM, and just pushed a fix for that. Could you try the same commands again, and let me know?

If you are not running in a virtual machine, I don't know what's happening, but you might be able to fix the build by just changing line 7 of the script to NUMCPU=1

elbrutalo commented 5 years ago

Thank you for your quick reply. Fedora Live is not running in a VM. I didn't install it to the hard drive; I'm booting it from a USB stick.

Unfortunately, the error also occurs with the fix. The adjustment of the code in line 7 leads to the same error message.

Can I create a more detailed error log to better narrow down the error?

Error with modified line 7:

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: fatal error: execvp: No such file or directory
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
[liveuser@localhost-live signalbackup-tools]$

Error with line 7 unmodified ( NUMCPU=$(nproc) ):

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=4 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
lto-wrapper: fatal error: execvp: No such file or directory
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
[liveuser@localhost-live signalbackup-tools]$

bepaald commented 5 years ago

> thank you for your quick reply. Fedora Live does not run in a VM. I didn't install it to the Harddrive but booting it from a USB Stick.

Hm, that should work just fine, I just did the exact same thing here with a F30 Live usb, so it should work just the same.

I think the problem is with the -flto=4 part of the command. However, if you set NUMCPU=1, that should also change to -flto=1. Are you sure you changed that line and saved the script? (And of course, don't update from git after manually editing the script; that would undo the changes.)
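One way to confirm that the edit was saved is to print the assignment with its line number before rebuilding. This is only a sketch: the helper name `show_numcpu` is made up for illustration, and it assumes the assignment starts at the beginning of a line (as in `NUMCPU=$(nproc)`).

```shell
#!/bin/sh
# Hypothetical helper: print every NUMCPU assignment in a build script together
# with its line number, so an edit such as NUMCPU=1 can be confirmed as saved.
# Assumption: the assignment starts at the beginning of a line.
show_numcpu() {
    grep -n '^NUMCPU=' "$1"
}

# Only run the check when the script is present in the current directory.
if [ -f BUILDSCRIPT.sh ]; then
    show_numcpu BUILDSCRIPT.sh
fi
```

If the printed line still shows the old value, the linker command will keep using the old -flto setting.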

If it still fails, maybe you could try not running the script at all and just run:

g++ -std=c++2a -Wall -Wextra -Wshadow -Wold-style-cast -Woverloaded-virtual -pedantic -fomit-frame-pointer -O3 -march=native -flto -s -o signalbackup-tools */*.cc *.cc -lcryptopp -lsqlite3

or, the same with the -flto removed:

g++ -std=c++2a -Wall -Wextra -Wshadow -Wold-style-cast -Woverloaded-virtual -pedantic -fomit-frame-pointer -O3 -march=native -s -o signalbackup-tools */*.cc *.cc -lcryptopp -lsqlite3

elbrutalo commented 5 years ago

thank you, that worked (compiling), but now the signalbackup-tools command doesn't start:

LINKING: /usr/bin/g++ -Wall -Wextra -Wl,-z,now -O3 -s -flto=1 "cryptbase/o/cryptbase.o" "cryptbase/o/getbackupkey.o" "cryptbase/o/getcipherandmac.o" "endframe/o/statics.o" "filedecryptor/o/getframe.o" "filedecryptor/o/initbackupframe.o" "filedecryptor/o/getattachment.o" "filedecryptor/o/filedecryptor.o" "arg/o/arg.o" "sharedprefframe/o/statics.o" "avatarframe/o/statics.o" "attachmentframe/o/statics.o" "fileencryptor/o/encryptattachment.o" "fileencryptor/o/encryptframe.o" "fileencryptor/o/fileencryptor.o" "sqlstatementframe/o/statics.o" "sqlstatementframe/o/buildstatement.o" "signalbackup/o/importthread.o" "signalbackup/o/buildsqlstatementframe.o" "signalbackup/o/croptothread.o" "signalbackup/o/exportbackup.o" "signalbackup/o/updatethreadsentries.o" "signalbackup/o/signalbackup.o" "o/main.o" "backupframe/o/init.o" "sqlitedb/o/sqlitedb.o" "databaseversionframe/o/statics.o" "headerframe/o/statics.o" "stickerframe/o/statics.o" -lcryptopp -lsqlite3 -o "signalbackup-tools"
[liveuser@localhost-live signalbackup-tools]$ signalbackups-tools ../run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup 55944 05382 40148 73738 38418 61176 --output ../run/media/liveuser/UNTITLED/signal-2019-09-04-10-56-13.backup --opassword 55944 05382 40148 73738 38418 61176
bash: signalbackups-tools: command not found...

bepaald commented 5 years ago

Good!

To run executables in Linux, they need to be in your path (which this one is not), or you need to specify the full location. Long story short, use ./signalbackup-tools (note the ./). Also, the password needs to be one string: you can only put spaces in it if you quote it (i.e. "55944 05382 40148 73738 38418 61176"), or you can use . or - instead of spaces, or just the digits with nothing in between.
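As a small illustration of the "one string" point, the sketch below collapses a grouped passphrase into plain digits before it is passed on. The function name `normalize_passphrase` is hypothetical and not part of signalbackup-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of signalbackup-tools): turn a passphrase that
# was written in groups into a single argument by dropping spaces, dots and dashes.
normalize_passphrase() {
    printf '%s' "$1" | tr -d ' .-'
}

# The grouped form must be quoted here, otherwise the shell splits it into
# six separate arguments before the function ever sees it.
normalize_passphrase "55944 05382 40148 73738 38418 61176"
echo
```

The result can then be used as a single argument, for example `./signalbackup-tools in.backup "$(normalize_passphrase "55944 05382 ...")"`, where `in.backup` stands for whatever backup file is being read.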

elbrutalo commented 5 years ago

I'm sorry for asking such dumb questions, but I have been trying to solve this for 30 minutes now and the executable still won't start:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackups-tools ../run/media/liveuser/UNTITLED/signal-2019-07-30-10-56-13.backup "55944 05382 40148 73738 38418 61176" --output ../run/media/liveuser/UNTITLED/signal-2019-09-04-10-56-13.backup --opassword "55944 05382 40148 73738 38418 61176"
bash: ./signalbackups-tools: No such file or directory
[liveuser@localhost-live signalbackup-tools]$

Did I misunderstand you? Thank you so much for helping.

bepaald commented 5 years ago

Nope, I don't see anything wrong. If the build succeeded you should have the executable in the directory, can you see it when you type ls (lists the contents of the current directory)?

elbrutalo commented 5 years ago

I think it is there: signalbackup-tools and BUILDSCRIPT.sh are the only green entries among the blue and white ones:

[liveuser@localhost-live signalbackup-tools]$ ls
arg              BUILDWINDOWS.sh       framewithattachment  signalbackup
attachmentframe  common_be.h           headerframe          signalbackup-tools
avatarframe      cryptbase             LICENSE              sqlitedb
backupframe      databaseversionframe  main.cc              sqlstatementframe
base64           endframe              o                    stickerframe
basedecryptor    filedecryptor         README.md
BUILDSCRIPT.sh   fileencryptor         sharedprefframe

Since the signal backup file is 4-5 GB I can't put the backup file within the signalbackup-tools folder (that's why I have to put the path to the volume/backup file in the command) but I can't execute the executable right now...

bepaald commented 5 years ago

Hm, I'm not sure what's going on then, as far as I can tell it should just work. I've made a little video of the process, right from the start of booting the Live image (it hangs for a bit while installing the packages, so it's slightly long, but maybe check it out, see where the difference is): https://send.firefox.com/download/e9706d671e830f1f/#tGohrdcjyzc0ezuc4i7lnw

It ends with an error btw, only because I don't supply any arguments to the program, this is expected.

elbrutalo commented 5 years ago

Thank you so much! It's working now. The tool is processing my corrupted backup to a new backup file.

Unfortunately, it does not seem to correct the error. When trying to import the new signal backup file in Signal the import process stops at the count of 67101 messages.

Is there an option in the tool that can fix my backup? This was the terminal output of signalbackup-tools:

Reading backup file...
FRAME 66756 (099.2%)... STOPPING BEFORE END OF ATTACHMENT!!!
done!
Exporting backup to '/run/media/liveuser/signal/signal-neu.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
Dealing with table 'sms'... 59436/59436 entries...done
Dealing with table 'mms'... 2410/2410 entries...done
Dealing with table 'part'... 70/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 345/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 562/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 583/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1504/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1505/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1506/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1507/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1508/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1509/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1510/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1511/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1512/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1513/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1514/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 1872/2439 entries...Warning: attachment data not found
Dealing with table 'part'... 2439/2439 entries...Warning: attachment data not found
done
Dealing with table 'thread'... 0/0 entries...
Dealing with table 'identities'... 0/0 entries...
Dealing with table 'drafts'... 0/0 entries...
Dealing with table 'push'... 0/0 entries...
Dealing with table 'groups'... 0/0 entries...
Dealing with table 'recipient_preferences'... 0/0 entries...
Dealing with table 'group_receipts'... 0/0 entries...
Dealing with table 'job_spec'... 0/0 entries...
Dealing with table 'constraint_spec'... 0/0 entries...
Dealing with table 'dependency_spec'... 0/0 entries...
Dealing with table 'sticker'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing EndFrame...
Error: EndFrame not found

bepaald commented 5 years ago

Hm.. it looks very much like your backup file is not corrupted, but simply incomplete! That is quite a big problem, since the end of the backup file contains important data for the backup. (There is a small possibility of corruption though, if the size of that last attachment is incorrect it might try to read past the end of the file.)

I need to think about a way to verify what's going on and how to fix it. Obviously, the missing data is simply gone, but I might be able to at least get the data that is still there imported. I'm assuming this is somewhat important to you because it is going to be a somewhat complicated procedure (if it works at all), so be prepared for some complicated instructions. Also it will probably take a while for me to think up possible solutions.

elbrutalo commented 5 years ago

Thank you so much for your help! Yes, for personal reasons it is very important to me to restore as many conversations as possible from this backup, so I am willing to put in whatever effort it takes. If you find a way to restore the conversations up to the EOF, I would be happy to pay you for the time you invested. I appreciate the effort!

I can't tell whether the backup was aborted prematurely or whether an attachment is corrupt. I think that the backup process ran smoothly and there was enough space on the device.

bepaald commented 5 years ago

I will be working on this, I have some ideas, but first:

I can't find it in this thread (maybe you deleted it?), but in my mailbox I have a message from you where you say you copied the backup file to a FAT32 formatted usb stick? Is this true and still the case? Earlier you also said the backup file was 4-5GB?

Files on a FAT32 filesystem have a maximum size of 4 GiB minus one byte (4,294,967,295 bytes); if you copy a larger file onto it, it will be truncated at that limit. Can you check the file sizes? Do you still have the original? I suggest formatting the USB storage with a more modern filesystem (NTFS should work out of the box on both Linux and Windows).

elbrutalo commented 5 years ago

Dear bepaald, thank you for your input. It's correct that I copied the backup file to a FAT32-formatted USB drive to access it from within Fedora Live. I checked the file size; it is identical to the original on my hard disk (4.039.229.440 bytes).

Since there was no error message when copying to the usb-stick, this seems to be still within the FAT32 limit.
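The comparison against the FAT32 limit can be checked with a few lines of shell. This is a sketch only: `fits_on_fat32` is a made-up helper name, and the limit used is the general FAT32 maximum file size of 2^32 - 1 bytes.

```shell
#!/bin/sh
# FAT32 stores a file's size in a 32-bit field, so the largest possible file
# is 2^32 - 1 bytes.
FAT32_MAX_BYTES=4294967295

# Hypothetical helper: succeeds if a file of the given size in bytes fits on FAT32.
fits_on_fat32() {
    [ "$1" -le "$FAT32_MAX_BYTES" ]
}

size=4039229440   # the size reported for the backup file in this thread
if fits_on_fat32 "$size"; then
    echo "$size bytes fits on FAT32"
else
    echo "$size bytes is too large for FAT32"
fi
```

On a GNU/Linux system such as Fedora, `stat -c %s FILE` prints a file's size in bytes, so the original and the copy can both be fed to the same check.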

I am almost 100% sure that I transferred the signal backup directly from the phone to the computer via SmartSwitch. So I don't know why and when the file was cropped.

bepaald commented 5 years ago

Ok, in your message on the Signal GitHub you mentioned 4,7GB (https://github.com/signalapp/Signal-Android/issues/7637#issuecomment-527182118), so I figured that was way too big for FAT32 (maybe you meant 3,7GB, that's about 4039229440 bytes?).

Anyway, if you don't have any larger versions of the backup, it doesn't matter how it was cropped, I'll get working on a way to fix it. I've done some investigating and I have an idea to get the messages imported (it may take a little time though). Also from refreshing my memory by looking at the code, I'm pretty much certain there was no corruption but truncation (corruption would have resulted in a bad MAC before reaching eof).

elbrutalo commented 5 years ago

I remembered it as 4.7 GB, but the original backup file still exists and has 4,039,229,440 bytes (4.04 GB under macOS). I guess I remembered it wrong, because I copied the backup file directly from the smartphone to the computer without any detours (such as via FAT32).

If the file is cropped, not much can be missing and my hope would be that at least most of the conversations could still be recovered.

bepaald commented 5 years ago

> If the file is cropped, not much can be missing and my hope would be that at least most of the conversations could still be recovered.

I think I can tell from the output you posted earlier that it is probably only the last attachment that is missing. From the order in which the backup data are written, if I'm correct, you should have all messages (the text parts of the messages), including messages received after that last incomplete attachment. Unfortunately, there was also important stuff that was written after the attachments that is now gone.

I have just pushed a commit that tries to generate the most important tables from the information in the messages. It fills in data for the thread table (otherwise your list of conversations would appear empty, even though the messages are in the db) and the groups info. I was worried about the 'identities' table remaining empty, but from my testing the app seems to accept this, it will just fill in new data after restoring. Check out the current code and compile (no need to edit the buildscript anymore!), then run like this:

./signalbackup-tools --generatefromtruncated /path/to/truncatedbackupfile.backup 01234560123456012345601234560123456 --output [newfixedbackup] --opassword [newpassword]

I'm extremely tired, but some notes I can think of right now:

hm... that's all I can think of right now. I would love full output of the command above (censor anything you need, but I don't think it reveals much sensitive information). Also if you notice anything about your restored backup (missing or incorrect things) I would like to know. You might also want to test actually sending and receiving messages.

elbrutalo commented 5 years ago

Dear bepaald, thank you so much for your efforts! I was able to compile the new package, but I can't start the command now:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackup-tools --generatefromtruncated /run/media/liveuser/KASTI32/signal-2019-07-30-10-56-13.backup 559440538240148737383841861176 --output /run/media/liveuser/KASTI32/signal-neu --opassword 559440538240148737383841861176
bash: ./signalbackup-tools: No such file or directory

Within the folder "signalbackup-tools" the compiled program's name is now "signalbackup-tools2". I tried adding the "2" to the command, but that also results in an error:

[liveuser@localhost-live ~]$ ./signalbackup-tools2 --generatefromtruncated /run/media/liveuser/KASTI32/signal-2019-07-30-10-56-13.backup 559440538240148737383841861176 --output /run/media/liveuser/KASTI32/signal-neu --opassword 559440538240148737383841861176
bash: ./signalbackup-tools2: No such file or directory

I followed the same process as last time. There were no problems when compiling the package, and no editing of BUILDSCRIPT.sh was needed.

Thanks again so much!

bepaald commented 5 years ago

Whoops, sorry that was a stupid mistake, I fixed it now. If you check the code out again the buildscript should be fixed.

However, just adding the "2" should have worked. This looks like the exact same problem you had earlier (https://github.com/bepaald/signalbackup-tools/issues/4#issuecomment-528030388), which I didn't understand either. How did you fix that one?

bepaald commented 5 years ago

Sorry, I just noticed: in your first command you were inside the signalbackup-tools directory ([liveuser@localhost-live signalbackup-tools]$), that only didn't work because I uploaded the wrong buildscript (which built signalbackup-tools2, with a "2" added).

However, in your second attempt (where you cleverly added the "2") you are in the wrong directory [liveuser@localhost-live ~], could that be the problem?
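Because `./name` is resolved relative to the current directory, a quick check before running can tell a missing build apart from being in the wrong place. The helper below is a hypothetical sketch, not something shipped with the tool.

```shell
#!/bin/sh
# Hypothetical helper: report whether a relative path such as ./signalbackup-tools
# points at an executable regular file when resolved from the current directory.
check_executable() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo "found: $1"
    else
        # Include the current directory, since that is what "./" is relative to.
        echo "not found from $(pwd): $1" >&2
        return 1
    fi
}

# Usage, from inside the source directory:
#   check_executable ./signalbackup-tools && ./signalbackup-tools
```

Run from the home directory, the same check fails and prints the directory it looked in, which makes the mismatch with the build directory visible.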

bepaald commented 5 years ago

@elbrutalo Any luck so far? I could make another video if you need one...

elbrutalo commented 5 years ago

Dear bepaald, please forgive my late feedback, I was on my way and only now I had the possibility to report back.

The recovery process went on until the end, most of the messages could be restored (especially the old messages were all there, at the end some days might have been missing).

Not restored were the attachments of the last weeks (photos, voice messages, videos). These appear as speech bubbles in the conversations, but are empty (see screenshots).

But I'm overjoyed about the older messages that could still be saved! Therefore I thank you infinitely for your help! I have also sent you via Paypal a small expense allowance and appreciate your commitment here for the community and me very much!

Unfortunately, I have now caused a new problem due to carelessness:

In the months since the broken backup I have been working with a new instance of Signal and have received and sent several hundred messages. These messages exist in a second backup file with a different passphrase.

This means I now have the recovered backup file (with passphrase 1) and the new one from another signal instance (with passphrase 2).

Could I merge them with your tool? I found some threads on the net where people were facing the same problem. There doesn't seem to be a solution.

Maybe it is possible in my case because the conditions are favorable. The backup file from the new signal contains messages from the same contacts as in the old corrupt backup. So numbers and contact names are the same.

Is there a way to solve this with your script?

Best regards, elb

bepaald commented 5 years ago

> Dear bepaald, please forgive my late feedback, I was on my way and only now I had the possibility to report back.

No problem!

> The recovery process went on until the end, most of the messages could be restored (especially the old messages were all there, at the end some days might have been missing).

Good! If I had to guess, I'd say all messages that were in the database were restored. I think I was pretty careful about that. Any messages that could not be placed in a thread should have produced some output stating that. For example, with my own (truncated) testing backup:

Creating threads from 'mms' table data
Creating threads from 'sms' table data
Thread for this conversation partner already exists. This may be a group with only two members and only incoming messages. This case is not supported.
  !!! WARNING !!! Unable to generate thread data for messages belonging to this thread (no outgoing messages in conversation)
----------------------------------
| union_thread_id | address      |
----------------------------------
| 3               | +3164XXXXXXX |
----------------------------------

> Not restored were the attachments of the last weeks (photos, voice messages, videos). These appear as speech bubbles in the conversations, but are empty (see screenshots).

I don't see any screenshots :) But that's okay, in my testing backup I also had one attachment missing and it also showed an empty bubble. I don't think they will pose a problem, but if the message body is empty (it was just an attachment with no actual text message) you might as well delete the messages to be on the safe side.

I think any missing attachments were simply not present in the backup file anymore, but you could test this. The program has had the ability to dump the entire decrypted database to a folder for a while now. You could use this option to see if there were any attachments in the database that haven't been placed in the fixed backup (I would guess not, but it's possible if the message they belong to is gone). To do this:


[~/programming/signalbackup-tools] $ mkdir RAWOUTPUT
[~/programming/signalbackup-tools] $ ./signalbackup-tools DEVsignal-2019-09-05-09-18-22.TRUNCATED 005708563826394701887625524302 --output RAWOUTPUT/
signalbackup-tools build 20190913.093016
IV: (hex:) 78 fe 4a 10 eb 1a 55 e0 b8 7c 85 6e cc b0 da f4 (size: 16)
SALT: (hex:) 4b b9 3c 58 dd 29 85 e3 4d 38 d3 78 d5 83 50 ef fb b5 0b c7 dd 02 e5 c8 5a ad d4 04 ff 56 fc e2 (size: 32)
BACKUPKEY: (hex:) fe 95 73 d2 58 80 4f d5 68 80 56 b0 94 9c e0 40 bb f7 be b4 4c 35 9f 91 09 26 7f 8b 54 ef 88 16 (size: 32)
CIPHERKEY: (hex:) b8 fa 66 66 2d aa 37 0a 90 a9 26 cf 41 ab 38 35 c8 ed df 8c 15 23 f2 07 28 36 a4 59 ae 58 f8 49 (size: 32)
MACKEY: (hex:) 3f 63 69 36 1a ed d2 5f 30 b5 31 93 65 cf 0f 24 b1 9b a1 8c f5 45 ae e8 e1 0b ff ff 70 36 8d 97 (size: 32)
COUNTER: 2029931024
Reading backup file...
FRAME 341 (099.2%)...  STOPPING BEFORE END OF ATTACHMENT!!! (EOF) 
Failed to get attachment data for FrameWithAttachment... info:
Frame number: 342
        Type: ATTACHMENT
         - row id          : 98 (8 bytes)
         - attachment id   : 1567667827993 (8 bytes)
         - length          : 1588447 (8 bytes)
done!
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing Attachments...
Writing Avatars...
Writing SharedPrefFrame(s)...
Writing StickerFrames...
Writing EndFrame...
Error: asked to write nullptr frame to disk
Writing database...

It will obviously fail at the end, because the database is no good, but it will write the attachment data to the directory as it finds them. The filenames will not be very helpful, but you can manually inspect the attachment files (they should also be pretty much chronologically stored in the backup, so the attachment with the highest number should be the most recent one).

> But I'm overjoyed about the older messages that could still be saved! Therefore I thank you infinitely for your help! I have also sent you via Paypal a small expense allowance and appreciate your commitment here for the community and me very much!

Thanks a lot! I just noticed that this morning and did not know who it was from. Though of course I would have helped you anyway (or tried to at least), I really do appreciate it a lot!

> Unfortunately, I have now caused a new problem due to carelessness:

> * the new instance of Signal was set to keep only 500 messages per conversation

> * i.e. signal imports the old data, but then deletes them again immediately.

> In the months since the broken backup I have been working with a new instance of Signal and have received and sent several hundred messages. These messages exist in a second backup file with a different passphrase.

> This means I now have the recovered backup file (with passphrase 1) and the new one from another signal instance (with passphrase 2).

> Could I merge them with your tool? I found some threads on the net where people were facing the same problem. There doesn't seem to be a solution.

> Maybe it is possible in my case because the conditions are favorable. The backup file from the new signal contains messages from the same contacts as in the old corrupt backup. So numbers and contact names are the same.

> Is there a way to solve this with your script?

Haha, wow, that is some bad luck! But also good news! I have already implemented this feature a couple of weeks ago! In fact, when I woke up this morning I just decided to post a message in this thread to let those people know they could test it out, but now I will let you be the brave tester.

I have only tested with a few handcrafted, very small backups, you are really the first to try it seriously. Be prepared for it not to work (at least not the first time), it may need more work. Also, it is not a fully automated procedure, there are some slightly more complicated instructions than before. Example:

Assume a current backup current.backup and an old one source.backup. First, you need to get a list of threads from the old backup:

[~/programming/signalbackup-tools] $ ./signalbackup-tools --listthreads source.backup 871668681636341580140408145422 
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
----------------------------------------------------------------------------------------------------------
| _id | recipient_ids                  | snippet                        | COALESCE(recipient_prefer[...] |
----------------------------------------------------------------------------------------------------------
| 1   | +161765XXXXX                   | <#> Your Signal verificat[...] | (NULL)                         |
| 2   | __textsecure_group__!a2c3[...] | Ok                             | devgroup                       |
| 3   | +316474XXXXX                   | Last msg                       | Master Phone                   |
| 4   | +316836XXXXX                   | Ok                             | Devphone Red                   |
----------------------------------------------------------------------------------------------------------
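The `_id` column of a table like the one above is what a comma-separated thread selection is built from. As a rough sketch, assuming the table has been saved to a file or is piped in, the IDs could be collected like this; `thread_ids` is a hypothetical helper, and it assumes the `|`-delimited layout shown here.

```shell
#!/bin/sh
# Hypothetical helper: read a "|"-delimited thread table on standard input and
# print the numeric values of its first column as a comma-separated list.
# Header and separator rows are skipped because their first column is not numeric.
thread_ids() {
    awk -F'|' '
        NF > 2 && $2 ~ /^ *[0-9]+ *$/ {
            gsub(/ /, "", $2)
            ids = ids (ids == "" ? "" : ",") $2
        }
        END { print ids }
    '
}

# Example with two rows in the same layout as the table above.
thread_ids <<'EOF'
| _id | recipient_ids |
| 2   | group         |
| 3   | +316474XXXXX  |
EOF
```

Threads that should not be imported, such as a verification-code conversation, still need to be removed from the list by hand.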

Then, you can import a selection of threads from the source file into your current backup and export to a new backup file. Expect tons of output (I really need to clean that up sometime):

[~/programming/signalbackup-tools] $ ./signalbackup-tools --importthreads 2,3,4 --source source.backup --sourcepassword 871668681636341580140408145422 --output merged.backup --opassword 000000000000000000000000000000 current.backup 420676745407910020904427069666
IV: (hex:) e2 dd c7 b0 d7 c1 81 01 6b db f8 24 47 98 5c 35 (size: 16)
SALT: (hex:) 47 a8 83 be 1f 9f d7 a4 db 6c 82 bd c4 d2 e9 4b 5e 90 d7 fd a4 98 81 4a 62 f1 0e d6 e5 52 f7 ee (size: 32)
BACKUPKEY: (hex:) b1 59 c2 ec ce cf dc de 37 6f bd af 15 79 06 c7 30 c4 56 3f 5f 60 f8 74 67 34 90 7b a5 c5 44 2b (size: 32)
CIPHERKEY: (hex:) 69 12 52 65 c2 5d 96 e0 26 dc 46 6c 95 92 18 f0 e7 69 31 7d 07 f7 ce 7e 4d 74 76 10 d1 78 da d8 (size: 32)
MACKEY: (hex:) 37 4c bc 18 c2 93 47 60 67 63 0d 81 24 65 9e ab 55 a3 c6 17 fb 95 26 2e 4a 68 e8 aa 5c a5 0b 7e (size: 32)
COUNTER: 3806185392
Reading backup file...
FRAME 88 (100.0%)... Read entire backup file...
done!
Importing thread 2 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 2
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Found existing thread for this recipient in target database, merging into thread 5
Importing statements from source table 'sms'...4 entries...
Importing statements from source table 'mms'...3 entries...
Importing statements from source table 'part'...0 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'group_receipts'...6 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Importing thread 3 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 3
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Found existing thread for this recipient in target database, merging into thread 6
Importing statements from source table 'sms'...5 entries...
Importing statements from source table 'mms'...1 entries...
Importing statements from source table 'part'...1 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'group_receipts'...0 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Importing thread 4 from source file: source.backup
IV: (hex:) e3 1c c6 5c 46 e3 b6 62 ff a7 95 40 e6 99 f9 eb (size: 16)
SALT: (hex:) 35 6a 55 bc 80 e9 da dd 2c ed 28 e2 da 22 f4 b8 3b e3 5d 44 2b 94 18 3e d0 6a 6f e3 79 d9 5d 1f (size: 32)
BACKUPKEY: (hex:) 3e 91 7a 80 66 9f 6b 1f a9 77 b1 3e fb 10 ae 15 b2 b4 69 7f 99 ad 40 16 57 61 00 ce cc 8e d0 cc (size: 32)
CIPHERKEY: (hex:) 69 67 67 1b d7 51 41 84 33 88 6b 1e 3b 21 e9 96 6e f7 f0 0d 61 ec 64 6e 44 29 97 3e df 49 83 26 (size: 32)
MACKEY: (hex:) 0b 30 c9 d3 56 6b 8c 81 f3 f5 bb cf 1d b4 ed 57 66 dc c8 72 11 58 04 42 07 db b8 a1 61 62 2e c3 (size: 32)
COUNTER: 3810313820
Reading backup file...
FRAME 93 (100.0%)... Read entire backup file...
done!
Deleting messages not belonging to requested thread(s) from 'sms'
Deleting messages not belonging to requested thread(s) from 'mms'
Deleting attachment entries from 'part' not belonging to remaining mms entries
Deleting other threads from 'thread'...
Dealing with thread id: 4
  Updating msgcount
  Setting last msg date
  Updating snippet
  Updating snippet type
Deleting removed groups...
Delete others from 'identities'
Deleting group receipts entries from deleted messages...
Deleting drafts from deleted threads...
Adjusting indexes in tables...
Compacting table: sms
Compacting table: mms
Compacting table: part
Compacting table: recipient_preferences
Compacting table: groups
Compacting table: identities
Compacting table: group_receipts
Compacting table: drafts
Importing statements from source table 'sms'...4 entries...
Importing statements from source table 'mms'...0 entries...
Importing statements from source table 'part'...0 entries...
Importing statements from source table 'thread'...1 entries...
Importing statements from source table 'identities'...0 entries...
Importing statements from source table 'drafts'...0 entries...
Importing statements from source table 'push'...0 entries...
Importing statements from source table 'groups'...0 entries...
Importing statements from source table 'recipient_preferences'...0 entries...
Importing statements from source table 'group_receipts'...0 entries...
Importing statements from source table 'sticker'...0 entries...
Importing statements from source table 'job_spec'...0 entries...
Importing statements from source table 'constraint_spec'...0 entries...
Importing statements from source table 'dependency_spec'...0 entries...
Exporting backup to 'merged.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 25/25 entries...done
  Dealing with table 'mms'... 7/7 entries...done
  Dealing with table 'part'... 2/2 entries...done
  Dealing with table 'thread'... 7/7 entries...done
  Dealing with table 'identities'... 3/3 entries...done
  Dealing with table 'drafts'... 0/0 entries...
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 1/1 entries...done
  Dealing with table 'recipient_preferences'... 4/4 entries...done
  Dealing with table 'group_receipts'... 8/8 entries...done
  Dealing with table 'sticker'... 0/0 entries...
  Dealing with table 'job_spec'... 1/1 entries...done
  Dealing with table 'constraint_spec'... 0/0 entries...
  Dealing with table 'dependency_spec'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing EndFrame...
Done!

The program automatically tries to determine into which thread of the current db the old messages should be inserted. This might fail if one of the backups has a contact with a country code (+316.....) and the other omits it (06....).
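As a rough illustration of the country-code problem (this is not the tool's code, and nothing here says the tool performs such a step), numbers could be brought into one form before comparing them. The function name `normalize_number` and the default prefix `+31` are assumptions made only for this sketch:

```python
def normalize_number(number, default_country_code="+31"):
    """Return an international-style number, assuming national numbers start with a single trunk '0'.

    Hypothetical helper for illustration only; identifiers without digits are returned unchanged.
    """
    if not any(ch.isdigit() for ch in number):
        return number  # e.g. a text-only sender id
    cleaned = "".join(ch for ch in number if ch.isdigit() or ch == "+")
    if cleaned.startswith("+"):
        return cleaned
    if cleaned.startswith("00"):
        return "+" + cleaned[2:]
    if cleaned.startswith("0"):
        return default_country_code + cleaned[1:]
    return cleaned


print(normalize_number("0611111111"))
print(normalize_number("+31611111111"))
```

With both backups normalized this way, the two spellings of the same contact would compare equal; a wrong default country code would of course produce wrong matches.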

Please let me know if you need any more help in running this function, I'm not sure the above instructions are very clear. And of course, if you do manage to get it going I would love to hear the results.

Good luck!

elbrutalo commented 5 years ago

Dear bepaald,

ah, great that this feature already exists!

I'll test it right now, but just to be on the safe side I'll ask the question first:

What exactly can go wrong if there are foreign contacts with an international prefix in one of the backups and not in the other? Does the whole process then stop?

In my case there are many threads with foreign phone numbers (partly only in the old backup, partly in the new and in the old one).

Is there anything else to consider before I start?

+41 / +43 / +44 / +17 / +35 / +33 / +21 / +39 etc. then there are still threads with counterparts where the sender identification is only a text and not a number (service numbers, chatbots, messages sent via various messengers).

bepaald commented 5 years ago

(some of the following is guesswork, as I said, the code is not extensively tested. Keep in mind that the input backups are opened read-only, so you really can't end up any worse than you start :) )

Well nothing can go wrong exactly, but signal internally uses the phone numbers as the contact id. It is by this id that this program matches the threads in the old and new databases. So nothing can go wrong, just if you have 0611111111 in one backup and +31611111111 in the other, the threads will not be identified as the same person and will not be merged, they will just turn into two separate threads. You could check this by running the tool with --listthreads (as in the example above): the column "recipient_ids" is the identifier by which threads are merged. Any threads whose recipient_ids is not found in the other backup will get their own, new thread in the output file. I think this is the natural way, if a contact of yours loses his phone and gets a new one (with a new number), those messages would also become a new thread (so you'd have two for the same contact) because his phone number (= recipient_ids) has changed.
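The matching described above can be pictured with a small sketch. It is not taken from the program; the function `plan_merge` and the example numbers are made up, and threads are reduced to a plain mapping of thread id to recipient_ids:

```python
def plan_merge(source_threads, target_threads):
    """Map each source thread id to the target thread with the same recipient_ids, or None.

    Both arguments are dicts of {thread_id: recipient_ids}. None means "no match,
    the thread would become a new thread in the output". Illustrative only.
    """
    target_by_recipient = {rid: tid for tid, rid in target_threads.items()}
    return {tid: target_by_recipient.get(rid) for tid, rid in source_threads.items()}


source = {2: "+31611111111", 3: "+31622222222", 4: "0633333333"}
target = {5: "+31611111111", 6: "+31622222222", 7: "+31633333333"}
print(plan_merge(source, target))  # {2: 5, 3: 6, 4: None}
```

Thread 4 gets no match here because "0633333333" and "+31633333333" are different strings, which is exactly the country-code case.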

then there are still threads with counterparts where the sender identification is only a text and not a number (service numbers, chatbots, messages sent via various messengers).

I have very little experience with this, but I kind of expect it to work even if it's not a number, as long as it's the same string in both databases they hopefully get merged.

Is there anything else to consider before I start?

I can't think of anything else. I just tested merging the fixed backup I had truncated for testing. It seems to have worked fine, even with the missing attachments. There was also a service number in there which also seemed to work. Now all the messages in it are doubled (because I merged it with itself). I think the best way to get answers is to try it! I'm very curious myself actually, so if you have the time and feel up to it, please try it out.

PS If you have a lot of threads, it might be tedious to write --importthreads 1,2,3,4,5,6,7,9,10,11,12,13 in the command line. The program will also accept ranges of thread id's, so the previous could be written as --importthreads 1-7,9-13.
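For illustration, a range list such as 1-7,9-13 can be expanded like this. This is only a sketch of the idea in Python, not the program's own parser, and `parse_thread_ranges` is a made-up name:

```python
def parse_thread_ranges(spec):
    """Expand a spec like '1-7,9-13' into a list of thread ids (illustrative sketch)."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError("range start is larger than range end: " + part)
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(part))
    return ids


print(parse_thread_ranges("1-7,9-13"))
# [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13]
```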

bepaald commented 5 years ago

@elbrutalo Did you try it out yet?

Obviously, you do not have to if you don't want, but I just glanced over the code changes for the upcoming 4.48 version of Signal (currently in beta testing), and the changes will definitely break my current merging code. So, if you are going to try, please do so before the new version comes out and before updating (or wait until I've updated my code, but it could take quite a while).

elbrutalo commented 5 years ago

Dear bepaald,

thanks for reminding me of the upcoming changes. I’ll try it tonight and let you know!

Thank you so much!

Patrick

elbrutalo commented 5 years ago

Dear bepaald, I tried to merge the two backups (first with all threads, then with only a few), but after a short time the tool freezes my Fedora system. After that I can only do a hard reset.

The problem occurs both when I select all threads (1-587) or only some threads (e.g. 1-4).

Since the computer freezes, I cannot send you the terminal log, only photos of the frozen screen:

https://www.dropbox.com/s/ekheqaz1v3yaia2/20190926_121934.jpg?dl=0
https://www.dropbox.com/s/taiwtp662eoj66l/20190926_121931.jpg?dl=0
https://www.dropbox.com/s/o2qv4501met4d9e/20190926_114333.jpg?dl=0
https://www.dropbox.com/s/s6a4uv3qi51h9cn/20190926_114323.jpg?dl=0

Do you have any idea how I can avoid this problem? thank you so much.

bepaald commented 5 years ago

Hi,

Thanks for trying!

Not sure what's going on here, but I can imagine the machine is running out of memory. Do you think that is possible? Though I already had some low-memory options prepared, I never bothered to enable them. The merging code was very memory hungry (more than the other functions), and in combination with your huge backup file, I can imagine the machine is getting low on RAM.

I've spent the last couple of hours enabling the low-memory mode for the merging routine (I had not maintained it properly, so it took a little work) and testing if I didn't break any other functionality. For my tests (merging 10 threads from two 96MB files) memory usage went from 520MB to around 150MB (also, it should go a bit quicker).

Please try again with the current code, I hope it helps. If it still hangs, does it at least get further along?

EDIT: Thinking about it, the max RAM used is probably around the same as the total size of the two backups you are trying to merge (in the new version), so if you have less RAM than the size of your two backups combined you might still get in trouble.

EDIT2: I've further reduced memory usage, if RAM was the problem I don't think it can be anymore. Also, with the number of threads you are merging, the output will be way too big to capture from the terminal, so if you append | tee OUTPUT to your command (so it will look like ./signalbackup-tools --importthreads 2,3,4 --source source.backup --sourcepassword 871668681636341580140408145422 --output merged.backup --opassword 000000000000000000000000000000 current.backup 420676745407910020904427069666 | tee OUTPUT), all the output of the program will be stored in a file called "OUTPUT".

EDIT3: Sorry for all the messages. I just found a stupid bug in the merging code, but I'm too tired to fix it now, so please wait a little while, I'll have it fixed tomorrow. (fixed)

elbrutalo commented 5 years ago

Hello bepaald,

with the adjustments the script went through without freezing the system, thank you very much!

The import in the newly installed signal also worked. However, all messages BEFORE the 11.9.2019 are now available twice (no matter if text, picture or voice message).

As far as I could see all threads are affected.

It is noticeable that the size of the merged backup file (merged.backup) is almost exactly the sum of the sizes of the two source backups (source1.backup + source2.backup).

However, source2.backup still contains the last 500 messages of many threads, and these messages are also contained in source1.backup.

On the other hand, ALL messages before 11.9.2019 have now been imported twice in all threads.

Do you have any other idea why this might be?

Thank you very much already! Gradually we are getting closer to our goal!

Patrick

Old, repaired, formerly corrupt backup: source1.backup (4,006,050,921 bytes)

New backup of the last 500 messages of all threads: source2.backup (2,187,114,872 bytes)

Fusion backup: merged.backup (6,192,939,327 bytes)
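The three sizes can be compared with a few lines of arithmetic. This is just a check of the numbers listed above, using Python as a calculator:

```python
# File sizes in bytes, as listed above.
source1 = 4_006_050_921
source2 = 2_187_114_872
merged = 6_192_939_327

combined = source1 + source2
print(combined)           # 6193165793
print(combined - merged)  # 226466
```

So the merged file is about 226 KB smaller than the two sources together, which fits the impression that nothing was dropped during the merge.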

The call was as follows:

./signalbackup-tools --importthreads 1-585 --source /run/media/liveuser/KASTI32/source1.backup --sourcepassword 559440538240148737383841861176 --output /run/media/liveuser/KASTI32/merged.backup --opassword 276748450922646926144222545568 /run/media/liveuser/KASTI32/source2.backup 276748450922646926926144222545568

Esokrates commented 5 years ago

@bepaald I couldn't get it to compile: https://pastebin.com/c1r4eTXg Would be great if you could help!

elbrutalo commented 5 years ago

Hi, just follow these steps on a Fedora Live with internet connection:

$ git clone https://github.com/bepaald/signalbackup-tools.git
$ sudo dnf install gcc-c++ cryptopp-devel sqlite-devel
$ cd signalbackup-tools && chmod +x BUILDSCRIPT.sh
$ sh BUILDSCRIPT.sh

bepaald commented 5 years ago

@Esokrates The error looks like one I've seen before when people are using an old version of cryptopp. As written in the README: "Requirements: crypto++ (tested with 8.2.0, known to not compile with 5.6.4, which is currently in Ubuntu)". I don't know why some distros still ship an ancient version of cryptopp, but I think @elbrutalo's advice is your best bet (or you could switch to a proper distro of course ;) )

@elbrutalo Ok, I'm glad it sort of works now. I can't think of any reason why messages would be doubled. We are again getting to a problem which I feel I would be able to solve and fix in a minute if I had the same backup-files you have and it was happening here. I need to think about it a little bit. In the meantime, I've added a new option to the program just for you. Would you mind running the new code three times on the sources and the merged file, and reporting the output?

./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/source1.backup 559440538240148737383841861176
./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/source2.backup 276748450922646926926144222545568
./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/merged.backup 276748450922646926926144222545568

The fact that the merged file is not any larger than expected may suggest the doubled messages were already in the source file (just a possibility). Had you restored the 'fixed' (4,006,050,921 byte) backup, and did you see any doubled messages in there then? Is there anything special about 11.9.2019? Are the doubled messages all from source1, or all from source2? Are the correct (not doubled) messages all from one of the sources?

I feel we are really close, but with all the previous problems I had a suspicion what was happening (and I was right most of the time I think :) ), but this time I'm not sure why this could be happening. I will have to think about it and I may have more questions later if you don't mind.

Esokrates commented 5 years ago

@bepaald Ok I resolved this issue with a newer version of libcrypto++-dev, however now I'm running into a different issue. Would be glad if you could share which library could be at fault this time: https://pastebin.com/JPGLEuh3

EDIT: Figured it out: g++ version 9 is needed, maybe you should add that in the README.

For people using Debian-based systems who do not want to switch to a different distro in order to test this: you can grab the newer library from here:
https://packages.debian.org/experimental/libcrypto++-dev
https://packages.debian.org/experimental/libcrypto++8
g++-9 also needs to be installed, and in the BUILDSCRIPT the COMPILER variable should be set to "/usr/bin/g++-9".

bepaald commented 5 years ago

@Esokrates Glad you figured it out. And thanks for pointing out version 9 of g++ is needed, I indeed only recently used std::string::starts_with() for the first time, but hadn't realized this was not available in previous versions. I've updated the README.

Now that you've successfully compiled, did you manage to actually run it? Did it work?

Thanks!

Esokrates commented 5 years ago

@bepaald Wish I could test it, but it seems Signal has blocked my phone number from registering: I had a problem with Signal finding the backup file on a custom ROM device and used the number a few times in a row. Then I tried it on my main phone, where Signal could detect the backup file, but it was too late since the registration service did not work. I have no idea how long I'll be blacklisted now :-(.

bepaald commented 5 years ago

@Esokrates Oops, that's annoying. Whenever something like this comes up on a Signal bug report, the devs usually respond with 'try again tomorrow'. I'd say, give it 24 hours just to be sure. I hope it will work then and that the merging was successful!

Esokrates commented 5 years ago

@bepaald Found an old SIM card and used that one instead. I am very happy to report that the merging worked. Since my data is around 3GB, I can't really be sure that everything is right, but at first glance I couldn't find anything wrong.

My message partner a few years ago changed number a few times, I am wondering if I could merge different threads to one? Could you tell me how?

EDIT: Found a thread that crashes Signal on scrolling back, could you tell me how to debug that? EDIT2: Ok the scrollback crash seems unrelated to the merging, it happens with the original backup too. I'd be curious if your tool could help debug and possibly fix this issue?

bepaald commented 5 years ago

@Esokrates Excellent! I'm very happy it worked!

My message partner a few years ago changed number a few times, I am wondering if I could merge different threads to one? Could you tell me how?

Yes I think this is possible. Every message in the database has a thread_id that identifies the thread it belongs to, it should be possible to change that and have all those messages appear in the same thread. This would however lead to a situation that never occurs in a natural Signal database: a one-on-one thread with multiple recipients. It might happen that a future version of the app does not handle this well. I would therefore also change the recipient_id of the old threads to the new number. That should result in a completely normal database, but the exact history of number changes of your conversation partner would be lost.
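To make the idea concrete, here is a minimal sketch on a toy SQLite database. The two-table schema below is a heavily simplified assumption (the real Signal database has many more tables and columns), `merge_threads` is a hypothetical helper, and the phone numbers are invented:

```python
import sqlite3


def merge_threads(db, old_thread_ids, new_thread_id):
    """Move all messages of the old threads into new_thread_id and rewrite their number.

    Works on the simplified, hypothetical schema created below, not on a real backup.
    """
    new_recipient = db.execute(
        "SELECT recipient_ids FROM thread WHERE _id = ?", (new_thread_id,)
    ).fetchone()[0]
    for old_id in old_thread_ids:
        db.execute(
            "UPDATE sms SET thread_id = ?, address = ? WHERE thread_id = ?",
            (new_thread_id, new_recipient, old_id),
        )
        db.execute("DELETE FROM thread WHERE _id = ?", (old_id,))
    db.commit()


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE thread (_id INTEGER PRIMARY KEY, recipient_ids TEXT);
CREATE TABLE sms (_id INTEGER PRIMARY KEY, thread_id INTEGER, address TEXT, body TEXT);
INSERT INTO thread VALUES (1, '+31611111111'), (2, '+31622222222');
INSERT INTO sms VALUES (1, 1, '+31611111111', 'from the old number'),
                       (2, 2, '+31622222222', 'from the new number');
""")
merge_threads(db, [1], 2)
rows = db.execute("SELECT thread_id, address, body FROM sms ORDER BY _id").fetchall()
print(rows)
```

After the call both messages sit in thread 2 under the new number, and the history of the number change is gone, as described above.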

If that's OK, I'll start on that second option tomorrow (no time today).

EDIT2: Ok the scrollback crash seems unrelated to the merging, it happens with the original backup too. I'd be curious if your tool could help debug and possibly fix this issue?

Well I imagine some specific message coming into view causes the crash, which my application could certainly delete. You just need to find out which message causes the crash. I suppose you could do that by slowly scrolling one message at a time and remembering the last good message, then examining the database to check the message before that. Obviously if it turns out this is the case, I'll help you delete the offending entries from your backup file.

However, seeing as the crash occurs in the official signal app, with an original (untouched by my hacks) backup, I think you should probably open a new bug on signal issue tracker (with a debug log). The debug log is also the first place to start debugging, it will probably tell you the cause of the crash.

Esokrates commented 5 years ago

Yes I think this is possible. Every message in the database has a thread_id that identifies the thread it belongs to, it should be possible to change that and have all those messages appear in the same thread. This would however lead to a situation that never occurs in a natural Signal database: a one-on-one thread with multiple recipients. It might happen that a future version of the app does not handle this well. I would therefore also change the recipient_id of the old threads to the new number. That should result in a completely normal database, but the exact history of number changes of your conversation partner would be lost.

Yeah, the numbers of the old threads should be altered to the number of the thread they are merged into. I'm not interested in keeping a number changes history, it is only confusing having conversations with the same person cluttered across different threads.

EDIT: Maybe a better idea for merging: allow specifying a mapping between numbers, so that it does not get confusing with groups. For example, I could specify that a couple of numbers map to a single new one; all threads with those numbers are then merged, and all groups are altered so that the old numbers are replaced with the single new one.
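A sketch of what such a mapping could do to a group's member list, in plain Python. How groups are actually stored in the backup is not covered here; the function `remap_members`, the list representation and the numbers are all assumptions for illustration:

```python
def remap_members(members, number_map):
    """Replace old numbers by their mapped new number, keeping order and dropping duplicates."""
    seen = set()
    result = []
    for number in members:
        mapped = number_map.get(number, number)
        if mapped not in seen:
            seen.add(mapped)
            result.append(mapped)
    return result


# Two old numbers of the same person map to one current number (made-up values).
number_map = {"+31611111111": "+31699999999", "+31622222222": "+31699999999"}
group = ["+31611111111", "+31633333333", "+31622222222"]
print(remap_members(group, number_map))  # ['+31699999999', '+31633333333']
```

The duplicate check matters because several old numbers of one person may appear in the same group.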

elbrutalo commented 5 years ago

@bepaald thank you for your patience, I ran the three commands on the backup files. I hope this will be of help:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/source1.backup 559440538240148737383841861176
signalbackup-tools source version 20190928.145732
IV: (hex:) f8 66 b5 86 2a 72 f7 f3 e2 1f 74 a4 13 b5 3e b7 (size: 16)
SALT: (hex:) 58 de b1 82 9b 62 d5 64 9b 62 e2 92 50 82 bf 51 26 96 7c 96 64 c6 c5 8c ca 3e 56 52 1e 0e 3d ff (size: 32)
BACKUPKEY: (hex:) 2b cc 54 0b 90 d6 ed 38 ad b9 42 cf f6 1e b9 cd d1 c4 b9 08 60 b9 c2 13 d0 d9 38 9f 81 16 0f f0 (size: 32)
CIPHERKEY: (hex:) eb d2 56 48 60 1f 4d ac a1 44 d1 f6 f8 37 d4 ac cb eb 10 fc 0b be 45 9a 13 74 9a 27 24 bc 41 85 (size: 32)
MACKEY: (hex:) 0b 58 b5 e4 b0 4b a4 60 a8 da 96 73 52 7e 7e 0d a1 dc 04 50 2b 6a 16 d8 9f 6f bd d4 91 f9 e9 5e (size: 32)
COUNTER: 4167480710
Reading backup file...
FRAME 67270 (100.0%)... Read entire backup file...
done!
Executing query: SELECT COUNT(*) AS num_sms, MIN(date), MAX(date) FROM sms
-------------------------------------------
| num_sms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 59436   | 1292663246000 | 1564476268493 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM sms AS t1 INNER JOIN sms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_sent = t2.date_sent AND t1._id <> t2._id)
-----------
| doubles |
-----------
| 8       |
-----------
Executing query: SELECT COUNT(*) AS num_mms, MIN(date), MAX(date) FROM mms
-------------------------------------------
| num_mms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 2410    | 1500758375621 | 1564430897457 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM mms AS t1 INNER JOIN mms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_received = t2.date_received AND t1._id <> t2._id) AS doubles
-----------
| doubles |
-----------
| 0       |
-----------
Executing query: SELECT COUNT(*) AS num_thread FROM thread
--------------
| num_thread |
--------------
| 512        |
--------------

Second one:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/source2.backup 276748450922646926144222545568
signalbackup-tools source version 20190928.145732
IV: (hex:) bb 07 51 6b 2b f7 9d 2e 1d 67 79 a3 d8 c1 50 06 (size: 16)
SALT: (hex:) aa ae 9f d1 ef 6a 12 11 c5 0d 00 ab 24 ca 58 d8 c5 d0 d0 50 1a bd 30 9f 74 5f e4 7f c5 b5 af c2 (size: 32)
BACKUPKEY: (hex:) 04 fd 48 c5 9f aa 82 13 3d ee 00 dd ec be 69 de 5e e6 0f 4f eb 89 69 e2 1d 71 7f cf 3e 4c 3d 82 (size: 32)
CIPHERKEY: (hex:) 0f 59 f7 e2 a2 90 56 7a 3b c7 48 15 2b df fe d8 0f 59 7f 2c 71 9e 6c 80 eb 94 51 46 dc e3 9e ab (size: 32)
MACKEY: (hex:) 26 98 ed a7 6b dd 36 69 5f 19 a2 9c 1e 3b ed 51 39 23 09 37 7f e7 ba 5c 0b 7d 91 ae fb 53 ac f1 (size: 32)
COUNTER: 3137818987
Reading backup file...
FRAME 53295 (100.0%)... Read entire backup file...
done!
Executing query: SELECT COUNT(*) AS num_sms, MIN(date), MAX(date) FROM sms
-------------------------------------------
| num_sms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 48688   | 1292663246000 | 1569567666589 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM sms AS t1 INNER JOIN sms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_sent = t2.date_sent AND t1._id <> t2._id)
-----------
| doubles |
-----------
| 6       |
-----------
Executing query: SELECT COUNT(*) AS num_mms, MIN(date), MAX(date) FROM mms
-------------------------------------------
| num_mms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 1239    | 1500758375621 | 1569567333198 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM mms AS t1 INNER JOIN mms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_received = t2.date_received AND t1._id <> t2._id) AS doubles
-----------
| doubles |
-----------
| 0       |
-----------
Executing query: SELECT COUNT(*) AS num_thread FROM thread
--------------
| num_thread |
--------------
| 511        |
--------------

Third one:

[liveuser@localhost-live signalbackup-tools]$ ./signalbackup-tools --elbrutalo /run/media/liveuser/KASTI32/merged.backup 276748450922646926144222545568
signalbackup-tools source version 20190928.145732
IV: (hex:) bb 07 51 6b 2b f7 9d 2e 1d 67 79 a3 d8 c1 50 06 (size: 16)
SALT: (hex:) aa ae 9f d1 ef 6a 12 11 c5 0d 00 ab 24 ca 58 d8 c5 d0 d0 50 1a bd 30 9f 74 5f e4 7f c5 b5 af c2 (size: 32)
BACKUPKEY: (hex:) 04 fd 48 c5 9f aa 82 13 3d ee 00 dd ec be 69 de 5e e6 0f 4f eb 89 69 e2 1d 71 7f cf 3e 4c 3d 82 (size: 32)
CIPHERKEY: (hex:) 0f 59 f7 e2 a2 90 56 7a 3b c7 48 15 2b df fe d8 0f 59 7f 2c 71 9e 6c 80 eb 94 51 46 dc e3 9e ab (size: 32)
MACKEY: (hex:) 26 98 ed a7 6b dd 36 69 5f 19 a2 9c 1e 3b ed 51 39 23 09 37 7f e7 ba 5c 0b 7d 91 ae fb 53 ac f1 (size: 32)
COUNTER: 3137818987
Reading backup file...
FRAME 119998 (100.0%)... Read entire backup file...
done!
Executing query: SELECT COUNT(*) AS num_sms, MIN(date), MAX(date) FROM sms
-------------------------------------------
| num_sms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 108123  | 1292663246000 | 1569567666589 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM sms AS t1 INNER JOIN sms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_sent = t2.date_sent AND t1._id <> t2._id)
-----------
| doubles |
-----------
| 95615   |
-----------
Executing query: SELECT COUNT(*) AS num_mms, MIN(date), MAX(date) FROM mms
-------------------------------------------
| num_mms | MIN(date)     | MAX(date)     |
-------------------------------------------
| 3648    | 1500758375621 | 1569567333198 |
-------------------------------------------
Executing query: SELECT COUNT(*) AS doubles FROM (SELECT DISTINCT t1.* FROM mms AS t1 INNER JOIN mms AS t2 ON t1.date = t2.date AND t1.body = t2.body AND t1.date_received = t2.date_received AND t1._id <> t2._id) AS doubles
-----------
| doubles |
-----------
| 1486    |
-----------
Executing query: SELECT COUNT(*) AS num_thread FROM thread
--------------
| num_thread |
--------------
| 511        |
--------------

bepaald commented 5 years ago

@Esokrates Ok, I made a first attempt at merging recipients. Run with: signalbackup-tools --mergerecipients recipient_id1,recipient_id2,...,recipient_idN --output [OUTPUT] --opassword [NEWPASSWORD] [INPUT] [PASSWORD]. The recipient_id can be found with --listthreads, and is usually just the phone number (often preceded by a plus-sign), eg '+601234567890'. The current number, into which all other numbers are to be merged, is listed last on the command line.

I think this should take care of all your problems, but tell me if I'm missing something. Also, I wrote this code just this afternoon, it was only very briefly tested, so again, it is highly experimental.

@elbrutalo Ok, thanks! As far as I can tell the merging looks completely fine, I think it worked just as intended. I just hadn't realized there was an overlap in messages in your source databases. From the output you posted, I can tell that source1 contains messages from 18.12.2010 to 30.07.2019, and source2 has messages from 18.12.2010 to 27.09.2019. So, source2 starts at the same time, but it contains fewer messages. I assume there is a gap in the messages in source2?
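The dates above come from the MIN(date) and MAX(date) values of the sms table in the posted output, which are milliseconds since 1970. A small sketch of the conversion (interpreting the values in UTC is an assumption, local time may differ by a few hours):

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a UTC datetime (sub-second part dropped)."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc)


stamps = {
    "source1 MIN(date)": 1292663246000,
    "source1 MAX(date)": 1564476268493,
    "source2 MAX(date)": 1569567666589,
}
for label, ms in stamps.items():
    print(label, ms_to_utc(ms).strftime("%d.%m.%Y"))
# source1 MIN(date) 18.12.2010
# source1 MAX(date) 30.07.2019
# source2 MAX(date) 27.09.2019
```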

I can see two possible solutions, depending on the exact situation. Obviously, messages from after 30.07.2019 are only in source2. Also, since source1 contains more messages than source2, there are definitely messages from before 30.07.2019 that are only in source1. The big question is, are there messages in source2 from before 30.07.2019 that are not present in source1? I would guess not (I wouldn't know how that could happen, but I'm not 100% sure how these two backups came into existence anyway).
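
That question can be checked mechanically once both message lists are available. A minimal sketch in Python, assuming each message has already been reduced to a hypothetical (date in milliseconds, body) pair. The function name and sample data are made up for illustration, and this is not how signalbackup-tools itself works:

```python
def only_in_second(first, second, cutoff_ms):
    """Return messages of `second` dated before `cutoff_ms` that `first` lacks."""
    seen = set(first)
    return [msg for msg in second if msg[0] < cutoff_ms and msg not in seen]


# Hypothetical sample data: (date in ms since the Unix epoch, message body).
source1 = [(1564000000000, "hello"), (1564100000000, "see you")]
source2 = [
    (1564100000000, "see you"),             # also in source1
    (1564200000000, "missing in source1"),  # before the cutoff, only in source2
    (1569000000000, "newer message"),       # after the cutoff, ignored
]

CUTOFF_MS = 1564476268494  # example cutoff, in milliseconds

missing = only_in_second(source1, source2, CUTOFF_MS)
print(missing)
```

An empty result would mean cropping source2 by date loses nothing. Any entries in the result point to the second, harder case.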

If there are no messages from before 30.07.2019 that are only in source2, the best solution is to crop source2 by date to after this moment. Luckily, I had already written the code to do that and have now enabled that. Run with

./signalbackup-tools --croptodates 0,1564476268494 /run/media/liveuser/KASTI32/source2.backup 276748450922646926144222545568 --output [NEWOUTPUT] --opassword [NEWPASSWORD]

Then, proceed to merge source1 and this newly cropped source2 exactly as before (obviously, you could create a new backup to replace source2 first, to get the messages from the last few days as well).
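
The values given to --croptodates appear to be timestamps in milliseconds since the Unix epoch, like the date columns in the query output earlier in this thread. A small sketch in Python for converting between such a value and a readable UTC date. The helper names are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def utc_to_ms(dt):
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


crop_end = ms_to_utc(1564476268494)
print(crop_end.isoformat())  # 2019-07-30T08:44:28.494000+00:00
print(utc_to_ms(crop_end))   # 1564476268494
```

Both helpers use integer arithmetic, so a value survives the round trip unchanged. Local time may differ from the UTC date printed here.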

If there are messages in source2 from before 30.07.2019 that are not in source1 (and the other way around), then it becomes slightly less easy. Then we should probably just try to delete the doubled messages from the database after merging, but identifying double messages might not be 100% foolproof. Please tell me if this is the case we are dealing with, because I should write some more code if that's the case.

Esokrates commented 5 years ago

@bepaald Thanks, great, though it might take some time until I can test it; I don't have enough time for testing at the moment. Regarding the group situation: Assume a 1:1 conversation with "oldnumber" which took place in a group. Now the messages from "oldnumber" are altered to originate from "newnumber", however "newnumber" is not even a group member. Imo you should replace the "oldnumber" member with the "newnumber" member, since it is safe to assume that "oldnumber" is not in use anymore, so even if the group were still to exist, "oldnumber" wouldn't be an active member anymore.

bepaald commented 5 years ago

@Esokrates OK, I hadn't thought of that particular use case. I still think that in most normal circumstances one would just add "newnumber" to the group afterwards, and that the change is just a tiny visual one. Also, having messages in a group chat from a contact who isn't a member is a fairly normal thing (happens whenever someone leaves a group). Lastly, and most importantly, I don't know the exact effects this has if one (accidentally?) uses the edited group. A message is then sent to "newnumber" to be placed in a group he/she isn't a member of. Maybe nothing happens, maybe the message is placed in a new groupchat, maybe Signal crashes, maybe the phone explodes, who knows. And if it is not the intention to use the thread, why worry about the odd member list?

Anyway, having said that, I'll add the option once I have time. It will be a separate option to edit the member list (so a second step after --mergerecipients). It will at first just change the actual current member list, which should be fairly easy to do. I'll look into also changing the messages 'member has updated the group, members are now member, oldnumber... etc' that display inside the conversation, but that might take a while. They are annoyingly encoded as a base64 encoded binary blob representing a Google protocol buffer that holds the data. I know I can (and do) already decode these messages, but I don't think I wrote any methods to reencode them. So those will take more time.
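
To illustrate why re-encoding is extra work, here is a rough sketch in Python of a decode, edit, re-encode round trip for a base64-wrapped protocol buffer. It only handles the varint and length-delimited wire types, and the field numbers and contents are hypothetical; the actual layout of Signal's group status messages is not shown here, and this is not the code used in signalbackup-tools:

```python
import base64


def read_varint(buf, pos):
    """Read a protobuf varint at `pos`; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_fields(blob_b64):
    """Decode a base64 blob into a list of (field number, wire type, value)."""
    buf = base64.b64decode(blob_b64)
    pos, fields = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (bytes, strings, submessages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((number, wire_type, value))
    return fields


def encode_fields(fields):
    """Inverse of decode_fields: serialize the fields and base64-encode them."""
    out = bytearray()
    for number, wire_type, value in fields:
        out += write_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += write_varint(value)
        elif wire_type == 2:
            out += write_varint(len(value)) + value
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
    return base64.b64encode(bytes(out)).decode("ascii")


# Hypothetical message: field 1 holds a member id, field 2 some integer.
blob = encode_fields([(1, 2, b"oldnumber"), (2, 0, 150)])
edited = [(n, t, b"newnumber" if v == b"oldnumber" else v)
          for n, t, v in decode_fields(blob)]
new_blob = encode_fields(edited)
print(decode_fields(new_blob))
```

Every length prefix has to be recomputed when a value changes size, which is why a decoder alone is not enough.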

bepaald commented 5 years ago

@Esokrates Probably too late, but I got around to implementing editing the group member-list (including the status update messages!). It doesn't require a second step, just add the --editgroupmembers switch to the mergerecipients command (so: signalbackup-tools --mergerecipients recipient_id1,recipient_id2,...,recipient_idN --editgroupmembers --output [OUTPUT] --opassword [NEWPASSWORD] [INPUT] [PASSWORD]).

However, the 4.48 update has rolled out by now, so unless you for some reason haven't updated in the last week or two, the code will not work anymore. I will try to make my code compatible with the changes in the database format, but it will take a bit of time (which I don't always have). The 4.49 update (in beta now) continues the changes in the database (the changes in 4.48 are somewhat incomplete) and will break the program again. I'll try to prepare for that, but I probably won't release the new code until 4.49 is also out, unless I can get it to be compatible with all versions simultaneously.

Esokrates commented 5 years ago

@bepaald Thanks very much. For me it's not too late because I did not upgrade. I do not have time to test now however. I will test it when I have more time.

So if you plan to make changes for a new version, please create a different branch :-).

elbrutalo commented 5 years ago

Dear bepaald,

I only now had time to make a new attempt.

Merging the two old backups (source1.backup + source2.backup) seems to have worked with the current version of signalbackup-tools as well.

Removing messages from source2.backup from certain time periods didn't work anymore, probably because of the signalbackup tools that were updated in the meantime.

I also suspect that cropping threads to a certain time window will remove either too few or too many duplicates, because the overlap in each thread is determined by the number of messages in the thread rather than by a fixed point in time.

Source2.backup was originally created after I accidentally activated the option "Cut threads to the last 500 messages" in Signal, i.e. older messages were deleted and new messages were then added for several weeks. So I can't "cut off" a time period from source1.backup before merging the backup with the new backup source2.backup.

Is it possible to have the signalbackup tools automatically search for duplicate messages (incoming and outgoing) and clean them up?

I would then like to proceed as follows:

  • merging signal1.backup + signal2.backup into merged.backup
  • remove duplicates from merged.backup
  • merging of merged.backup and the new backup (news since October until today) signal3.backup to a final merged-aktuell.backup
  • Reimport into signal

I already tried to merge the merged.backup with the new backup of today. But that didn't work and also the listthreads caused a lot of mysql errors.

Does this have to do with the conversion to the new signal version? Can I still use the tools to merge my two old backups (signal1.backup + signal2.backup), then somehow remove the duplicates (if that is not possible I keep the duplicates) and merge the merged backup with my current backup?

Then I would have duplicates but still my complete history until today.

Many thanks in advance for the help, elb

bepaald commented 5 years ago

@elbrutalo No problem, I have busy periods myself.

Ok, now I understand the duplicate messages, I did not even know this option existed in Signal! But indeed, cropping to certain dates will probably not work, so removing duplicates seems to be the way to go.

  • merging signal1.backup + signal2.backup into merged.backup
  • remove duplicates from merged.backup
  • merging of merged.backup and the new backup (news since October until today) signal3.backup to a final merged-aktuell.backup
  • Reimport into signal

Yes this seems the way to do it. Complicated, but I think it should work. The only thing missing for this is the option to find and remove duplicates. I'll try to get started on that functionality this weekend.

I already tried to merge the merged.backup with the new backup of today. But that didn't work and also the listthreads caused a lot of mysql errors.

Does this have to do with the conversion to the new signal version?

Yes absolutely. But, I had been working the past few weeks on updating the code to deal with the new database format and just pushed the changes earlier today. So at least merging, listthreads and cropping by date or thread should all be working again! (@Esokrates all changes should be backwards compatible, so if you still haven't updated, it should still work, I haven't updated the mergerecipients option yet).

One important thing: Merging databases only works if either both databases are old or both are new, NOT when one is old and the other is new. 'merged.backup' will be an old style database (the same as either source1 or source2, depending on which is the input and which is the '--source' on the command line), but the new backup 'signal3.backup' will be the new format. So in order to merge those two databases, you would first need to update merged.backup to the new format by importing it into signal and re-exporting it to get 'merged_new.backup'. I hope signal allows importing the old database to upgrade it, otherwise you even need to first downgrade Signal, import merged.backup, then update Signal to the current version, and then export the backup. I just tested importing an older backup (database version 25) into current Signal and it seems to work fine.

bepaald commented 4 years ago

@elbrutalo Ok, I think you are good to go. So

  1. Merge signal1.backup and signal2.backup -> merged.backup, same as before, or just skip this if you still have the merged.backup
  2. Remove duplicates from merged.backup -> merged2.backup: ./signalbackup-tools --removedoubles merged.backup [password] --output merged2.backup --opassword [password]
  3. Export your current backup from Signal (signal3.backup)!
  4. Import merged2.backup into Signal, have the app create a new backup -> merged_new.backup
  5. Merge merged_new and signal3.backup -> final.backup
  6. Import final.backup into Signal

done!

So, step 1 is only needed if you've not kept the merged.backup from before. Also, from the output you posted here, it seems like there were somehow already a few doubled messages in your signal1.backup and signal2.backup (6 and 8, or actually 3 and 4, as I'm counting both of each double). These will probably also be removed in step 2, but they may be false positives (I don't know how doubles would get in your database, these are messages with the exact same message body, sent to/from the same person in the same thread at the exact same time (down to the milliseconds)). I have updated the --elbrutalo option to actually show doubled messages (not just the count), I don't know if the data is readable for you, but if you want to be sure these messages are doubled you could check that first.
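
For reference, the definition of a double given above (same body, same person, same thread, same time down to the millisecond) can be sketched against a toy table in Python. The column names are assumptions modelled on the queries shown earlier in this thread, the sample rows are made up, and this is not the actual --removedoubles implementation:

```python
import sqlite3

# Columns that must all match for two rows to count as doubles (assumed schema).
KEY_COLUMNS = "thread_id, address, date, date_sent, body"


def make_demo_db():
    """Build an in-memory database with a simplified, hypothetical sms table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sms (_id INTEGER PRIMARY KEY, thread_id INTEGER, "
        "address TEXT, date INTEGER, date_sent INTEGER, body TEXT)"
    )
    rows = [
        (1, "+601234567890", 1500758375621, 1500758375000, "hi"),
        (1, "+601234567890", 1500758375621, 1500758375000, "hi"),  # exact double
        (1, "+601234567890", 1500758375999, 1500758375900, "hi"),  # other time
        (2, "+601234567890", 1500758375621, 1500758375000, "hi"),  # other thread
    ]
    conn.executemany(
        "INSERT INTO sms (thread_id, address, date, date_sent, body) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def count_doubles(conn):
    """Count all rows that have an identical twin (both copies are counted)."""
    query = (
        "SELECT COALESCE(SUM(n), 0) FROM (SELECT COUNT(*) AS n FROM sms "
        "GROUP BY " + KEY_COLUMNS + " HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


def remove_doubles(conn):
    """Keep the lowest _id of every identical group; return rows deleted."""
    query = (
        "DELETE FROM sms WHERE _id NOT IN "
        "(SELECT MIN(_id) FROM sms GROUP BY " + KEY_COLUMNS + ")"
    )
    cur = conn.execute(query)
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
print(count_doubles(conn))   # counts both copies of the doubled message
print(remove_doubles(conn))  # number of surplus copies deleted
print(count_doubles(conn))
```

Counting both copies matches the way the totals are reported above (6 and 8 rather than 3 and 4). Only the surplus copies are deleted.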

EDIT I pushed a small update yesterday: you do not have to specify a full list of thread ids to --importthreads anymore, if you want all threads imported you can just use --importthreads ALL. That should make it a tiny bit easier, and removes the need to call 'listthreads' before importing.

Esokrates commented 4 years ago

Well I imagine some specific message coming into view causes the crash, which my application could certainly delete. You just need to find out which message causes the crash. I suppose you could do that by slowly scrolling one message at a time and remembering the last good message, then examining the database to check the message before that. Obviously if it turns out this is the case, I'll help you delete the offending entries from your backup file.

However, seeing as the crash occurs in the official Signal app, with an original (untouched by my hacks) backup, I think you should probably open a new bug on the Signal issue tracker (with a debug log). The debug log is also the first place to start debugging; it will probably tell you the cause of the crash.

I reported the bug now: https://github.com/signalapp/Signal-Android/issues/9266 Could you help me fix / delete the offending message?