bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Unable to import processed backup file #17

Closed hhenkel closed 1 year ago

hhenkel commented 3 years ago

Hi,

I tried your tool today to see if I could fix a somewhat broken Signal backup file. The original file is shown with a size of 3.2 G and the resulting file is 2.1 G, but I'm still unable to import it. I'll attach the output of the run. It seems the "105/2018 entries...Warning: attachment data not found" problems were not fixed, as they still show up if I rerun your tool against the fixed backup file.

COUNTER: 3341106540
Reading backup file...
FRAME 26197 (062.6%)...
WARNING: Bad MAC in attachmentdata: theirMac: (hex:) 9b ce 14 32 ef 51 6c c9 aa 84
                                      ourMac: (hex:) bd a9 d1 d0 8b 14 96 1f 38 09 bf 6e 4b 06 6d 2f ec 04 57 8d d6 b3 8d bb 27 40 ef 9e 1b b8 e9 84

WARNING: Bad MAC in frame, trying to print frame info:
Frame number: 26198
        Type: ATTACHMENT
         - row id          : 2037 (8 bytes)
         - attachment id   : 1582746100843 (8 bytes)
         - length          : 2295648 (8 bytes)
         - attachment      : (hex:) ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 ff db 00 43 00 ... (2295648 bytes total)
Frame is attachment, it belongs to entry in the 'part' table of the database:
 - _id : 2037
 - mid : 2826
 - seq : 0
 - ct : image/jpeg
 - name : (NULL)
 - chset : (NULL)
 - cd : O2WuMgC5+YpercFkV3M1gEXg2XzSIYr6Y0xWH892WMUXIPp7Bi3ALbXClhsAb6bkVrs4LpIE5vN2jZ65oZp1Eg==
 - fn : (NULL)
 - cid : (NULL)
 - cl : 292886834820839875
 - ctt_s : (NULL)
 - ctt_t : (NULL)
 - encrypted : (NULL)
 - pending_push : 0
 - _data : /data/user/0/org.thoughtcrime.securesms/app_parts/part7364302166359440396.mms
 - data_size : 2295648
 - file_name : (NULL)
 - thumbnail : /data/user/0/org.thoughtcrime.securesms/app_parts/part4545823281927312143.mms
 - aspect_ratio : 1
 - unique_id : 1582746100843
 - digest : (hex:) 4b 1f eb fc 85 21 4b f8 83 c3 70 6c df a4 c6 01 5b f1 5c 57 c3 33 56 25 3c 9b 5e 27 3c 0b 49 03
 - fast_preflight_id : 7620226491031352570
 - voice_note : 0
 - data_random : (hex:) 77 1e 69 fa 01 c1 d6 27 0a 47 b7 08 bc f5 0f 19 f8 84 f2 43 33 3c cd 86 42 fc 79 69 f6 37 7a 58
 - thumbnail_random : (hex:) c6 d3 ee 58 64 d4 34 8a 8b 28 00 8b ae 38 33 64 7a af ab e2 77 fd 2f ce cb 3d 86 ec cb 39 1a 0c
 - width : 4032
 - height : 3024
 - quote : 0
 - caption : (NULL)
 - sticker_pack_id : (NULL)
 - sticker_pack_key : (NULL)
 - sticker_id : -1
 - data_hash : xhwj9ntWiYTHa1kPydkXxIgdHlfYtIr/P4/AUHpA1v8=
 - blur_hash : LLI#My-pInR*%$EKj]RP~psDRPoe
 - transform_properties : {"skipTransform":true}
 - transfer_file : (NULL)
 - display_order : 6
 - upload_timestamp : 0
 - cdn_number : 0
 - borderless : 0

Which belongs to entry in 'mms' table:
 - _id : 2826
 - thread_id : 20
 - date : 2020-02-26 19:42:00 +0000 (1582746120192)
 - date_received : 2020-02-26 19:42:00 +0000 (1582746120197)
 - msg_box : 10485783
 - read : 1
 - m_id : (NULL)
 - sub : (NULL)
 - sub_cs : (NULL)
 - body :
 - part_count : 0
 - ct_t : (NULL)
 - ct_l : (NULL)
 - address : 240
 - address_device_id : (NULL)
 - exp : (NULL)
 - m_cls : (NULL)
 - m_type : 128
 - v : (NULL)
 - m_size : (NULL)
 - pri : (NULL)
 - rr : (NULL)
 - rpt_a : (NULL)
 - resp_st : (NULL)
 - st : (NULL)
 - tr_id : (NULL)
 - retr_st : (NULL)
 - retr_txt : (NULL)
 - retr_txt_cs : (NULL)
 - read_status : (NULL)
 - ct_cls : (NULL)
 - resp_txt : (NULL)
 - d_tm : (NULL)
 - delivery_receipt_count : 2
 - mismatched_identities : (NULL)
 - network_failures : (NULL)
 - d_rpt : (NULL)
 - subscription_id : -1
 - expires_in : 0
 - expire_started : 0
 - notified : 0
 - read_receipt_count : 2
 - quote_id : 0
 - quote_author : (NULL)
 - quote_body : (NULL)
 - quote_attachment : -1
 - shared_contacts : (NULL)
 - quote_missing : 0
 - unidentified : 0
 - previews : (NULL)
 - reveal_duration : 0
 - reveal_start_time : 0
 - reactions : (NULL)
 - reactions_unread : 0
 - reactions_last_seen : -1
 - date_server : -1
 - remote_deleted : 0
 - quote_mentions : (NULL)
 - mentions_self : 0
Trying to dump decoded attachment to file 'attachment_2826.bin'
FRAME 26198 (062.7%)... Failed to read next frame (334893159 bytes at filepos 2148656909)
Starting bruteforcing offset to next valid frame... starting after: 2148656909
Checking offset 70 bytes
GOT GOOD MAC AT OFFSET 76 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 7 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 350 bytes
GOT GOOD MAC AT OFFSET 354 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 35 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 670 bytes
GOT GOOD MAC AT OFFSET 679 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 67 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 950 bytes
GOT GOOD MAC AT OFFSET 955 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 95 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 1260 bytes
GOT GOOD MAC AT OFFSET 1260 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 126 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 1530 bytes
GOT GOOD MAC AT OFFSET 1533 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 153 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 1810 bytes
GOT GOOD MAC AT OFFSET 1814 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 110 frames... YEAH! :(
Good frame: 26798 (AvatarFrame)

Got frame, breaking
FRAME 26798 (062.7%)... Failed to get valid frame from decoded data...
Data was verified ok, but does not represent a valid frame... Don't know what happened, but it's bad... Aborting :(
Attachment data with BAD MAC was encountered:
Short info on message to which attachment with bad mac belongs (1/1):
Date          : 2020-02-26 19:42:00 +0000 (1582746120192)
Date received : 2020-02-26 19:42:00 +0000 (1582746120197)
Sent to       :
Message body  :

done!
WARNING EndFrame was not read: backup is probably incomplete

Exporting backup to 'signal-2020-09-20-22-55-44-fixed.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 17807/17807 entries...done
  Dealing with table 'mms'... 4303/4303 entries...done
  Dealing with table 'part'... 105/2018 entries...Warning: attachment data not found (rowid: 108, uniqueid: 1516771290884)
  Dealing with table 'part'... 106/2018 entries...Warning: attachment data not found (rowid: 109, uniqueid: 1516771292418)
  Dealing with table 'part'... 107/2018 entries...Warning: attachment data not found (rowid: 110, uniqueid: 1516771293336)
  Dealing with table 'part'... 108/2018 entries...Warning: attachment data not found (rowid: 111, uniqueid: 1516771295123)
  Dealing with table 'part'... 201/2018 entries...Warning: attachment data not found (rowid: 205, uniqueid: 1522489127686)
  Dealing with table 'part'... 350/2018 entries...Warning: attachment data not found (rowid: 366, uniqueid: 1530908240957)
  Dealing with table 'part'... 393/2018 entries...Warning: attachment data not found (rowid: 409, uniqueid: 1532770413682)
  Dealing with table 'part'... 482/2018 entries...Warning: attachment data not found (rowid: 500, uniqueid: 1537166862745)
  Dealing with table 'part'... 538/2018 entries...Warning: attachment data not found (rowid: 556, uniqueid: 1538849087089)
  Dealing with table 'part'... 713/2018 entries...Warning: attachment data not found (rowid: 731, uniqueid: 1547229649473)
  Dealing with table 'part'... 1339/2018 entries...Warning: attachment data not found (rowid: 1357, uniqueid: 1564526991077)
  Dealing with table 'part'... 1340/2018 entries...Warning: attachment data not found (rowid: 1358, uniqueid: 1564526991082)
  Dealing with table 'part'... 1341/2018 entries...Warning: attachment data not found (rowid: 1359, uniqueid: 1564526993343)
  Dealing with table 'part'... 1506/2018 entries...Warning: attachment data not found (rowid: 1525, uniqueid: 1568657470198)
  Dealing with table 'part'... 1664/2018 entries...Warning: attachment data not found (rowid: 1683, uniqueid: 1575396908093)
  Dealing with table 'part'... 1665/2018 entries...Warning: attachment data not found (rowid: 1684, uniqueid: 1575396908219)
  Dealing with table 'part'... 1666/2018 entries...Warning: attachment data not found (rowid: 1685, uniqueid: 1575397001172)
  Dealing with table 'part'... 1855/2018 entries...Warning: attachment data not found (rowid: 1874, uniqueid: 1579376535561)
  Dealing with table 'part'... 2018/2018 entries...done
  Dealing with table 'thread'... 0/0 entries...
  Dealing with table 'identities'... 0/0 entries...
  Dealing with table 'drafts'... 0/0 entries...
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 0/0 entries...
  Dealing with table 'group_receipts'... 0/0 entries...
  Dealing with table 'job_spec'... 0/0 entries...
  Dealing with table 'constraint_spec'... 0/0 entries...
  Dealing with table 'dependency_spec'... 0/0 entries...
  Dealing with table 'sticker'... 0/0 entries...
  Dealing with table 'recipient'... 0/0 entries...
  Dealing with table 'storage_key'... 0/0 entries...
  Dealing with table 'key_value'... 0/0 entries...
  Dealing with table 'megaphone'... 0/0 entries...
  Dealing with table 'remapped_recipients'... 0/0 entries...
  Dealing with table 'remapped_threads'... 0/0 entries...
  Dealing with table 'mention'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing Avatars...
Writing EndFrame...
Done!

bepaald commented 3 years ago

Hi! Sorry you are having problems. At first glance things do not look good. I will try to help you, but I might have about a million questions. Here are the first couple:

Do you have any idea what happened to the backup file? Do you still have the original Signal installation that created this backup (can you repeat the process)? Do you still have multiple copies of this backup (for example on the phone, on an sd card or usb stick) and can you verify that they are all the same (I usually take an md5sum of backup files before and after transfer)? It might seem stupid, but the most common cause of corrupted backups these days is just some faulty storage or network problems while transferring it from the old phone to the new one.
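The before-and-after checksum comparison mentioned above could be scripted roughly as follows. This is only a sketch: the helper names `file_digest` and `copies_match` are made up for illustration, and SHA-256 is used where the comment mentions md5sum (either works for spotting transfer damage).

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths, algorithm="sha256"):
    """Return True if every file in `paths` has the same digest."""
    digests = {file_digest(path, algorithm) for path in paths}
    return len(digests) == 1
```

Matching digests only show that the copies are identical to each other. They say nothing about whether the file was already damaged when it was first written.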

The attachment data not found messages are somewhat expected; they happen in a lot of backups. I don't remember exactly what causes them. I think it may be that if you delete (a message with) an attachment, the entry in the 'part' table is not actually removed, but I'm not sure. If I have some time I will try to figure it out, but I wouldn't worry about that part too much.

Data was verified ok, but does not represent a valid frame... Don't know what happened, but it's bad... Aborting :(

As far as I know, you are the first and only person to ever reach this part of the code, I thought it was a statistical impossibility to get at this point, but either that's not the case, or there is a bug in my code (which is very possible).

I have just uploaded a new version which adds a (work-in-progress) 'verbose' option. Also, instead of just bailing out when encountering a verified but invalid frame, it tries to just skip it and continue. Could you please rerun the tool with --verbose added? No need to pass an output file, I do not expect this to fix the problem, but the output might help me better understand what is happening. There will be a lot of output, so maybe attach it instead of pasting.

Let me know if you are using the Windows version and need me to compile it for you; right now I have only updated the source code.

I am often quite busy, but I'll try to put in some time this evening and tomorrow (I have a day off).

hhenkel commented 3 years ago

Do you have any idea what happened to the backup file?

It's from a phone that initially did not have enough space to perform a backup, but at some point it seemed that enough space was available and the backup finished correctly. I tested the file initially with signal-back check and it responded with Backup looks okay from here.

Do you still have the original Signal installation that created this backup (can you repeat the process)?

No, I can't repeat the process as the phone is back to factory settings.

Do you still have multiple copies of this backup (for example on the phone, on an sd card or usb stick) and can you verify that they are all the same (I usually take an md5sum of backup files before and after transfer)?

I have multiple copies of the file and verified via sha256sum that all of them have the same checksum.

As far as I know, you are the first and only person to ever reach this part of the code, I thought it was a statistical impossibility to get at this point, but either that's not the case, or there is a bug in my code (which is very possible).

Not sure if I should feel honored... ;)

I have just uploaded a new version which adds a (work-in-progress) 'verbose' option. Also, instead of just bailing out when encountering a verified but invalid frame, it tries to just skip it and continue. Could you please rerun the tool with --verbose added?

Sure, will do that in the next couple of minutes and will provide the output.

Let me know if you are using the Windows version and need me to compile it for you; right now I have only updated the source code.

No need to compile for me, I'm using the container via buildah / podman.

I am often quite busy, but I'll try to put in some time this evening and tomorrow (I have a day off).

Brilliant, if you're interested in a faster feedback loop let me know what might work for you (IRC, discord etc.)

hhenkel commented 3 years ago

@bepaald I have the resulting output file, it is 12 MB. Is it safe to attach it here or would I be giving parts of my data away?

bepaald commented 3 years ago

I don't think there is anything in there, but just to be safe maybe you can email it to me. Thanks!

bepaald commented 3 years ago

Ok, thanks. I've really been thinking hard on this, it's a difficult problem. (The following may not make much sense to you, but I'm writing it partly to remind myself what I think the situation is)

The first thing that goes wrong is that (after the first corrupted data appears), the frame-sync is lost and it tries to find a valid frame again. The one it thinks is good (Good frame: 26798 (AvatarFrame)), is actually not a valid frame (I think, from looking at the verbose output).

After this, it looks to me as if all the following frames are decrypted with incorrect key material, leading to random data (and invalid frames), which suggests to me the counter is not correct. Probably either the program tried to read an attachment when there wasn't one, or it skipped one that should have been read (the counter is increased when an attachment follows a backupframe).

Now, I've tried to deal with the first problem, frame validation is a lot stricter now, so that might already make a difference. The second problem I just can't seem to wrap my head around. At the moment I have not been able to figure out by looking at the code logic how this can happen. I've tried reproducing this in many ways now and while sometimes I do indeed get a few invalid frames, the program actually always corrects itself and successfully decrypts most of the backup file.

Could you please try again? Same as yesterday? I'm not sure if the changes I've made so far have greatly increased the chances of success or actually decreased them, but in either case, the output might give me some more insight.

Thanks!

hhenkel commented 3 years ago

Thanks for the update, I'm currently building the newest version and will send you the log file again.

hhenkel commented 3 years ago

Resulting log file is 162 MB big...compressed it comes down to 16 MB - you still want it via mail?

bepaald commented 3 years ago

Ouch, that doesn't sound promising... But send it anyway, I think I can grep through it all right...

bepaald commented 3 years ago

Well, I'm afraid I'm going to need to give this problem a bit more time in my head. Looking at what's happening, I'm really starting to think there must be some flaw in the program's logic, but I've read over it many times today and I can't find it, and I can't reproduce the things I'm seeing in your log by deliberately breaking my own backups. At the same time, I also cannot prove with any certainty that there is a bug after the program incorrectly finds a 'good frame' after having lost sync. Incorrectly validating a bad frame like this can theoretically always happen, since the program at that point is dealing with essentially random data.

I've made frame validation even stricter now. I don't think it will fix the problems you are having this time, but if you are not sick of it yet, I wouldn't mind getting the new output.

I'll have a little less time the coming days, but I'll surely be thinking about this problem often and will try to spend a bit of time on it. Hopefully next weekend (if not before) I'll have an idea and more time to implement something, though you might have to run the program another few times and pass along the results. Thank you for your patience.

hhenkel commented 3 years ago

So I already rebuilt your tool last night and processed the backup file. A fixed version was built and I was able to "import" it, or at least Signal thought it was okay. I ended up with no messages in the app. I was able to catch some of the output via adb logcat, which you'll find in the attached file.

I'm also able to provide you the log file from the signalbackup-tools verbose output via mail if needed. It is 51 MB, or 1.3 MB in compressed form.

hhenkel commented 3 years ago

adb_logcat.txt

bepaald commented 3 years ago

Thanks. I think I can pretty much guess what happened this time, but if it's not too much trouble I wouldn't mind looking at the log.

From what is happening I think I see only two possibilities: a bug in my program, or the signal app has written bad data to the backup file. I do think I have a way to check which, but I won't be able to implement a test until this weekend at the earliest, and then you would have to run it.

You should probably prepare for the possibility that the last ~1/3 of the file is simply random data where no information exists to be recovered. From the log posted in your first message (and I'm guessing that the last log from yesterday ends the same way), at least all your (text) message content is still there and probably about 2/3 of your attachments. The problem is, the messages belong in threads, but the thread database is at the end of the backup file, which is missing:

Dealing with table 'thread'... 0/0 entries...

And this subsequently makes it appear there are no messages in the app after restoring. Now, my program has in the past been able to create a working thread database for people who had incomplete (truncated) backup files, using the undocumented --generatefromtruncated flag. However, I think a fully automated procedure is no longer possible in the current database format. I would have to check this, and then think about what minimal manual actions are necessary to generate a more functional backup from your data.

You wouldn't happen to have some other, older, working backup? It would probably be possible to transfer the thread and recipient tables from that, that would simplify matters if it gets to that.

hhenkel commented 3 years ago

Thanks. I think I can pretty much guess what happened this time, but if it's not too much trouble I wouldn't mind looking at the log.

You should have the log file in the mailbox

From what is happening I think I see only two possibilities: a bug in my program, or the signal app has written bad data to the backup file. I do think I have a way to check which, but I won't be able to implement a test until this weekend at the earliest, and then you would have to run it.

No worries - I'm able to retest it on the weekend. But I guess it is not really possible to use signal in the meantime, right? Or is there a way to merge a newer and an older backup?

You should probably prepare for the possibility that the last ~1/3 of the file is simply random data where no information exists to be recovered. From the log posted in your first message (and I'm guessing that the last log from yesterday ends the same way), at least all your (text) message content is still there and probably about 2/3 of your attachments. The problem is, the messages belong in threads, but the thread database is at the end of the backup file, which is missing:

Dealing with table 'thread'... 0/0 entries...

It's not such a big issue if the attachments are missing. Just out of curiosity - is it not possible to find the attachments in the data through magic tests?

And this subsequently makes it appear there are no messages in the app after restoring. Now, my program has in the past been able to create a working thread database for people who had incomplete (truncated) backup files, using the undocumented --generatefromtruncated flag. However, I think a fully automated procedure is no longer possible in the current database format. I would have to check this, and then think about what minimal manual actions are necessary to generate a more functional backup from your data.

You wouldn't happen to have some other, older, working backup? It would probably be possible to transfer the thread and recipient tables from that, that would simplify matters if it gets to that.

Nope - there is a desktop app which was hooked up with that account till the end (but not from the start)

bepaald commented 3 years ago

No worries - I'm able to retest it on the weekend. But I guess it is not really possible to use signal in the meantime, right? Or is there a way to merge a newer and an older backup?

Yes there is: https://github.com/bepaald/signalbackup-tools#merge! Note about the note: that is slightly outdated, I've had multiple reports of success (both here [1], [2] and via email), just haven't updated the readme yet.

It's not such a big issue if the attachments are missing. Just out of curiosity - is it not possible to find the attachments in the data through magic tests?

Well, no not really, but I think it is difficult to explain. File magic would only work on decrypted data, but it is impossible to decrypt the data once framesync is lost (as happens in your file after that first 'bad mac' warning). This is because each frame in the backup file is encrypted with different parameters (the framecount being the important one here), and after losing framesync, even if a correct frame boundary is found, it is still not possible to continue decrypting without also determining how many bad frames were skipped (because otherwise the framecount would be incorrect). So, I guess in summary, you can't do file magic without decrypting, but if you can decrypt you wouldn't need file magic.
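To illustrate why the framecount matters, here is a minimal sketch of a per-frame IV that depends on a counter. It assumes the counter occupies the first 4 bytes of a 16-byte IV in big-endian order; that layout, the function name `frame_iv`, and the made-up `base_iv` are assumptions for illustration, not taken from the tool's source. The counter value is the one printed as COUNTER in the first log, used here only as a sample number.

```python
import struct


def frame_iv(base_iv: bytes, counter: int) -> bytes:
    """Build the IV for one frame by overwriting the first 4 bytes with the counter.

    Assumption: 16-byte IV, counter stored big-endian in bytes 0..3.
    """
    if len(base_iv) != 16:
        raise ValueError("expected a 16-byte IV")
    return struct.pack(">I", counter & 0xFFFFFFFF) + base_iv[4:]


base_iv = bytes(range(16))   # made-up IV, for illustration only
counter = 3341106540         # sample value, taken from the "COUNTER:" log line

# Every frame (and every attachment) consumes one counter value, so losing
# track of how many were skipped means every later IV is wrong.
iv_expected = frame_iv(base_iv, counter + 10)
iv_after_miscount = frame_iv(base_iv, counter + 9)
```

With a counter-mode cipher, a different IV produces a completely different keystream, so data decrypted after a miscount looks like random bytes even when the frame boundary itself is correct.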

(Warning: I think I used the following paragraph to sort out my own thoughts again, so it may be hard to follow.) I am pretty sure there is no data left to recover in the file. My program does still at some point start to find chunks of bytes which it thinks could be a frame, but which turn out not to be one after decoding. It should be very rare to get a false positive on a 'fake' frame, but in your backup it happens a lot. I was a bit afraid that they were actually valid frames, but that the decryption parameters were somehow bad (wrong framecount), causing the decrypted data to be corrupted (I still plan on testing this thoroughly, just to be safe). However, even if this were the case, I would expect to find at most two such frames in a row (of predictable size) before attachment data was encountered and the hash check failed ('bad mac'). Since the backup file is written in a known order, I know that at that point in the file the decrypted data represents a sequence of (1) a SqlStatementFrame (of around 400-700 bytes) which adds an entry in the 'part' table, (2) an AttachmentFrame (~26 bytes) which holds some info on the attachment, and (3) the attachment's actual data (this sequence can be seen in your logs leading up to the bad data in frame 26198). Both frames (all frames in the file, in fact) are preceded by 4 unencrypted bytes which hold the length of the frame, but the attachment data is not. The only way to know how much data to read at that point is to retrieve the size of the attachment from the decrypted AttachmentFrame. That is why it seems unlikely to me that the data read by my program is actually valid backup data that is decrypted incorrectly: the sizes of the frames found do not seem correct at all, and many invalid frames are read back-to-back without any apparent attachment data in between.
I just still don't know if the 'fake' frames are found because of a bug in my program (I can't reproduce it) or whether something went wrong when Signal wrote the backup (maybe something to do with the lack of space somehow?).
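The layout described above (a 4-byte length before every frame, and attachment data whose size is only known from the preceding frame) can be sketched as a small reader. This is a simplified illustration, not the tool's code: it ignores encryption and MACs, assumes the length is big-endian, and `attachment_size` is a hypothetical callback standing in for "decrypt the frame and read the attachment length from it".

```python
import struct
from typing import Callable, Iterator, Optional, Tuple


def walk_frames(
    data: bytes,
    attachment_size: Callable[[bytes], Optional[int]],
) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (kind, file position, payload) for each frame and attachment blob."""
    pos = 0
    while pos + 4 <= len(data):
        # Every frame is preceded by its length in 4 unencrypted bytes.
        (length,) = struct.unpack_from(">I", data, pos)
        frame = data[pos + 4 : pos + 4 + length]
        if len(frame) < length:
            raise ValueError(f"truncated frame at position {pos}")
        yield ("frame", pos, frame)
        pos += 4 + length

        # Attachment data has no length prefix of its own; its size must
        # come from the frame that was just read.
        size = attachment_size(frame)
        if size is not None:
            blob = data[pos : pos + size]
            if len(blob) < size:
                raise ValueError(f"truncated attachment at position {pos}")
            yield ("attachment", pos, blob)
            pos += size
```

If a frame is misread, the reader either takes an attachment's bytes for a length prefix or skips the wrong number of bytes, which is how sync is lost.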

I was looking back at what you said earlier:

I tested the file initially with signal-back check and it responded with Backup looks okay from here.

And I was sort of hoping signal-back could help in finding out whether there is a problem in my program or not. But I just did a check with about 5 of my broken backups, ones I know are broken because I corrupted them manually at various specific points in the file. In one of them I simply replaced the last 50% of the file with random bytes. But signal-back responds okay to all of them. Maybe the check function only checks the first few frames to test whether the password is correct (though I don't understand why it takes so long and so much memory in that case).

Nope - there is a desktop app which was hooked up with that account till the end (but not from the start)

That is probably also good. To be able to generate some working thread table, I used to be able to read the phone numbers from the 'sms' and 'mms' tables, as they were used as identifiers. However, these days recipients are primarily identified by a UUID, which is stored in the 'recipient' table, which is also empty. The messages and threads only refer to a recipient, which (I think) needs this unique identifier. Luckily these identifiers are also in the desktop app's database, which my program can also read.

Anyway, no news to report right now. I'll try to work on generating a valid thread table and doing a final check on your backup file this weekend. I just wanted to let you know you could use signal in the meantime if you trust the merging ability of my program. In fact, it may be relatively easy to import messages from the incomplete backup into the threads of your new signal installation.

bepaald commented 3 years ago

Ok, I added a temporary function to check one last time whether all those invalid frames found are really junk data. It will seek to a position in the file where such a frame was found, and then decode it with the decoding parameters of the first million possible frames. Just run it like this: ./signalbackup-tools [input] [password] --hhenkel. I do not expect a crazy amount of output. (This was a quick hack by the way, so I hope I didn't make any mistakes. It won't eat your computer, everything is read-only, but I'd hate to waste our time.)
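The idea of such a check, decoding one chunk with every candidate counter until the result looks like a real frame, could look roughly like this. It is a sketch under loud assumptions: `toy_keystream_xor` is a SHA-256-based stand-in and NOT the cipher Signal uses, the counter layout in the IV is assumed, and `looks_valid` is a hypothetical validity test supplied by the caller.

```python
import hashlib
import struct
from typing import Callable, Optional


def toy_keystream_xor(key: bytes, base_iv: bytes, counter: int, data: bytes) -> bytes:
    """Stand-in for a counter-mode cipher (NOT the real one): XOR with a hash keystream."""
    iv = struct.pack(">I", counter) + base_iv[4:]
    stream = b""
    block = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + struct.pack(">I", block)).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def find_counter(
    key: bytes,
    base_iv: bytes,
    ciphertext: bytes,
    looks_valid: Callable[[bytes], bool],
    start: int = 0,
    limit: int = 1_000_000,
) -> Optional[int]:
    """Try counters start..start+limit-1 and return the first that decodes sensibly."""
    for counter in range(start, start + limit):
        if looks_valid(toy_keystream_xor(key, base_iv, counter, ciphertext)):
            return counter
    return None
```

The `start` parameter matters: a search that only moves forward from the current counter never tries the lower values, so it cannot recognise a frame that was originally written with a much lower counter.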

I'll have more time the next 2 days, and I'll try to start on generating a functional backup from the truncated version you have. Do you happen to know whether all the conversations in your backup exist in your desktop install? Or have some not had any activity since linking the desktop?

Thanks!

hhenkel commented 3 years ago

@bepaald just ran the new version with the parameter you provided. I'll send you the output via mail but it seemed not that "interesting".

bepaald commented 3 years ago

Thanks! That was unexpected but actually very interesting. It explains everything I'm seeing.

As I already mentioned, the order of the backup is always set. The first thing that always happens is filling the 'sms' table, so when I saw that the frame would decode to an 'insert into sms' statement when decoded with a very low frame number, everything fell into place.

Somehow, the last 1/3 of your backup - probably right from where the corrupted attachment is, in frame 26198 - is a duplicate of the first part of the backup. So, after seeing that last log you sent me, I started looking at the invalid frames found after the corrupted attachment and comparing them to the start of the file:

The invalid frames:

signalbackup-tools-verbose-11-11-2020-Getting frame at filepos: 2148732075
signalbackup-tools-verbose-11-11-2020-Framelength: 367
signalbackup-tools-verbose-11-11-2020-Calculated mac: (hex:) 06 48 7c 10 7a 28 88 f5 a4 21 af 5d 73 6e 70 ff 16 00 8b 23 cb ee 36 09 f4 b7 d7 46 bc fa 1b 62
signalbackup-tools-verbose-11-11-2020-Mac in file   : (hex:) 06 48 7c 10 7a 28 88 f5 a4 21
signalbackup-tools-verbose-11-11-2020-Failed to get valid frame from decoded data...
[...]
signalbackup-tools-verbose-11-11-2020-Getting frame at filepos: 2148732446
signalbackup-tools-verbose-11-11-2020-Framelength: 387
signalbackup-tools-verbose-11-11-2020:Calculated mac: (hex:) 74 65 e9 ed ef 6f e7 a6 cb 35 85 55 d7 4c 26 e6 71 f2 e0 03 84 40 b9 91 25 2d f8 8b 7f ea b0 96
signalbackup-tools-verbose-11-11-2020:Mac in file   : (hex:) 74 65 e9 ed ef 6f e7 a6 cb 35
signalbackup-tools-verbose-11-11-2020-Failed to get valid frame from decoded data...

And the beginning (which decode properly):

signalbackup-tools-verbose-11-11-2020-Getting frame at filepos: 68779
signalbackup-tools-verbose-11-11-2020-Framelength: 367
signalbackup-tools-verbose-11-11-2020-Calculated mac: (hex:) 06 48 7c 10 7a 28 88 f5 a4 21 af 5d 73 6e 70 ff 16 00 8b 23 cb ee 36 09 f4 b7 d7 46 bc fa 1b 62
signalbackup-tools-verbose-11-11-2020-Mac in file   : (hex:) 06 48 7c 10 7a 28 88 f5 a4 21
signalbackup-tools-verbose-11-11-2020-FRAME 252 (000.0%)... 
signalbackup-tools-verbose-11-11-2020-Getting frame at filepos: 69150
signalbackup-tools-verbose-11-11-2020-Framelength: 387
signalbackup-tools-verbose-11-11-2020:Calculated mac: (hex:) 74 65 e9 ed ef 6f e7 a6 cb 35 85 55 d7 4c 26 e6 71 f2 e0 03 84 40 b9 91 25 2d f8 8b 7f ea b0 96
signalbackup-tools-verbose-11-11-2020:Mac in file   : (hex:) 74 65 e9 ed ef 6f e7 a6 cb 35

Obviously, the last (undecryptable) part of the backup is just a duplicate of the beginning. And it is not that Signal started re-exporting the earlier frames during the backup process, because then the counter would have incremented; it really is a copy. My program does not find these frames because the counter can only ever increase, never decrease, so it only searches forward for the correct framenumber.
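The comparison above can be reproduced from the log excerpts alone. The sketch below uses only values copied from those excerpts; the function names `mac_matches` and `duplicate_offsets` are made up for illustration.

```python
import hmac


def mac_matches(calculated_hex: str, stored_hex: str) -> bool:
    """Compare a full calculated MAC against the truncated MAC stored in the file."""
    calculated = bytes.fromhex(calculated_hex)
    stored = bytes.fromhex(stored_hex)
    return hmac.compare_digest(calculated[: len(stored)], stored)


def duplicate_offsets(early, late):
    """Return the set of file-position differences between frames with equal length and MAC."""
    index = {(length, mac): pos for pos, length, mac in early}
    return {
        pos - index[(length, mac)]
        for pos, length, mac in late
        if (length, mac) in index
    }


# (filepos, framelength, MAC in file), copied from the verbose log excerpts
early = [
    (68779, 367, "06 48 7c 10 7a 28 88 f5 a4 21"),
    (69150, 387, "74 65 e9 ed ef 6f e7 a6 cb 35"),
]
late = [
    (2148732075, 367, "06 48 7c 10 7a 28 88 f5 a4 21"),
    (2148732446, 387, "74 65 e9 ed ef 6f e7 a6 cb 35"),
]

print(duplicate_offsets(early, late))  # {2148663296}
```

Both pairs sit exactly 2148663296 bytes apart, which is what a verbatim copy would produce. The positions also agree with the 4-byte length prefix: 68779 + 4 + 367 = 69150.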

How or why your backup ended up like this I don't know, but at least I'm as certain as I'm going to be (without having the backup here) that:

So thanks for sending me that! I'll get to work on generating a thread table for your backup tomorrow.

bepaald commented 3 years ago

Hi! Sorry for the slow progress lately, I ran into some difficulties while investigating the Signal Desktop database, then work kept me busy. The good news is I've figured out the desktop database format (I hope - maybe more difficulties will come up later), and I hope to get something implemented in the next few days.

In the meantime, there are some possible issues when filling in the missing data from the Signal Desktop database. Some of these could be fixed with more coding, but if I don't have to write code for that, I'd rather not. I believe normal (unsecured) sms messages are not synced to the Desktop? Do you use Signal as your sms app, and do you have any non-Signal threads in the backup? Do you think there are any conversations in the backup that do not exist (at all) on the Desktop, for example conversations that have had no activity since linking the desktop?

I have a work-in-progress function that matches the threads in the backup to specific recipients by comparing to the desktop database. Maybe you could try it? It will print out a list of threads, and then for each thread, it will either print the contact (matched from Signal Desktop) or it will print the last ten messages so you could maybe figure out what thread it is. I don't need the full output, but maybe you could tell me how many threads are not matched and if you think they exist in the desktop database or not. And, if you have already started using a fresh Signal install on phone, do the unmatched threads exist there (or could they)?

example:

./signalbackup-tools [fixed-backupfile.backup] [password] --hhenkel ~/.config/Signal/
signalbackup-tools (./signalbackup-tools) source version 20201116.131421 (OpenSSL)
[...]
Reading backup file...
FRAME 1089 (100.0%)... Read entire backup file...

done!
-------------
| thread_id |
-------------
| 9         |
| 11        |
| 13        |
| 15        |
-------------

 - Got match for thread 9:
---------------------------------------------------------------------
| name         | profileName  | profileFamilyName | profileFullName |
---------------------------------------------------------------------
| Devphone Red | Devphone Red | (NULL)            | Devphone Red    |
---------------------------------------------------------------------

 - Failed to match thread 11 to any conversation in Signal Desktop database
   Last 10 messages from this thread:
------------------------------------------------------------------------------------------
| union_date    | union_display_date | union_type | union_body                           |
------------------------------------------------------------------------------------------
| 1592997779008 | 1592997778650      | 10616852   | ChBGa5en1B+Z6 [...] MTY4MzU3MDA3Mg== |
| 1592997764729 | 1592997764572      | 10485783   | Bye                                  |
| 1592997762875 | 1592997762676      | 10485783   | Ok                                   |
| 1592997755639 | 1592997754756      | 10485780   | I'm leaving                          |
| 1592997754478 | 1592997751782      | 10485780   | Done!                                |
| 1592997753636 | 1592997748517      | 10485780   | Lots of text                         |
| 1592997745505 | 1592997744282      | 10485780   | Lots of text                         |
| 1592997717373 | 1592997715030      | 10485780   | (NULL)                               |
| 1592997672201 | 1592997671282      | 10485780   | OK, here goes                        |
| 1592997660910 | 1592997660878      | 10485783   | Bring it on                          |
------------------------------------------------------------------------------------------

 - Got match for thread 13
-----------------------------------------------------------------
| name      | profileName | profileFamilyName | profileFullName |
-----------------------------------------------------------------
| devgroup4 | (NULL)      | (NULL)            | (NULL)          |
-----------------------------------------------------------------

 - Got match for thread 15
-----------------------------------------------------------------
| name   | profileName    | profileFamilyName | profileFullName |
-----------------------------------------------------------------
| (NULL) | Devphone Black | (NULL)            | Devphone Black  |
-----------------------------------------------------------------

The parameter after the hhenkel option is the directory where the 'config.json' file is in the Signal Desktop installation, for Linux it should be '$HOME/.config/Signal/' as far as I know.
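When trying to identify an unmatched thread from its last messages, the `union_date` values may help. The sketch below assumes they are milliseconds since the Unix epoch, like the `date` fields that the tool prints next to a readable date elsewhere in this thread; `format_ms` is a hypothetical helper, not part of signalbackup-tools.

```python
from datetime import datetime, timezone


def format_ms(ms: int) -> str:
    """Format a millisecond Unix timestamp as a UTC date string.

    Assumption: the value counts milliseconds since 1970-01-01 00:00:00 UTC.
    The sub-second part is dropped.
    """
    moment = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S +0000")


# Most recent message of the unmatched thread 11 in the example output
print(format_ms(1592997779008))
```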

bepaald commented 3 years ago

~Ok, I'm having some trouble with my dev-phones so I can't test very well at the moment. The code is in a state to test. Same command as in the previous message, but with an --output arg to export the new backup. It should still only add the threads which it can match to threads in Signal Desktop. Other threads will not show up, but let me know if you have those and know the corresponding phone number, I could provide an option to add those manually. For group threads that aren't matched to the desktop database, I don't think there is another option than to get them from a new Android database (after being re-added to the group, or just exchanging a few messages). I'm curious about the results.~

Just thought of something important still missing. I still need to translate the group members from the desktop db to the android database. So this is still a work in progress... Sorry.

jkirk commented 3 years ago

Hey! I also have problems restoring the Signal backup and stumbled over this issue. I /think/ I was hit by signalapp/Signal-Android#11076 and/or signalapp/Signal-Android#8355. So I tried your signalbackup-tools, ended up with this:

❯ docker run -it --rm -v "$PWD:$PWD" -w "$PWD" signalbackuptools:latest signal-2021-03-18-12-52-32.backup [SNIP] --output signal-2021-03-18-12-52-32-fixed.backup --opassword [SNIP]
signalbackup-tools (signalbackup-tools) source version 20210315.184303 (OpenSSL)
IV: (hex:) bd d1 bf 70 9a f4 50 ba 95 00 fa d8 c2 d6 ad de (size: 16)
SALT: (hex:) 6b b8 30 72 6f e6 e8 93 a4 8b 72 f5 81 08 36 44 f0 66 ee 64 42 7c 0a 32 0e 26 2d 76 3c 1b a1 c6 (size: 32)
BACKUPKEY: (hex:) 45 da c6 1d 8a 92 e6 46 8d f3 fe 56 bb 8b ef 3c f1 95 74 6a e0 b4 89 d2 91 0e ae fa cc fb 4a f1 (size: 32)
CIPHERKEY: (hex:) ad 15 a1 06 42 e6 8e a0 ac 05 fd a3 35 f3 7f 18 44 22 b1 27 a7 47 4f 05 00 05 4c 50 c4 aa 89 71 (size: 32)
MACKEY: (hex:) 50 94 84 a7 be 22 15 4f 68 30 97 a4 ba 68 08 57 0b 18 47 a3 dd 6a eb 3a e1 47 8a ec 75 d8 a8 36 (size: 32)
COUNTER: 3184639856
Reading backup file...
FRAME 42236 (098.7%)... 
WARNING: Bad MAC in attachmentdata: theirMac: (hex:) 33 61 e3 d1 8f 84 0c 86 17 7f
                                      ourMac: (hex:) 95 db 7f 18 6a 62 95 07 22 d2 43 a2 c8 37 9c f1 ee 8d 86 99 3a 75 a5 64 2b 5d 15 a2 9d 5b f2 28

WARNING: Bad MAC in frame, trying to print frame info:
Frame number: 42237
        Size: 15
        Type: ATTACHMENT
         - row id          : 1800 (8 bytes)
         - attachment id   : 1612985179178 (8 bytes)
         - length          : 2283365 (8 bytes)
         - attachment      : (hex:) ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 ff db 00 43 00 ... (2283365 bytes total)
Frame is attachment, it belongs to entry in the 'part' table of the database:
 - _id : 1800
 - mid : 1633
 - seq : 0
 - ct : image/jpeg
 - name : (NULL)
 - chset : (NULL)
 - cd : +Vjogk9WxzlQa0LssIAypyaefSr21xGrPjKVka7Z75ndcVxErTDFR+wl9GJUbZDeOXKpzUfa7rKy/kzdPVudbA==
 - fn : (NULL)
 - cid : (NULL)
 - cl : derZeXgOE0pD3fT65hhL
 - ctt_s : (NULL)
 - ctt_t : (NULL)
 - encrypted : (NULL)
 - pending_push : 0
 - _data : /data/user/0/org.thoughtcrime.securesms/app_parts/part1951904276591455817.mms
 - data_size : 2283365
 - file_name : (NULL)
 - thumbnail : (NULL)
 - aspect_ratio : (NULL)
 - unique_id : 1612985179178
 - digest : (hex:) 5f 3f 81 4b c8 e3 8f e3 5c 7d 31 07 6c df fa 41 6e 04 96 87 ca 14 e0 6b 7f 3f 1f 73 80 53 02 a1
 - fast_preflight_id : -8899131470186214802
 - voice_note : 0
 - data_random : (hex:) ce cf 8a ba 1a fd 4a 1c 12 07 c0 50 41 06 2d 6e 69 86 df 2f 83 11 4a 7b 45 73 26 b2 0e 10 fb c5
 - thumbnail_random : (NULL)
 - quote : 0
 - width : 3024
 - height : 4032
 - caption : (NULL)
 - sticker_pack_id : (NULL)
 - sticker_pack_key : (NULL)
 - sticker_id : -1
 - data_hash : Q9Oy5V/ZwxUMFgpRWEzLxr6w+629l6CAjwh4zxTMzCM=
 - blur_hash : LdG+j;R*M_RiWBWBM{Rj_4WAM{az
 - transform_properties : {"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}
 - transfer_file : (NULL)
 - display_order : 2
 - upload_timestamp : 1612985184149
 - cdn_number : 2
 - borderless : 0
 - sticker_emoji : (NULL)

Which belongs to entry in 'mms' table:
 - _id : 1633
 - thread_id : 1
 - date : 2021-02-10 19:26:19 +0000 (1612985179782)
 - date_received : 2021-02-10 19:26:19 +0000 (1612985179792)
 - msg_box : 10485783
 - read : 1
 - m_id : (NULL)
 - sub : (NULL)
 - sub_cs : (NULL)
 - body : 
 - part_count : 0
 - ct_t : (NULL)
 - ct_l : (NULL)
 - address : 43
 - address_device_id : (NULL)
 - exp : (NULL)
 - m_cls : (NULL)
 - m_type : 128
 - v : (NULL)
 - m_size : (NULL)
 - pri : (NULL)
 - rr : (NULL)
 - rpt_a : (NULL)
 - resp_st : (NULL)
 - st : (NULL)
 - tr_id : (NULL)
 - retr_st : (NULL)
 - retr_txt : (NULL)
 - retr_txt_cs : (NULL)
 - read_status : (NULL)
 - ct_cls : (NULL)
 - resp_txt : (NULL)
 - d_tm : (NULL)
 - delivery_receipt_count : 1
 - mismatched_identities : (NULL)
 - network_failures : (NULL)
 - d_rpt : (NULL)
 - subscription_id : -1
 - expires_in : 0
 - expire_started : 0
 - notified : 0
 - read_receipt_count : 1
 - quote_id : 0
 - quote_author : (NULL)
 - quote_body : (NULL)
 - quote_attachment : -1
 - quote_missing : 0
 - shared_contacts : (NULL)
 - unidentified : 1
 - previews : (NULL)
 - reveal_duration : 0
 - reveal_start_time : 0
 - reactions : (NULL)
 - reactions_unread : 0
 - reactions_last_seen : -1
 - date_server : -1
 - remote_deleted : 0
 - quote_mentions : (NULL)
 - mentions_self : 0
 - notified_timestamp : 0
 - viewed_receipt_count : 0
Trying to dump decoded attachment to file 'attachment_1633.bin'
FRAME 42237 (098.8%)... Failed to read next frame (4059453099 bytes at filepos 2149138374)
Starting bruteforcing offset to next valid frame... starting after: 2149138374
[...]
Checking offset 102210 bytes
GOT GOOD MAC AT OFFSET 102213 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 10221 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 102510 bytes
GOT GOOD MAC AT OFFSET 102516 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 2903 frames... nope! :(
[...]
Attachment data with BAD MAC was encountered:
Short info on message to which attachment with bad mac belongs (1/1):
Date          : 2021-02-10 19:26:19 +0000 (1612985179782)
Date received : 2021-02-10 19:26:19 +0000 (1612985179792)
Sent to       : 
Message body  : 

done!
WARNING EndFrame was not read: backup is probably incomplete

Exporting backup to 'signal-2021-03-18-12-52-32-fixed.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 37265/37265 entries...done
  Dealing with table 'mms'... 1546/1546 entries...done
  Dealing with table 'part'... 131/1681 entries...Warning: attachment data not found (rowid: 211, uniqueid: 1577459681020)
  Dealing with table 'part'... 132/1681 entries...Warning: attachment data not found (rowid: 212, uniqueid: 1577460902742)
  Dealing with table 'part'... 1681/1681 entries...done
  Dealing with table 'thread'... 0/0 entries...
  Dealing with table 'identities'... 0/0 entries...
  Dealing with table 'drafts'... 0/0 entries...
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 0/0 entries...
  Dealing with table 'group_receipts'... 0/0 entries...
  Dealing with table 'sticker'... 0/0 entries...
  Dealing with table 'recipient'... 0/0 entries...
  Dealing with table 'storage_key'... 0/0 entries...
  Dealing with table 'remapped_recipients'... 0/0 entries...
  Dealing with table 'remapped_threads'... 0/0 entries...
  Dealing with table 'mention'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing Avatars...
Writing EndFrame...
Done!

Transferred the -fixed file to the new phone and tried to restore the backup. But Signal stated that the passphrase is wrong. Maybe noteworthy: the date stamp of the -fixed file shows: "1970-01-01 00:59".

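So this is what happened:

* Samsung Galaxy S8 (with Android 8.0.0) with Signal 5.4.12 -> took Signal Backup

* Transferred the backup files to my Notebook via `adb pull` (running Debian/buster)

* Transferred the backup to the Xiaomi Pocophone F1 (with Android 10) via `adb push`

* Installed Signal, gave permissions access files/folders, but at first it could not find/detect the backup files

* So, I registered Signal without restoring the backup and took a backup. There I was asked to choose the backup location. I choose the "default" location `Signal/Backups` (where I already had my 'real' backup). See: https://support.signal.org/hc/en-us/articles/360007059752-Backup-and-Restore-Messages#android_help

* Uninstalled Signal and tried again to restore the backup. Half way through Signal crashed and could not recover the backup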

That is why I ended up here. As I am running Debian/buster I opted for the docker image to build signalbackup-tools: https://gitlab.com/splatops/cntn-signalbackup-tools.

Any idea what to do next? Thank you very much!

(I have access to the first phone and can create backup any time.)

bepaald commented 3 years ago

> Transferred the -fixed file to the new phone and tried to restore the backup. But Signal stated that the passphrase is wrong. Maybe noteworthy: the date stamp of the -fixed file shows: "1970-01-01 00:59".
>
> So this is what happened:
>
> * Samsung Galaxy S8 (with Android 8.0.0) with Signal 5.4.12 -> took Signal Backup
>
> * Transferred the backup files to my Notebook via `adb pull` (running Debian/buster)
>
> * Transferred the backup to the Xiaomi Pocophone F1 (with Android 10) via `adb push`
>
> * Installed Signal, gave permissions access files/folders, but at first it could not find/detect the backup files
>
> * So, I registered Signal without restoring the backup and took a backup. There I was asked to choose the backup location. I choose the "default" location `Signal/Backups` (where I already had my 'real' backup). See: https://support.signal.org/hc/en-us/articles/360007059752-Backup-and-Restore-Messages#android_help
>
> * Uninstalled Signal and tried again to restore the backup. Half way through Signal crashed and could not recover the backup

Ok, so I think it did not detect the backup because you had left the '-fixed' in the filename? Signal requires backups to be named 'signal-YYYY-mm-dd-HH-MM-SS.backup', nothing else will work. And if multiple files matching this pattern are found, the one with the newest time-string is used. The actual timestamp of the file (e.g. 1970-01-01) should not matter. I think the incorrect timestamp is a docker thing by the way, but it shouldn't be important.
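The naming rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this comment, not Signal's actual code; `pick_backup` and `BACKUP_NAME` are hypothetical names.

```python
import re
from datetime import datetime

# Only names of the exact form signal-YYYY-mm-dd-HH-MM-SS.backup qualify.
BACKUP_NAME = re.compile(r"signal-(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.backup")


def pick_backup(filenames):
    """Return the matching filename with the newest time-string, or None."""
    candidates = []
    for name in filenames:
        match = BACKUP_NAME.fullmatch(name)
        if match is None:
            continue  # e.g. a name with '-fixed' in it is ignored
        try:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
        except ValueError:
            continue  # digits in the right places, but not a real date
        candidates.append((stamp, name))
    return max(candidates)[1] if candidates else None


print(pick_backup([
    "signal-2021-03-18-12-52-32-fixed.backup",
    "signal-2021-03-18-12-52-32.backup",
    "signal-2021-03-19-08-00-00.backup",
]))
```

Under this rule the '-fixed' file is never considered, and a fresh backup created later on the same phone takes precedence over an older one copied into the same directory.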

Is it possible that you were restoring the broken backup when Signal crashed? Or possibly the new one you just created (which should have had a later date)? Also, I don't see the wrong passphrase event in your list, when did this happen? Again, is it possible Signal was picking up a different backup file than the one whose passphrase you entered?

> Any idea what to do next? Thank you very much!
>
> (I have access to the first phone and can create backup any time.)

That last part is very good! I think you're going to need it :).

From the output of my program I can see the backup is indeed broken and no data can be recovered after the break. Fortunately it looks like the break happens very late in the file, so all of your messages and most (if not all) attachments are still there. The bad news is, the Signal backup file has the 'thread' table at the end and that one is missing completely (Dealing with table 'thread'... 0/0 entries...). So if you manage to restore the backup, you will be presented with a seemingly empty app: all the messages are actually there, but there are no conversations to select, so there is no way to reach them.
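A quick way to spot this in the export output is to look for tables reported with 0 total entries. The Python sketch below parses lines in the format shown in the log above ("Dealing with table 'name'... N/M entries"); `empty_tables` is a hypothetical helper, and the line format is taken only from the output pasted in this thread.

```python
import re

TABLE_LINE = re.compile(r"Dealing with table '([^']+)'\.\.\. (\d+)/(\d+) entries")


def empty_tables(log_text: str):
    """Return the names of tables whose reported total entry count is 0."""
    totals = {}
    for line in log_text.splitlines():
        match = TABLE_LINE.search(line)
        if match:
            # A table can appear on several lines (e.g. 'part'); keep the last total.
            totals[match.group(1)] = int(match.group(3))
    return [name for name, total in totals.items() if total == 0]


sample = """\
  Dealing with table 'sms'... 37265/37265 entries...done
  Dealing with table 'part'... 131/1681 entries...Warning: attachment data not found
  Dealing with table 'part'... 1681/1681 entries...done
  Dealing with table 'thread'... 0/0 entries...
  Dealing with table 'recipient'... 0/0 entries...
"""
print(empty_tables(sample))
```

An empty table is not always a problem (some tables may legitimately have no rows), but 'thread' showing up in this list matches the situation described above.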

What I would do next:

Let me know if it works. Good luck!