bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Successfully fixed corrupted backup, but all messages merged into single thread and marked as sent by wrong people. #32

sleepyh34d closed this issue 3 years ago

sleepyh34d commented 3 years ago

Ignore this first part. I managed to solve this on my own. See second post below.

I keep having issues trying to restore my backup in Signal, saying it's the wrong passphrase, but I don't think it's wrong. To be fair, my backup file is nearly 5GB, and there's a possibility of corruption. I'm trying to get the messages out to restore to my phone without all the media. I would like to get both, but messages are the priority. I get the following error when trying to read my backup file:

Reading backup file...
FRAME 85889 (099.9%)...  STOPPING BEFORE END OF ATTACHMENT!!! (EOF)
Failed to get attachment data for FrameWithAttachment... info:
Frame number: 85890
        Size: 15
        Type: ATTACHMENT
         - row id          : 7519 (8 bytes)
         - attachment id   : 1617254927478 (8 bytes)
         - length          : 5148917 (8 bytes)

done!
WARNING EndFrame was not read: backup is probably incomplete

I also ran --listframes after and it's been running for several minutes with stuff along the lines of the following:

Reading backup file...
FRAME 85890 (100.0%)... Failed to read 4 bytes from file to get next frame size... (-1 / 4294967295)
Failed to read next frame (0 bytes at filepos 4295633025)
Starting bruteforcing offset to next valid frame... starting after: 665729
Checking offset 270 bytes
GOT GOOD MAC AT OFFSET 277 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 27 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 580 bytes
GOT GOOD MAC AT OFFSET 589 BYTES!
Now let's try and find out how many frames we skipped to get here....
Checking if we skipped 58 frames... nope! :(
No valid frame found at maximum frameskip for this offset...
Checking offset 860 bytes
GOT GOOD MAC AT OFFSET 861 BYTES!

The numbers for bytes and frames is different each time, but it's quite a lot. Could this mean my backup is very corrupted, or that the backup password might just actually be wrong?

sleepyh34d commented 3 years ago

Apologies for this post. I messed around for a little longer and figured out what was going wrong. It looks like my backup file has some corruptions in it, which I was able to work around and import into a new backup file. I successfully imported it into Signal, buuut then I noticed that everything was dumped into one single thread with all the messages received marked as coming from the wrong sender. Not sure how to get it working. I used

.\signalbackup-tools_win.exe signal-2021-04-10-21-30-20.backup "[passphrase]" --output signal-hacked-bu.backup --opassword "[passphrase]" which is how I got this result. Can anyone advise what I may have done wrong, or a better command I could use to make sure everything is moved correctly?

Edit: I managed to export my messages to an xml file to inspect them, and all the contact names say "null" which matches a closed issue that should have been fixed. I'm using the latest windows executable, and the backups were made with the latest version of Signal. Something of note: I have a backup from Feb 2021 that I worked on, and that one, when exported to xml, shows both the address and contact names of the messages. In the Apr 2021 backup, they are as follows: address="" contact_name="null"

bepaald commented 3 years ago

Well, your backup file seems to be truncated somehow: WARNING EndFrame was not read: backup is probably incomplete (also note the 'EOF' (end-of-file) while trying to read an attachment). Do you still have the original phone where the backup was made? Can you verify the filesize (or, better yet, the actual data with some hash like md5)?

In fact, you say the filesize is 5GB, but it starts failing at 4294967295 bytes, which is 4 GiB minus one byte (2^32 - 1), the maximum file size on a FAT32 partition. Did you use a FAT32-formatted drive somewhere when transferring? I'm 99% certain that has truncated the last GB off of your file. If you still have the original on the phone, transferring it without cutting off the last part will undoubtedly fix all your issues.
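A copy can be checked for this kind of truncation by comparing its size against the FAT32 per-file limit and by hashing it for comparison with the original on the phone. The following is a minimal Python sketch; the function names are illustrative and are not part of signalbackup-tools.

```python
import hashlib

# FAT32 stores a file's size in a 32-bit field, so a single file can hold
# at most 2**32 - 1 bytes (one byte short of 4 GiB).
FAT32_MAX_FILE_SIZE = 2**32 - 1


def looks_fat32_truncated(size_in_bytes: int) -> bool:
    """Return True if the size sits exactly at the FAT32 per-file limit."""
    return size_in_bytes == FAT32_MAX_FILE_SIZE


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a multi-gigabyte backup need not fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The byte count mentioned in the comment above.
print(looks_fat32_truncated(4294967295))  # True
```

To use it on a real file, pass `os.path.getsize(path)` to `looks_fat32_truncated` and compare `md5_of_file(path)` against the hash of the original on the phone.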

I also ran --listframes after and it's been running for several minutes with stuff along the lines of the following: [...] The numbers for bytes and frames is different each time, but it's quite a lot. Could this mean my backup is very corrupted, or that the backup password might just actually be wrong?

There is no listframes, did you mean listthreads? Well, the program is trying to fix the corruption it has found here, it really shouldn't do that because the end of the file was reached, so I have a little bug to fix :) Either way, since the file is truncated there is no data to recover. Your backup is in a sense 'very corrupted' indeed, and your passphrase is definitely not wrong.

everything was dumped into one single thread with all the messages received marked as coming from the wrong sender [...] all the contact names say "null"

Bad news is, while the messages and attachments are in the beginning of the file, all information relating to the messages (the threads they belong to, the recipients they are from/to, groups, identities, etc...) is at the end of the backup file, and - in your case - missing. I really hope you still have your original backup file. If not, please report back.

Other people have been in the same situation as you (though usually from unrecoverable corruption instead of truncation) and I have in the past been able to rebuild a working thread table, but since the new database format this is not possible anymore. I have thought of other ways to get messages back, but so far they have proven to be difficult to implement. Currently, my best idea is to actually start with a new, empty installation, and then import the messages from the broken backup into the new backup, manually specifying the correct threads. But in your case, seeing as you have an older, working backup, things might be a little easier (we will just have to import the newest messages into your Feb 2021 backup; I assume most if not all threads from the Apr backup exist in the Feb backup). But it will take a little bit of time, as the required functionality is not yet implemented.

sleepyh34d commented 3 years ago

In fact, you say the filesize is 5GB, but it starts failing at 4294967295 bytes, which is 4 GiB minus one byte (2^32 - 1), the maximum file size on a FAT32 partition. Did you use a FAT32-formatted drive somewhere when transferring? I'm 99% certain that has truncated the last GB off of your file. If you still have the original on the phone, transferring it without cutting off the last part will undoubtedly fix all your issues.

So umm... yes. It turns out my SD card, which is where my primary backups were being made, is formatted as FAT32. I let android format it, and never really thought to check what it did, because why would it use FAT32 and not exFAT for a 128GB SD card??? My own damn fault for never confirming. Wow that sucks. It was saving directly to the SD card, in case my phone ever died, I could just pull it. Signal never notified me that the backups were failing or hitting a size limit, either. They just kept chugging along.

I can use the backup I have from Feb, but that's only about 3GB, so I'm missing a whole lot between then and now, unfortunately. I really just want the messages from the April backup though, the attachments are much less of a priority. The XML export I have has the last text received before the backup, but it's just missing the identity information, which you mentioned is in the missing part of my file. It's 16.9MB of just text. Over 67000 lines, and about 15000 new ones compared to the Feb export that does have the contact information. On the bright side, I only really message less than a dozen people so it shouldn't be too hard to figure out who is who, and potentially manually fix it. I can see the format from the Feb backup, so maybe isolating the messages by person, and then using Find and Replace on the necessary info would work. Then merge it with the uncorrupted backup, and the new backup I have running since the reset.

All of this happened because I snoozed my Signal notifications for 2 hours one night, and they never unsnoozed, even when I disabled notification snoozing. Figured a reinstall would fix it, but here we are! At least now I know to reformat my sd card so this doesn't happen again.

EDIT: Question, does it matter if messages are kept in order by unix time? Or can they be out of order and will fix themselves automatically upon being imported? I realized that if I separate by person, putting them back in order by time will be quite an ordeal in itself.

bepaald commented 3 years ago

How are you planning to fix this? Exporting the XML is a one-way process, that is, you can edit those xml files together all you wish, but I do not believe it is possible to import them again (or does the android app still offer that functionality?).

Also note that, since the export xml function was made to be identical to the function the android app used to have (which was supposed to be compatible with some semi-standard sms backup format), some data is missing from the xml: only the 'sms' table is exported, but all messages from the 'mms' table are left out. These are all outgoing group messages, as well as messages with an attachment (not just the attachment, but the entire message).

Anyway, I think since you have a fairly recent working backup, it will be fairly easy for me to write a merge-function for you. I had planned on doing that today, but when preparing, I noticed some large changes in the backup file that have occurred just since the last update. So I had to give that priority and have just now pushed a rather large update to support the changes. I am now absolutely sick of coding, but luckily I have a day off work tomorrow and am now prepared (I truncated one of my own backups for testing) to work on your problem. So, obviously no hard promises, but hopefully I'll have your problem sorted tomorrow.

Just a few small things that will remain impossible/difficult to fix (that I can think of right now):

EDIT: Question, does it matter if messages are kept in order by unix time? Or can they be out of order and will fix themselves automatically upon being imported? I realized that if I separate by person, putting them back in order by time will be quite an ordeal in itself.

I don't think so, but I'm not totally sure. When the app needs to retrieve messages, it usually explicitly sorts by date. However there are places where the app sorts by '_id' (an internal identifier which does not occur in the xml). If messages are out of order (and the _id is auto-incremented on import, as it will be when it's missing), then as far as I know at least the 'media overview' will show the media in the wrong order or not at all. There may be more issues I do not know about.
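The difference between the two sort orders can be illustrated with a toy SQLite table. This is only a sketch: the `sms` table below is a simplified stand-in with made-up columns and rows, not Signal's real schema. Only the `_id` column and the idea of sorting by date come from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; _id is auto-assigned in insertion order.
conn.execute(
    "CREATE TABLE sms ("
    "_id INTEGER PRIMARY KEY AUTOINCREMENT, date INTEGER, body TEXT)"
)

# Rows inserted out of chronological order, as could happen after
# regrouping messages by person before importing them.
rows = [(3000, "third"), (1000, "first"), (2000, "second")]
conn.executemany("INSERT INTO sms (date, body) VALUES (?, ?)", rows)

by_date = [body for (body,) in conn.execute("SELECT body FROM sms ORDER BY date")]
by_id = [body for (body,) in conn.execute("SELECT body FROM sms ORDER BY _id")]

print(by_date)  # ['first', 'second', 'third']
print(by_id)    # ['third', 'first', 'second']
```

Any view that sorts by date still shows the messages chronologically, while a view that sorts by `_id` reflects the import order instead.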

bepaald commented 3 years ago

Ok, I whipped something up. I tested it as well as I could without actually importing it on a phone. Run by calling signalbackup-tools [completebackup] [pwd] --sleepyh34d [truncatedbackup],[pwd] -o [output].

Note the comma (,) in between the truncated backup filename and the corresponding password, no space there. Please check the resulting file carefully. I'd be interested in hearing your results. Also feel free to post the output of the command, an example (also showing the warning about 'mentions' being found):

[~/programming/signalbackup-tools] $ ./signalbackup-tools ~/PHONE/signal-2021-04-10-16-00-16.backup 949023591455536854942544534240 --sleepyh34d signal-2021-04-10-16-00-16.backup.trunc,949023591455536854942544534240 -o FIXED
signalbackup-tools (./signalbackup-tools) source version 20210412.194049 (OpenSSL)
IV: (hex:) 99 a9 00 0a 1f b0 04 21 bc 81 e8 94 d9 00 e2 3d (size: 16)
SALT: (hex:) 2c b9 ad 9b 27 94 b5 fd 40 fb e2 8c 86 76 bd 02 18 52 af 65 da 8b 94 d8 7e 8d a2 a6 59 8d 89 41 (size: 32)
BACKUPKEY: (hex:) ed c4 4c 5b fc 5f b6 1a 21 1e 38 1c d6 99 ed 30 16 25 4e 13 4a 62 19 13 84 6c 5a 78 71 55 02 93 (size: 32)
CIPHERKEY: (hex:) a5 96 b0 df 28 9b ad 89 0b 7c 02 c5 e5 1f 5f 70 bf 76 31 73 2a 05 b1 e2 81 b0 01 6a 19 29 e1 3d (size: 32)
MACKEY: (hex:) 9b 9c 2b ec c1 7d c9 7b 66 51 77 65 3c d4 3c 86 43 53 8f 0e f8 25 f6 d0 5c 0a af 93 92 e7 e7 3b (size: 32)
COUNTER: 20042469
Reading backup file...
FRAME 69278 (100.0%)... Read entire backup file...

done!
Opening truncated backup...
IV: (hex:) 99 a9 00 0a 1f b0 04 21 bc 81 e8 94 d9 00 e2 3d (size: 16)
SALT: (hex:) 2c b9 ad 9b 27 94 b5 fd 40 fb e2 8c 86 76 bd 02 18 52 af 65 da 8b 94 d8 7e 8d a2 a6 59 8d 89 41 (size: 32)
BACKUPKEY: (hex:) ed c4 4c 5b fc 5f b6 1a 21 1e 38 1c d6 99 ed 30 16 25 4e 13 4a 62 19 13 84 6c 5a 78 71 55 02 93 (size: 32)
CIPHERKEY: (hex:) a5 96 b0 df 28 9b ad 89 0b 7c 02 c5 e5 1f 5f 70 bf 76 31 73 2a 05 b1 e2 81 b0 01 6a 19 29 e1 3d (size: 32)
MACKEY: (hex:) 9b 9c 2b ec c1 7d c9 7b 66 51 77 65 3c d4 3c 86 43 53 8f 0e f8 25 f6 d0 5c 0a af 93 92 e7 e7 3b (size: 32)
IV: (hex:) 7c 99 59 77 74 9f 47 30 88 86 4f 1c a5 8f 54 e7 (size: 16)
COUNTER: 20042469
Reading backup file...
FRAME 64826 (099.4%)... ERROR Unexpectedly hit end of file while reading attachment!

done!
WARNING EndFrame was not read: backup is probably incomplete
Deleting sms/mms tables from complete backup
Cleaning up part table/attachments...
Importing sms entries from truncated file  53761 entries.
Importing mms entries from truncated file  4812 entries.
Importing part entries from truncated file  3093 entries.
WARNING Mentions found! Probably a good idea to check these messages:
 -  Group: devgroup
    Date : 2021-01-21 16:11:04
 -  Group: devgroup
    Date : 2021-01-21 16:11:21
 -  Group: devgroup
    Date : 2021-01-21 21:06:37
updateThreadsEntries
  Dealing with thread id: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41

Exporting backup to 'FIXED'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 53761/53761 entries...done
  Dealing with table 'mms'... 4812/4812 entries...done
  Dealing with table 'part'... 3093/3093 entries...done
  Dealing with table 'thread'... 36/36 entries...done
  Dealing with table 'identities'... 25/25 entries...done
  Dealing with table 'drafts'... 0/0 entries...
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 23/23 entries...done
  Dealing with table 'group_receipts'... 3280/3280 entries...done
  Dealing with table 'sticker'... 56/56 entries...done
  Dealing with table 'recipient'... 93/93 entries...done
  Dealing with table 'storage_key'... 0/0 entries...
  Dealing with table 'remapped_recipients'... 0/0 entries...
  Dealing with table 'remapped_threads'... 0/0 entries...
  Dealing with table 'mention'... 4/4 entries...done
Writing SharedPrefFrame(s)...
Writing KeyValueFrame(s)...
Writing Avatars...
Writing EndFrame...
Done!

sleepyh34d commented 3 years ago

This is amazing! I gave it a try and got the following error. I verified that I used the correct passphrase, and it's 30 characters. In fact, it's the same for both the intact backup and the truncated backup, so I copy-pasted them at the same time.

./signalbackup-tools_win.exe signal-2021-02-21-06-33-45.backup 7160*****0785 --sleepyh34d signal-2021-04-10-21-30-20.backup,7160*****0785 -o signal-bu-merged.backup (censored key, obv, but it's 30 digits I swear)

Reading backup file...
FRAME 68187 (100.0%)... Read entire backup file...

done!
Opening truncated backup...
ERROR : Passphrase too short! Need 30 digits, 17 provided
Error: Failed to get backupkey from passphrase
Failed to create filedecrypter
Failed to read truncated backup file
Error during import

bepaald commented 3 years ago

That's weird, no idea what's going on. I'll start looking into it. In the meantime, if you read this and have time before I find out what's wrong, run again with the latest version. It won't fix anything, but I've just quickly updated the error message to hopefully give some useful info.

bepaald commented 3 years ago

Still don't know what is going wrong exactly, but as a quick (possible) fix I've just made the password optional if it's the same as the input password. So, ./signalbackup-tools_win.exe signal-2021-02-21-06-33-45.backup 7160*****0785 --sleepyh34d signal-2021-04-10-21-30-20.backup -o signal-bu-merged.backup should do it for you. The password parsing function was already the same one, this forces the input to be the same as well, so I don't see how this could still go wrong. Let me know if it still doesn't work. (If you have some time to spare, you could still run it the old way, with the comma and pwd added, just to let me know the error message.)

sleepyh34d commented 3 years ago

I'll run it when I get home from work later tonight. I'll do both just so you can see what's going wrong with the first one. I really appreciate the help! :)

sleepyh34d commented 3 years ago

Here is the output with the passphrase after the comma:

Reading backup file...
FRAME 68187 (100.0%)... Read entire backup file...

done!
Opening truncated backup...
ERROR : Failed to parse passphrase from string '7.16076676542608E+29' : passphrase too short! Need 30 digits, 17 provided
Error: Failed to get backupkey from passphrase
Failed to create filedecrypter
Failed to read truncated backup file
Error during import

It's truncating the passphrase now!

Here's the [seemingly] successful result after your fix (btw make note of the passphrase. I adjusted the censor to show it matches part of the string from the last one):

./signalbackup-tools_win.exe signal-2021-02-21-06-33-45.backup 716076676542608*****5 --sleepyh34d signal-2021-04-10-21-30-20.backup -o signal-bu-merged.backup
signalbackup-tools (\signal bu\signalbackup-tools_win.exe) source version 20210413.213920 (OpenSSL)
IV: (hex:) ***** (size: 16)
SALT: (hex:) ***** (size: 32)
BACKUPKEY: (hex:) ***** (size: 32)
CIPHERKEY: (hex:) ***** (size: 32)
MACKEY: (hex:) ***** (size: 32)
COUNTER: 3056609393
Reading backup file...
FRAME 68187 (100.0%)... Read entire backup file...

done!
Opening truncated backup...
IV: (hex:) ***** (size: 16)
SALT: (hex:) ***** (size: 32)
BACKUPKEY: (hex:) ***** (size: 32)
CIPHERKEY: (hex:) ***** (size: 32)
MACKEY: (hex:) ***** (size: 32)
COUNTER: 2380155444
Reading backup file...
FRAME 85889 (099.9%)... ERROR Unexpectedly hit end of file while reading attachment!

done!
WARNING EndFrame was not read: backup is probably incomplete
Deleting sms/mms tables from complete backup
Cleaning up part table/attachments...
Importing sms entries from truncated file  67443 entries.
Importing mms entries from truncated file  6778 entries.
Importing part entries from truncated file  5797 entries.
WARNING: Found messages in thread not present in old (complete) database... dropping them!
WARNING: Found messages in thread not present in old (complete) database... dropping them!
updateThreadsEntries
  Dealing with thread id: 2, 4, 6, 8, 9, 10, 12, 13, 19, 20, 21, 23, 25, 30, 35, 43, 50, 72, 84, 86, 88, 90, 93, 97, 103, 104, 116, 119, 125, 126, 132, 135, 136, 143, 144, 159, 168, 185, 186, 198, 199, 201, 205, 234, 253, 265, 277, 278, 280, 285, 286, 288, 289, 298, 304, 310, 316, 321, 324, 325, 333, 337, 345, 348, 349, 354, 366, 368, 387, 394, 397, 407, 409, 410, 411, 412, 415, 416, 420, 431, 433, 437, 440, 445, 447, 448, 453, 460, 473, 500, 502, 507, 521, 526, 532, 539, 541, 544, 547, 549, 561, 565, 574, 579, 581, 583, 586, 587, 589, 591, 594, 597, 599, 601, 602, 603, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 5, 16, 15, 585

Exporting backup to 'signal-bu-merged.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
  Dealing with table 'sms'... 66548/66548 entries...done
  Dealing with table 'mms'... 6712/6712 entries...done
  Dealing with table 'part'... 5797/5797 entries...done
  Dealing with table 'thread'... 130/130 entries...done
  Dealing with table 'identities'... 36/36 entries...done
  Dealing with table 'drafts'... 2/2 entries...done
  Dealing with table 'push'... 0/0 entries...
  Dealing with table 'groups'... 21/21 entries...done
  Dealing with table 'group_receipts'... 326/326 entries...done
  Dealing with table 'sticker'... 669/669 entries...done
  Dealing with table 'recipient'... 674/674 entries...done
  Dealing with table 'storage_key'... 0/0 entries...
  Dealing with table 'remapped_recipients'... 0/0 entries...
  Dealing with table 'remapped_threads'... 0/0 entries...
  Dealing with table 'mention'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing KeyValueFrame(s)...
Writing Avatars...
Writing EndFrame...
Done!

I'm going to try importing it to Signal to see what happens now. Would this file be safe to attempt to merge with a backup of my messages from the past few days? I imagine it would be essentially the same as a normal backup now, but figured I'd ask just in case.

Nevermind! I imported the fixed file, and it worked! Missing some more recent MMS messages, which were obviously stored past the cutoff, but besides that, everything seems to be back! I then made a signal backup of this, and was able to successfully merge the fixed backup and the backup I made with the messages from the past few days. Just finished importing it to signal for a final time, and everything seems to be good now!

One last thing I noticed, I started texting with someone new between the time from the intact Feb backup, to now. They're the only person whose messages didn't import. Not a big deal, but thought I'd mention it. I'm guessing since it's using the contact table from the intact backup to fix the truncated one, they wouldn't be there for it to be fixed.

bepaald commented 3 years ago

Here is the output with the passphrase after the comma: [...] It's truncating the passphrase now!

Well that's just mind baffling, it's turning the passphrase (which is a text string) into a number in scientific notation. I have no idea how or where that is happening. It doesn't do that here, I even fired up my Windows VM to check if it's a Windows thing, but I can't reproduce... I think I'll just ignore it for now, if I decide to keep this temporary function, I'll make it so the password is a separate switch (just like with all the other database inputs).
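The thread does not establish where the conversion happens, but the "17 provided" in the error message is at least consistent with counting the digit characters in the scientific-notation string: 15 digits in the mantissa plus 2 in the exponent. A small Python sketch of that arithmetic follows. The 30-digit passphrase in it is made up for illustration, and `as_scientific` is a hypothetical helper that only mimics a 15-significant-digit rendering of a floating-point number.

```python
def count_digits(text: str) -> int:
    """Count the decimal digit characters in a string."""
    return sum(ch.isdigit() for ch in text)


def as_scientific(passphrase: str, significant_digits: int = 15) -> str:
    """Render a numeric string as a float in E notation (assumed coercion)."""
    return format(float(passphrase), f".{significant_digits - 1}E")


# The string reported in the error message above.
reported = "7.16076676542608E+29"
print(count_digits(reported))  # 17

# A made-up 30-digit passphrase, used only for illustration.
example = "123456789012345678901234567890"
mangled = as_scientific(example)
print(mangled)                # 1.23456789012346E+29
print(count_digits(mangled))  # 17
```

Any 30-digit passphrase that gets coerced to a number and printed this way would lose its trailing digits and fail the length check in the same manner.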

Here's the [seemingly] successful result after your fix (btw make note of the passphrase.

Cool!

I'm going to try importing it to Signal to see what happens now. Would this file be safe to attempt to merge with a backup of my messages from the past few days?

Yeah, if you use the normal --importthreads method for that merge I think that should work. This --sleepyh34d is only for this specific case in which the internal _id's are basically guaranteed not to be in conflict and to reference valid and correct data (because of the simple linear nature of these two backups and since _id's are always incrementing over time and never reused, even if messages are deleted). --importthreads will go through the entire database to change the _id's where necessary and make sure they still reference the correct data.
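The general idea of changing conflicting ids while keeping references intact can be sketched as follows. This is a hypothetical illustration in Python with made-up data and a made-up `remap_ids` helper; it is not the tool's actual --importthreads code, which the thread does not show.

```python
def remap_ids(existing_ids, incoming_ids):
    """Map each incoming id to a fresh id that no existing row uses."""
    next_id = max(existing_ids, default=0) + 1
    mapping = {}
    for old_id in sorted(incoming_ids):
        mapping[old_id] = next_id
        next_id += 1
    return mapping


# Two unrelated databases that both started numbering threads at 1.
existing_threads = {1: "alice", 2: "bob"}
incoming_threads = {1: "carol", 2: "dave"}
# Each message is (thread_id, body) and refers to a thread by id.
incoming_messages = [(1, "hi from carol"), (2, "hi from dave")]

mapping = remap_ids(existing_threads, incoming_threads)

merged_threads = dict(existing_threads)
merged_threads.update({mapping[i]: name for i, name in incoming_threads.items()})
# Rewrite each message's thread reference with the same mapping.
merged_messages = [(mapping[t], body) for t, body in incoming_messages]

print(mapping)          # {1: 3, 2: 4}
print(merged_messages)  # [(3, 'hi from carol'), (4, 'hi from dave')]
```

When one backup is simply a later state of the other, as in this issue, the ids already line up and no such remapping is needed.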

Nevermind! I imported the fixed file, and it worked! Missing some more recent MMS messages, which were obviously stored past the cutoff, but besides that, everything seems to be back! I then made a signal backup of this, and was able to successfully merge the fixed backup and the backup I made with the messages from the past few days. Just finished importing it to signal for a final time, and everything seems to be good now!

Excellent, I'm happy it worked. Output looks good, no weird errors or warnings, except WARNING: Found messages in thread not present in old (complete) database... dropping them! which are from your new contact (it's printed twice because the messages are found in both the 'sms' and 'mms' tables).

One last thing I noticed, I started texting with someone new between the time from the intact Feb backup, to now. They're the only person whose messages didn't import. Not a big deal, but thought I'd mention it. I'm guessing since it's using the contact table from the intact backup to fix the truncated one, they wouldn't be there for it to be fixed.

That is exactly right. If those messages were vitally important we could probably get them imported as well, but it'll be a lot more complicated. Let me know if you need this, but otherwise, if you're happy, I'm happy!
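The dropping behaviour described here can be pictured as a filter on thread ids. In the log above, 67443 sms entries were imported and 66548 were written (895 fewer), and 6778 mms entries were imported against 6712 written (66 fewer). That would be consistent with the dropped messages, though the thread does not confirm that the whole difference comes from them. The Python sketch below uses made-up data and hypothetical names (`Message`, `drop_unknown_threads`); it is not the tool's actual code.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Message(NamedTuple):
    thread_id: int
    body: str


def drop_unknown_threads(
    messages: List[Message], known_thread_ids: Iterable[int]
) -> Tuple[List[Message], int]:
    """Keep only messages whose thread exists in the complete backup."""
    known = set(known_thread_ids)
    kept = [m for m in messages if m.thread_id in known]
    return kept, len(messages) - len(kept)


# Threads 1 and 2 exist in the older, complete backup; thread 3 was
# created afterwards and is only referenced by the truncated backup.
truncated_messages = [
    Message(1, "old contact"),
    Message(3, "new contact"),
    Message(2, "another old contact"),
    Message(3, "new contact again"),
]

kept, dropped = drop_unknown_threads(truncated_messages, [1, 2])
print(len(kept), dropped)  # 2 2

# Differences between imported and written counts in the log above.
print(67443 - 66548, 6778 - 6712)  # 895 66
```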