bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0
787 stars 37 forks source link

Unable to restore backup - Hard Crash in latest version of Signal #124

Closed crimsoncor closed 1 year ago

crimsoncor commented 1 year ago

I don't think this is necessarily an issue with backuptools, but definitely is an issue with Signal that is being triggered by merging backups. I have backups from two phones that I'm attempting to combine and then restore. The individual backup files work fine, but the combined one seems to trigger a Foreign Key crash loop in Signal when it is restored. When I run

--runsqlquery "pragma foreign_key_check"

against the merged backup file, everything is reported as fine. However, in the signal crash log I'm seeing

06-11 17:03:19.709  3487  4297 E AndroidRuntime: FATAL EXCEPTION: AsyncTask #1
06-11 17:03:19.709  3487  4297 E AndroidRuntime: Process: org.thoughtcrime.securesms, PID: 3487
06-11 17:03:19.709  3487  4297 E AndroidRuntime: java.lang.RuntimeException: An error occurred while executing doInBackground()
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at android.os.AsyncTask$4.done(AsyncTask.java:415)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:381)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.FutureTask.setException(FutureTask.java:250)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.FutureTask.run(FutureTask.java:269)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:305)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.lang.Thread.run(Thread.java:1012)
06-11 17:03:19.709  3487  4297 E AndroidRuntime: Caused by: java.lang.IllegalStateException: Foreign Key Constraints cannot be enabled or disabled while there are transactions in progress.  Finish all transactions and release all active database connections first.
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at net.zetetic.database.sqlcipher.SQLiteConnectionPool.reconfigure(SQLiteConnectionPool.java:291)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at net.zetetic.database.sqlcipher.SQLiteDatabase.setForeignKeyConstraintsEnabled(SQLiteDatabase.java:2177)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at org.thoughtcrime.securesms.database.SignalDatabase$Companion.runPostBackupRestoreTasks(SignalDatabase.kt:300)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at org.thoughtcrime.securesms.database.SignalDatabase.runPostBackupRestoreTasks(Unknown Source:2)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at org.thoughtcrime.securesms.registration.fragments.RestoreBackupFragment$2.doInBackground(RestoreBackupFragment.java:294)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at org.thoughtcrime.securesms.registration.fragments.RestoreBackupFragment$2.doInBackground(RestoreBackupFragment.java:279)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at android.os.AsyncTask$3.call(AsyncTask.java:394)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
06-11 17:03:19.709  3487  4297 E AndroidRuntime:    ... 4 more
--------- beginning of system

I'm not fully sure how to go about debugging this, now. If I was seeing a foreign key issue with the db schema, there are other posts here in the gitlab issues that have pretty good details on fixing those sorts of issues. But without that info, I'm not really sure what to try next. Both backups are version 196 and both phones are running the latest version of Signal (6.21.3). I'm running the latest version of signalbackup-tools from the main branch

signalbackup-tools (signalbackup-tools) source version 20230611.140940 (OpenSSL)

If someone has any suggestions on how to better interrogate sqllite about what the issues might be, it would be much appreciated.

bepaald commented 1 year ago

Hi, sorry it's not working.

I'm not immediately sure what could cause this (never seen this error before). Reading the error, I'd guess there is a BEGIN TRANSACTION somewhere that isn't closed (committed or rolled back) before trying to set up the key constraints. But I'd think that would indicate a bug in Signal's backup restore process. I don't believe this program leaves any transactions open, but I just forced one to be open at output (I commented out the COMMIT) and it yields the same backup file (bit-identical), so I think transactions are automatically closed when closing the database and writing it out. Also, there is this recent bug report that has the same issue, which would also suggest a problem with current Signal. (Maybe their solution works for you as well? Looking at the solution, it might also be the same as this one).

The individual backup files work fine, but the combined one seems to trigger a Foreign Key crash loop in Signal when it is restored.

Then again, this would strongly suggest a bug in this program, unfortunately. So, I'm not sure at the moment. I'll need to think about it.

If someone has any suggestions on how to better interrogate sqllite about what the issues might be, it would be much appreciated.

Well, right now I'm very much undecided on whether there is anything wrong in the database at all. You could, just to be safe, run pragma_integrity_check (or use the --checkdbintegrity option) to check the database for more problems than just foreign key constraint violations, but I wouldn't expect any useful output.

Also, if you have the disk space and the time, you could use --exporthtml to get a hint as to whether the merge worked. Depending on the size and scope of the two backups it may be difficult to easily see though. And if it all looks good, that is hardly evidence the backup is good (the html export uses just a fraction of the data in the backupfile), but if the HTML looks very messed up, or fails completely, that would indicate a problem with the merge.

crimsoncor commented 1 year ago

wow thanks for looking into that. I'll definitely update that ticket in the Signal tracker since my issue matches it perfectly.

With an unmerged backup, Signal crashes immediately after completing the backup but then lets me add my phone number and authenticate if I go through the restore flow again. The behavior is exactly the same with the merged backup, except that with the merged backup, it enters a crash loop after authentication fails and the application never successfully starts.

The other thing I found, based on some other bug reports, was if I restore the backup in Airplane Mode, I do not get the initial crash. I can then turn off airplane mode to finish the activation flow. Unfortunately, even with this approach, the merged backup still causes a crash loop.

I did run the additional database checks on the merged backup and it did not show any issues. I will try dumping to html and see what happens. The merged isn't that different since one of the two files I'm combining only has a month worth of messages in it. My base backup is 716MB and the merged is 732MB, so hopefully the diff isn't too gnarly to work through.

I did notice one other interesting thing, which I haven't fully dug into but did make me go hmmm. I swap between different phones semi-regularly so I have a normal backup merge flow. To make my backup, I ran

signalbackup-tools oneplus2_signal-2023-06-11-17-36-38_cropped.backup  `cat oneplus2.key` --importthreads ALL --source pixel6a_signal-2023-06-11-06-09-25.backup --sourcepassword `cat atom.key` --output merged_2023-06-11-17-41-25.backup --opassword `cat atom.key`

That worked fine and yielded a backup file that reports no errors, though Signal will not work after loading. The initial file is a backup from my secondary phone that only covers the last month. The pixel6a file is the big 716MB main backup.

Just as an experiment, I tried swapping the order of the backups in the merge

signalbackup-tools pixel6a_signal-2023-06-11-06-09-25.backup  `cat atom.key` --importthreads ALL --source oneplus2_signal-2023-06-11-17-36-38_cropped.backup --sourcepassword `cat oneplus2.key` --output merged_2023-06-12-09-04-25.backup --opassword `cat atom.key`

and for some reason, that merge does fail with a foreign key error!

Checking foreign key constraints...
ERROR Foreign key constraint violated. This will not end well, aborting.
      Please report this error to the program author.
---------------------------------------
| table            | parent    | fkid |
---------------------------------------
| message          | recipient | 3    |
| reaction         | recipient | 0    |
| group_membership | recipient | 0    |
---------------------------------------
A fatal error occurred while trying to import thread 19. Aborting

I was going to try removing the record in question and seeing if I could get the merge to work in the opposite direction. I'm kinda puzzled why it the behavior is different based on the ordering of the backup files

crimsoncor commented 1 year ago

OKAY! That error was actually pretty relevant. As best I can tell, here is what happened. Thread 19 was a group message. In the oneplus2 backup, a new member had been added to that group. And that member had apparently reacted to a message in that thread. It appears like that member was not present in the pixel6a backup and that is what was causing the Foreign Key issue when reversing the merge order of the files.

I went back to the pixel 6a phone, sent a message to the group, and then did another backup. With that new backup, the foreign key error vanished when doing the reverse direction merge. And the resulting merged backup was successfully loaded in Signal (oh, it still crashed after doing the import, but redoing the restore flow a second time did not cause a crash loop).

So it was something, somehow, with signalbackup-tools due to the group mismatch between the two backup files. but I've got everything working now, so I'm pleased.

Leaving the ticket open for now, in case there is something there you want to dig into. But I seem to be 100% sorted at the moment.

bepaald commented 1 year ago

Thanks for investigating and your detailed report, it was very helpful.

I am still a little confused, let me get this straight:

There is currently a bug in Signal, that is causing the backup restore to seemingly fail, but by just restarting Signal the restore finishes successfully anyway.

Then, there are two bugs in this program:

  1. A new recipient in an existing group is not correctly imported, which leads to foreign_key errors on all the new members messages/reactions/etc... (happens when importing oneplus into pixel)
  2. An unknown bug causes a crash-loop in Signal (happens when importing pixel into oneplus)

Good news is, I believe I fixed (1), and even two other potential problems on the way. Thanks to your report, I could easily reproduce. If you still have the offending backups (without the extra message sent on the pixel) I'd love for you to test it. No need to actually restore on phone, just to see if the merge succeeds without errors. No worries if you can't. Obviously the bug in Signal is not mine to fix. So that leaves (2). Which is a complete mystery to me, I'm sort of hoping it's either also Signal's issue, or it has magically vanished with the other fixes I just pushed...

I swap between different phones semi-regularly so I have a normal backup merge flow.

I must say you are brave. The merging functionality is extremely complicated. It was already complicated when I first wrote it and as Signal's database changed over the years, the function has gotten more and more complicated to the point where I only barely grasp the full scope of it myself. Of course I don't know why you have this custom of swapping phones and merging backups, but I personally would not use this function on my main backup with all my important messages unless I had to. Having said that, I'm always happy to hear the function still works (and it does generally seem to work, when I manage to keep up with Signal updates).

Anyway, thank you so much again, I wouldn't have been able to fix that bug without you report. If you can test the new version that'd be great. And if I can think of what is happening with (2) I'll get back to you.

bepaald commented 1 year ago

In your crash-looping log I see a lot of

[6.21.3] [71   ] 2023-06-11 17:06:07.288 EDT E StoryDistributionListRe: Cannot find matching database record for My Story.
[6.21.3] [71   ] 2023-06-11 17:06:07.288 EDT E BaseJob: Encountered a fatal exception. Crash imminent.
[6.21.3] [71   ] 2023-06-11 17:06:07.288 EDT E BaseJob: org.thoughtcrime.securesms.storage.StoryDistributionListRecordProcessor$MyStoryDoesNotExistException

So i will look into stories (I've never used them myself, not sure if I am (or need to) do anything special with them during merge). Do you use stories at all?

crimsoncor commented 1 year ago

I have never used Stories, so I'm not really sure what is going on there.

I understand your confusion, because I'm still a bit perplexed myself. There is definitely a bug in Signal, so we all can agree on that.

I think there was only a single bug in the backup tools, but it was manifesting in two different ways. The bug was around importing new members into groups. I'm not fully sure why the order of the import changed the behavior, but I believe that what was causing Signal to crash lock was that missing user in the group. My assumption is that even in the case where signal-backuptools was not raising the foreign key error, it was still incorrectly reconciling the group information and was writing an invalid db record that was causing Signal to crash lock.

I still have all the older backup files so I should be able to test this theory, as well as your fixes, though it may take me a little bit of time because I'm going to be traveling for a bit.

bepaald commented 1 year ago

I think there was only a single bug in the backup tools, but it was manifesting in two different ways.

Well I hope you're right. Very kind to offer to test, I look forward to the results (but no hurry of course, do take your time).

crimsoncor commented 1 year ago

Sorry about the long wait. Finally got a chance to test this. with the last build, the database merge works in both directions without any issues or any crash on restore. So it seems like everything was sorted. thanks so much for the quick fix and digging into this!

bepaald commented 1 year ago

No worries, it must have been quite a hassle to run the merge twice and restore both just to test. Thanks for reporting back, glad it's solved!

MartB commented 5 months ago

I cant use the story tab after restoring or disable them with the same error message: MyStoryDoesNotExistException

05-13 16:06:08.840 20350 20419 E BaseJob : org.thoughtcrime.securesms.storage.StoryDistributionListRecordProcessor$MyStoryDoesNotExistException
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.storage.StoryDistributionListRecordProcessor.getMatching(StoryDistributionListRecordProcessor.java:76)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.storage.StoryDistributionListRecordProcessor.getMatching(StoryDistributionListRecordProcessor.java:24)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.storage.DefaultStorageRecordProcessor.process(DefaultStorageRecordProcessor.java:51)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.storage.StoryDistributionListRecordProcessor.process(StoryDistributionListRecordProcessor.java:24)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobs.StorageSyncJob.processKnownRecords(StorageSyncJob.java:414)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobs.StorageSyncJob.performSync(StorageSyncJob.java:293)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobs.StorageSyncJob.onRun(StorageSyncJob.java:194)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobs.BaseJob.run(BaseJob.java:31)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobmanager.JobRunner.run(JobRunner.java:86)
05-13 16:06:08.840 20350 20419 E BaseJob :  at org.thoughtcrime.securesms.jobmanager.JobRunner.run(JobRunner.java:49)
05-13 16:06:08.841 20350 20419 E JobRunner: [JOB::5bca633c-f6c0-4b67-aafa-cc1790ae024b][StorageSyncJob][1] Job failed with a fatal exception. Crash imminent. (Time Since Submission: 987 ms, Lifespan: 86400000 ms, Run Attempt: 1/3, Queue: StorageSyncingJobs)
05-13 16:06:08.841 20350 20688 D ConversationListDataSou: [size(), UnarchivedConversationListDataSource, OFF] 0 ms

Workaround: I posted one after linking a desktop app with no message history, seems to have cured it. I can now disable it.

bepaald commented 3 months ago

I cant use the story tab after restoring or disable them with the same error message: MyStoryDoesNotExistException

Just wanted to add for completeness that after another user's report with a similar problem (though upon cropping instead of merging (which call some of the same functions)), changes have been made to how stories, and specifically 'my story' is handled. I hope this issue is now fixed.