bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Meaning of "[Warning]: Attachment data not found"? #175

Closed: JanWaldhorn closed this issue 9 months ago

JanWaldhorn commented 9 months ago

Thank you for this great tool. I wanted to reduce the size of my Signal backup (with this script) and encountered warnings like the following:

Dealing with table 'part'... 1598/4767 entries...Warning: attachment data not found (rowid: 1656, uniqueid: 1593119286466)

Whole log:

`signalbackup-tools_win.exe "D:\signal-backup-shrink\backupFiles" -o OUTPUT.backup -op [passphrase]
*** Starting log: ***
signalbackup-tools (signalbackup-tools_win.exe) source version 20231226.102348 (Win)
Opening from dir!
Reading database...
Reading HeaderFrame
Reading DatabaseVersionFrame
Reading SharedPreferenceFrame(s)
Reading KeyValueFrame(s)
Reading EndFrame
Reading AvatarFrames: 24/24
Reading AttachmentFrames
Reading StickerFrames
Done!
Database version: 211
Exporting backup to 'OUTPUT4.backup'
Writing HeaderFrame...
Writing DatabaseVersionFrame...
Writing SqlStatementFrame(s)...
Dealing with table 'avatar_picker'... 0/0 entries...
Dealing with table 'recipient'... 542/542 entries...done
Dealing with table 'thread'... 77/77 entries...done
Dealing with table 'message'... 62647/62647 entries...done
Dealing with table 'call'... 7/7 entries...done
Dealing with table 'call_link'... 0/0 entries...
Dealing with table 'cds'... 521/521 entries...done
Dealing with table 'chat_colors'... 0/0 entries...
Dealing with table 'distribution_list'... 1/1 entries...done
Dealing with table 'distribution_list_member'... 0/0 entries...
Dealing with table 'donation_receipt'... 1/1 entries...done
Dealing with table 'drafts'... 0/0 entries...
Dealing with table 'emoji_search'... 0/0 entries...
Dealing with table 'groups'... 7/7 entries...done
Dealing with table 'group_membership'... 47/47 entries...done
Dealing with table 'group_receipts'... 5495/5495 entries...done
Dealing with table 'identities'... 59/59 entries...done
Dealing with table 'kyber_prekey'... 376/376 entries...done
Dealing with table 'mention'... 21/21 entries...done
Dealing with table 'msl_payload'... 1395/1395 entries...done
Dealing with table 'msl_message'... 1604/1604 entries...done
Dealing with table 'msl_recipient'... 1505/1505 entries...done
Dealing with table 'notification_profile'... 0/0 entries...
Dealing with table 'notification_profile_allowed_members'... 0/0 entries...
Dealing with table 'notification_profile_schedule'... 0/0 entries...
Dealing with table 'part'... 1598/4767 entries...Warning: attachment data not found (rowid: 1656, uniqueid: 1593119286466)
Dealing with table 'part'... 1652/4767 entries...Warning: attachment data not found (rowid: 1742, uniqueid: 1597222558431)
Dealing with table 'part'... 1706/4767 entries...Warning: attachment data not found (rowid: 1853, uniqueid: 1599588364074)
Dealing with table 'part'... 1715/4767 entries...Warning: attachment data not found (rowid: 1865, uniqueid: 1599861721968)
Dealing with table 'part'... 1727/4767 entries...Warning: attachment data not found (rowid: 1897, uniqueid: 1600850513701)
Dealing with table 'part'... 1847/4767 entries...Warning: attachment data not found (rowid: 2052, uniqueid: 1607600432904)
Dealing with table 'part'... 3777/4767 entries...Warning: attachment data not found (rowid: 4332, uniqueid: 1677400723291)
Dealing with table 'part'... 3811/4767 entries...Warning: attachment data not found (rowid: 4374, uniqueid: 1678194129462)
Dealing with table 'part'... 3967/4767 entries...Warning: attachment data not found (rowid: 4558, uniqueid: 1682155630854)
Dealing with table 'part'... 4767/4767 entries...done
Dealing with table 'payments'... 0/0 entries...
Dealing with table 'pending_pni_signature_message'... 0/0 entries...
Dealing with table 'pending_retry_receipts'... 0/0 entries...
Dealing with table 'reaction'... 1770/1770 entries...done
Dealing with table 'remapped_recipients'... 3/3 entries...done
Dealing with table 'remapped_threads'... 0/0 entries...
Dealing with table 'remote_megaphone'... 10/10 entries...done
Dealing with table 'sender_key_shared'... 0/0 entries...
Dealing with table 'sender_keys'... 0/0 entries...
Dealing with table 'sticker'... 103/103 entries...done
Dealing with table 'storage_key'... 4/4 entries...done
Dealing with table 'story_sends'... 0/0 entries...
Writing SharedPrefFrame(s)...
Writing KeyValueFrame(s)...
Writing Avatars...
Writing EndFrame...
Done! Wrote 1866842940 bytes.`

I then ran the command for repairing the backup (from the readme), and the warnings remained the same. The input and output files have the same SHA512 hash, so I assume that all attachments are still present? Or are these entries in the database for deleted attachments? I have searched the issues and the source code, but have not found an explanation that makes sense to me.

I have just tried this again with the latest version (20240101.210703 (Win)) and a new backup, but the result remains the same.

Is there a way to find out more about the warnings using the specified rowid/uniqueid?

bepaald commented 9 months ago

Hi! Yes, these warnings are nothing to worry about. As you noticed from the SHA hash, no attachments are removed or lost by this application (though they are, as the warning indicates, not 'all present').

> Or are these entries in the database for deleted attachments?

I've often thought about removing the warning, because it causes some confusion, but so far I haven't because the truth is: I don't know why some attachments remain in the part table, even though their data is gone. In normal circumstances, deleting an attachment will also delete the entry in the database.

A few cases are known to cause this. The program already checks for them and suppresses the warning when one applies. (See here and missingattachmentsexpected.cc)

> Is there a way to find out more about the warnings using the specified rowid/uniqueid?

To get some info on the message the attachments belong to, you can run:

signalbackup-tools [input] [passphrase] --runsqlquery "SELECT DATETIME(ROUND(message.date_sent / 1000), 'unixepoch', 'localtime') AS 'Date', COALESCE(groups.title,recipient.system_joined_name, recipient.profile_joined_name, recipient.profile_given_name, recipient.e164) AS 'Conversation name', message.body AS 'Message body' FROM message LEFT JOIN thread ON thread._id = message.thread_id LEFT JOIN recipient ON recipient._id = thread.recipient_id LEFT JOIN groups ON groups.group_id = recipient.group_id WHERE message._id IN (SELECT part.mid FROM part WHERE part._id = [rowid] AND part.unique_id = [uniqueid])"

This should show the message body, the conversation it was in, and the date and time the message was sent, which is hopefully enough for you to find the message in Signal and verify that it has no attachments. You need to replace [rowid] and [uniqueid] at the end of the query with the reported values. The --runsqlquery option can be repeated, so you can run the queries for all the missing attachments in one command if you want. (If you have sqlite installed, you can of course also run the query in sqlite on the decrypted database.)
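For a longer list of warnings, filling in the query by hand gets tedious. The following is only a rough Python sketch, not part of signalbackup-tools: it pulls the rowid/uniqueid pairs out of a saved log and builds one command line with a repeated --runsqlquery option. The function names, the executable name, and the idea of saving the log to text first are assumptions made for illustration; the warning format and the query are the ones shown in this thread.

```python
import re

# Matches the warning text as it appears in the log posted in this thread.
WARNING_RE = re.compile(
    r"attachment data not found \(rowid: (\d+), uniqueid: (\d+)\)"
)

# The query from the comment above, with {rowid} / {uniqueid} placeholders
# in place of [rowid] / [uniqueid].
QUERY_TEMPLATE = (
    "SELECT DATETIME(ROUND(message.date_sent / 1000), 'unixepoch', 'localtime') AS 'Date', "
    "COALESCE(groups.title,recipient.system_joined_name, recipient.profile_joined_name, "
    "recipient.profile_given_name, recipient.e164) AS 'Conversation name', "
    "message.body AS 'Message body' "
    "FROM message "
    "LEFT JOIN thread ON thread._id = message.thread_id "
    "LEFT JOIN recipient ON recipient._id = thread.recipient_id "
    "LEFT JOIN groups ON groups.group_id = recipient.group_id "
    "WHERE message._id IN (SELECT part.mid FROM part "
    "WHERE part._id = {rowid} AND part.unique_id = {uniqueid})"
)


def parse_warnings(log_text):
    """Return the distinct (rowid, uniqueid) pairs found in the log, in order."""
    pairs = []
    for rowid, uniqueid in WARNING_RE.findall(log_text):
        pair = (int(rowid), int(uniqueid))
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def build_command(executable, backup, passphrase, pairs):
    """Build an argument list with one --runsqlquery option per pair.

    The executable name, backup path and passphrase are supplied by the
    caller; nothing here is verified against the tool itself.
    """
    args = [executable, backup, passphrase]
    for rowid, uniqueid in pairs:
        query = QUERY_TEMPLATE.format(rowid=rowid, uniqueid=uniqueid)
        args += ["--runsqlquery", query]
    return args
```

The returned list can be passed to `subprocess.run` as-is, which avoids having to quote the SQL for the shell.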

If you have any insights after looking at the messages, I'd love to hear them.

Thanks!

JanWaldhorn commented 9 months ago

Thank you for the detailed answer. I have done this for all combinations and it is always a message quoting another (deleted) message/image/video. When I click on the quote, Signal reports that the file/message no longer exists.

So the warning is correct so far, but perhaps it could be supplemented with a possible explanation? For example "Probably a quote from a deleted message"

bepaald commented 9 months ago

Thanks for reporting back. The attachment-in-a-quote-to-a-missing-message case is supposed to be recognized as one of the known reasons for missing attachment data, in which case the warning should not show. Apparently that doesn't work in all cases. I'll try to reproduce the same situation here and fix this.

Maybe you would be willing to test an update if I come up with something? It'll be a few days probably.

Thanks!

JanWaldhorn commented 9 months ago

Ah interesting, I didn't look at the linked source code. Anyway, I would be happy to help with testing. Normally I use Win 10, but I also have a machine here running Linux Mint in case that helps.

bepaald commented 9 months ago

Thank you for testing. I've tried to refine the missingAttachmentExpected() function. Hopefully it will now also suppress the warning in your cases (and not show any new warnings). If you have some time, maybe you could let me know?

Thanks!

JanWaldhorn commented 9 months ago

Oh well, that was quick. I just tested it with the latest release and all the old warnings are gone, but there is a new one. This one seems to be a false positive: it is again a message quoting an audio file, but the audio file still exists and the link works, so I don't see where the warning could come from. It happens with different backups, and it is also an old message (end of 2019). I haven't had time to look at the changes in the source code, and I'm not so fluent in C++ either, so I don't know what change could have caused this. Is there any other information that could help?

bepaald commented 9 months ago

Thanks for testing. I may have been a little optimistic in the last update to the missing-attachment code concerning quoted audio messages specifically. I've now reverted that part of the code, so hopefully it works now. Perhaps you can test one more time?

Getting this precise is not easy: there are cases where it's impossible to predict whether an attachment is 'supposed' to be missing. For example, from my testing earlier today, I've found that whether the entry in the part table remains depends on the order in which the quote was received and the quoted message was deleted. However, the backup contains no information about the order of these events, so there will always be either false positives or false negatives.

But since I only run the check when the data is actually confirmed missing, I should opt for the version of the function that gives more false positives (treating a missing attachment as expected), so the warning is not shown unnecessarily. I hope that it works now. If it still doesn't (or gives yet another new warning), I might have some SQL queries for you to run to get some more information, if you're willing of course.

Thanks!

JanWaldhorn commented 9 months ago

All warnings are gone. Awesome!

bepaald commented 9 months ago

Excellent. Thanks for your help!

bepaald commented 9 months ago

I think this can be closed, but do let me know if there's anything else. Thanks again!