bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Reassociating attachment/media #207

Closed lp35 closed 3 months ago

lp35 commented 4 months ago

Hi!

First I would like to warmly thank you for all the hard work you have done for this project.

Like many Signal users, I have trouble finding a proper backup routine. Signal backups embed all media and are not incremental, which results in huge backup files that need to be synced over the air each time.

To solve this, my plan is:

  1. Export all media using the "Storage" section of the Signal app
  2. Delete the media from Signal's internal storage, making the Signal backup very small (only text in it)
  3. Sync the media some other way (e.g. Syncthing)

However, I would like to be able to re-associate conversations with their media if needed. My question is:

Would it be possible, given a Signal backup where media has been removed through the Signal app, to scan a folder of media and re-associate the files with their conversations?

E.g.: signalbackup-tools MYBACKUP --reassociate-images FOLDER_WHERE_MEDIA_ARE_STORED

Maybe there is some kind of "file hash" permanently stored in Signal backups that would allow a message to be re-associated with an existing media file.

Thanks for your time!

bepaald commented 4 months ago

Hi!

Interesting problem.

Would it be possible, given a Signal backup where media has been removed through the Signal app, to scan a folder of media and re-associate the files with their conversations? Maybe there is some kind of "file hash" permanently stored in Signal backups that would allow a message to be re-associated with an existing media file.

I do not see a way to do this. As far as I know, after deleting an attachment, nothing in the database remains to indicate it was ever there.

Also please note:

2. Delete the media from Signal's internal storage, making the Signal backup very small (only text in it)

When deleting 'media' through Signal, the entire message the attachment belongs to is deleted, including any text component. If you're still able to create full backups (that is, if you still have the free space for it), I'd sooner recommend exporting a backup and using this tool's --deleteattachments option (see here), or alternatively --replaceattachments (here). Both of these options keep the text content of messages intact. There is also a handy script that uses this tool and --replaceattachments to replace attachments in a backup with shrunk versions: https://github.com/cycneuramus/signal-backup-shrink (I've never tested it, but last I heard it worked). Be sure to keep an original backup when trying these functions. They seem to have worked fine so far, but I don't know how often they are used, and with Signal's changing database schema things could always break.

With these methods, especially --replaceattachments, things are less destructive, since the actual messages and the associated attachment entries in the database all remain. This makes it theoretically possible to undo the replacement (at least for image attachments). The hard part when undoing will be matching the replaced images with the correct originals (which, again, I wouldn't know how to do, but it is possible even if it has to be done manually).
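
If some identifier of each original, such as a content hash, were known for a replaced attachment, the matching step could be reduced to a lookup against the folder of exported media. A minimal sketch of building such a lookup, assuming MD5 as the hash and a local folder of exported files; the function names are hypothetical and nothing here is part of signalbackup-tools:

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_media_folder(folder: Path) -> dict[str, Path]:
    """Map content hash -> file path for every regular file below `folder`."""
    index: dict[str, Path] = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # Identical files share a hash; the first path (in sorted order) is kept.
            index.setdefault(md5_of_file(path), path)
    return index
```

Looking up a known hash in the returned dictionary then yields the path of the matching original, or nothing if the file is no longer in the folder.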

Thanks!

lp35 commented 4 months ago

Thanks for your very complete answer. I guess it is not possible to store extra metadata/entries in the signal database without breaking it? For example, adding a DB entry called "signalbackup-imagehash"?

Otherwise I thought about a (nasty) hack:

  1. Extract the image
  2. Shrink it to a very, very small image
  3. Add metadata to the JPEG using two custom EXIF tags: the name of the exported file and the MD5 hash of the image
  4. Reinsert the image into the backup

What do you think about that?

Cheers!

bepaald commented 4 months ago

I guess it is not possible to store extra metadata/entries in the signal database without breaking it? For example, adding a DB entry called "signalbackup-imagehash"?

I think this is probably possible if the data is added in a separate table (on updates, tables are occasionally rebuilt, which would cause any added columns to be dropped again). But while Signal currently exports all database tables during backup creation, excluding a specific list (a blacklist), any future update might change this behavior to export only a specific list (a whitelist), in which case all your extra data would be lost again.

All in all this sounds like a bad idea to me, certainly a much nastier hack than your second option.
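
The difference between the two export policies can be illustrated with plain sets. This is only a sketch of the reasoning above, not Signal's export code, and every table name in it is hypothetical:

```python
def export_with_exclude_list(tables: set[str], excluded: set[str]) -> set[str]:
    """Export everything except the listed tables (the 'blacklist' policy)."""
    return tables - excluded


def export_with_include_list(tables: set[str], included: set[str]) -> set[str]:
    """Export only the listed tables (the 'whitelist' policy)."""
    return tables & included


# Hypothetical database contents, including one user-added table.
tables = {"message", "attachment", "search_index", "signalbackup_imagehash"}

# Under an exclude list, an unknown extra table is carried along.
kept = export_with_exclude_list(tables, excluded={"search_index"})

# Under an include list, anything the app does not know about is dropped.
dropped = export_with_include_list(tables, included={"message", "attachment"})
```

Here `kept` still contains `signalbackup_imagehash`, while `dropped` does not, which is why the extra table would only survive as long as the export policy stays as it is.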


Otherwise I thought about a (nasty) hack:

  1. Extract the image
  2. Shrink it to a very, very small image
  3. Add metadata to the JPEG using two custom EXIF tags: the name of the exported file and the MD5 hash of the image
  4. Reinsert the image into the backup

What do you think about that?

That is much cleaner, I think. It produces a completely valid, normal backup with modifications that should last forever (though Signal itself normally strips EXIF data when attaching an image, I don't think it touches the files once they are in the database). This should also be easily scriptable, much like the script I linked in my previous message, with the extra step of adding the EXIF data to the new small files. I must admit I know next to nothing about EXIF data; I assume it allows custom tags with custom values? (EDIT: this should work for that purpose: https://exiv2.org/manpage.html#set_exif_comment)
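
The metadata step could be sketched roughly as follows. This is an untested illustration: it packs the file name and MD5 hash into a single EXIF comment rather than two custom tags, the JSON layout and function names are assumptions, and the exiv2 command line is modelled on the manual page linked above and is only built here, not executed.

```python
import hashlib
import json


def build_placeholder_comment(original_name: str, original_bytes: bytes) -> str:
    """Encode the original's file name and MD5 hash as one compact JSON string."""
    record = {
        "name": original_name,
        "md5": hashlib.md5(original_bytes).hexdigest(),
    }
    # json.dumps escapes non-ASCII characters by default, so the result is ASCII.
    return json.dumps(record, separators=(",", ":"), sort_keys=True)


def parse_placeholder_comment(comment: str) -> tuple[str, str]:
    """Recover (file name, MD5 hash) from a comment built by the function above."""
    record = json.loads(comment)
    return record["name"], record["md5"]


def exiv2_set_comment_args(image_path: str, comment: str) -> list[str]:
    """Build (but do not run) an exiv2 call that stores `comment` in the image.

    Assumption: exiv2's -M option accepts a 'set Exif.Photo.UserComment' command
    as described in its manual page; verify against your installed version.
    """
    command = f"set Exif.Photo.UserComment charset=Ascii {comment}"
    return ["exiv2", "-M", command, image_path]
```

The argument list could be passed to `subprocess.run` once the shrunk placeholder file exists. Reading the comment back later and feeding it to `parse_placeholder_comment` gives the name and hash needed to find the original.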

bepaald commented 3 months ago

I think we're done here? Let me know if you need any more help from me. Thanks!