bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

XML export is incomplete #178

Closed DottoreG closed 8 months ago

DottoreG commented 8 months ago

Looking at the XML export I observed that at least the avatars (image of person/group) are missing in the XML output. I would like to see it added. (As a side note: Even with --dumpmedia I can't find the avatars.)

Since I'm not familiar with Signal's data structure, I cannot tell what else is missing. On further investigation, though, I see that reactions and stickers, for example, are missing as well.

bepaald commented 8 months ago

Hi!

The XML output is supposed to implement the de facto standard SMS backup xml format, as Signal itself used to do. This standard does not include avatars (and many, many other things that exist in Signal).

I should probably clarify this in the README. I thought I had, but rereading that section, that is clearly not the case. In fact, that section is very badly written and gives hardly any information. I will try to fix this sometime next week when I have time.

The XML output can be used to import messages into any program that supports the format (SMS Backup & Restore was the original, I think). For example, people have successfully used this option to import their Signal history into the iOS Messages app, and from there into Signal iOS, which does not support backups. Since Signal no longer supports SMS, however, that route does not work anymore.

I do not plan to add avatars to the XML format, because they are not in the schema (see here), and I don't know how third-party programs would react to unknown data. Also, Signal's internal data structure is large and complicated, and even just defining a new schema to include all of it would be a big undertaking (let alone implementing it).

> Since I'm not familiar with Signal's data structure I cannot oversee if more items are missing in XML.

There are tons of things 'missing'. What exactly are you trying to do? If you want a full decode of the backup file, just pass a directory to --output. If you need the database in some human-readable format (i.e. not SQLite), you might want to try --exportcsv. (Note, by the way, that the database does not include the avatars either.)

> As a side note: Even with --dumpmedia I can't find the avatars.

Maybe this function would have been better named 'dumpattachments'... On the other hand, Signal itself uses 'media' for message attachments (and these also do not include avatars). To dump the avatars, there is --dumpavatars.

Thanks!

DottoreG commented 8 months ago

Thank you very much for your explanation. This explains the background behind the XML file. I was not aware of this after reading the documentation. And I can completely understand that you don't plan to extend the XML export.

Perhaps I can pick up your tip to use the CSV export. (Or I could access the database directly.) But I'm missing the necessary background: I can't find any documentation about the relationships between the tables, and I can't work out how the tables relate to avatar files, attachment files, etc. Any hint is greatly appreciated!

For background: I'm experimenting with converting Signal messages to TiddlyWiki (https://tiddlywiki.com/).

bepaald commented 8 months ago

Specifically for avatars and attachments, I've recently explained matching them (tried to explain, at least) to entries in the SQL table here: https://github.com/bepaald/signalbackup-tools/issues/161#issuecomment-1843116431

Explaining all links in the database would be time-consuming. Generally speaking, many tables have a field named recipient_id (or from_recipient_id, or to_recipient_id) which corresponds to the _id field of the recipient table. That table holds one entry for each Signal contact, be it a person or a group. Similarly, some tables have a thread_id or message_id linking entries to the _id field of the thread and message tables respectively.
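As a rough illustration of that linking pattern, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a tiny, hypothetical stand-in: only the link columns named above are modelled, `label` and `body` are made-up placeholder columns, and it is an assumption that `thread` carries a `recipient_id` column. The real database has many more columns, and names may differ between Signal versions.

```python
import sqlite3

# Hypothetical, heavily reduced schema. Only the link columns discussed above
# are modelled; "label" and "body" are placeholder columns for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipient (_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE thread    (_id INTEGER PRIMARY KEY, recipient_id INTEGER);
    CREATE TABLE message   (_id INTEGER PRIMARY KEY, thread_id INTEGER,
                            from_recipient_id INTEGER, body TEXT);
    INSERT INTO recipient VALUES (1, 'me'), (2, 'alice');
    INSERT INTO thread    VALUES (10, 2);
    INSERT INTO message   VALUES (100, 10, 2, 'hello'), (101, 10, 1, 'hi');
""")

# Follow message.thread_id -> thread._id, then resolve both the sender
# (message.from_recipient_id) and the conversation partner
# (thread.recipient_id) against recipient._id.
rows = conn.execute("""
    SELECT m._id, m.body, sender.label, peer.label
    FROM message AS m
    JOIN thread    AS t      ON t._id = m.thread_id
    JOIN recipient AS sender ON sender._id = m.from_recipient_id
    JOIN recipient AS peer   ON peer._id = t.recipient_id
    ORDER BY m._id
""").fetchall()

for row in rows:
    print(row)
```

To try this against a real decoded backup, you would point `sqlite3.connect` at the database file in the output directory instead of `:memory:` and drop the `executescript` call; the column names would then need checking against your own database.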

One example: I sent a message at time 1700686557386 (that's a Unix timestamp in milliseconds, as they are in the database). To find the name of the group I sent this message to, we first query the to_recipient_id field from the message table: SELECT to_recipient_id FROM message WHERE date_sent = 1700686557386. This returns 27. Then we find the group_id: SELECT group_id FROM recipient WHERE _id = 27. This returns a string like __signal_group__v2__!1080952d4e738178ae051f2706c[...]. Then we find the group's name with SELECT title FROM groups WHERE group_id = '__signal_group__v2__!10[...]'. With joins or subqueries, all of these steps can be combined into a single query that gives the answer in one go. To get the members of this group, you would query SELECT recipient_id FROM group_membership WHERE group_id = '[...]', which gives you a set of recipient_ids again, each of which can be looked up in the recipient table as before. I hope that gives some general idea.
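For what it's worth, the combined version could look roughly like the sketch below, again using Python's sqlite3 module against a hypothetical in-memory miniature of the tables involved. Only the columns used in the example are created, and the group id and title are invented sample values. The first lookup uses to_recipient_id, as described in the text.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical miniature of the four tables used in the example above.
# 'example-group-id' and 'Example group' are invented sample values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipient (_id INTEGER PRIMARY KEY, group_id TEXT);
    CREATE TABLE message   (_id INTEGER PRIMARY KEY, date_sent INTEGER,
                            to_recipient_id INTEGER);
    CREATE TABLE groups    (_id INTEGER PRIMARY KEY, group_id TEXT, title TEXT);
    CREATE TABLE group_membership (_id INTEGER PRIMARY KEY, group_id TEXT,
                                   recipient_id INTEGER);
    INSERT INTO recipient VALUES (27, 'example-group-id'), (3, NULL), (4, NULL);
    INSERT INTO message   VALUES (1, 1700686557386, 27);
    INSERT INTO groups    VALUES (1, 'example-group-id', 'Example group');
    INSERT INTO group_membership VALUES (1, 'example-group-id', 3),
                                        (2, 'example-group-id', 4);
""")

DATE_SENT = 1700686557386  # Unix timestamp in milliseconds

# message -> recipient -> groups, in one query instead of three.
title = conn.execute("""
    SELECT g.title
    FROM message AS m
    JOIN recipient AS r ON r._id = m.to_recipient_id
    JOIN groups    AS g ON g.group_id = r.group_id
    WHERE m.date_sent = ?
""", (DATE_SENT,)).fetchone()[0]

# Same chain, but ending in group_membership to list the member recipient_ids.
members = [row[0] for row in conn.execute("""
    SELECT gm.recipient_id
    FROM message AS m
    JOIN recipient        AS r  ON r._id = m.to_recipient_id
    JOIN group_membership AS gm ON gm.group_id = r.group_id
    WHERE m.date_sent = ?
    ORDER BY gm.recipient_id
""", (DATE_SENT,))]

# The stored value is in milliseconds, so divide by 1000 before converting.
sent_at = datetime.fromtimestamp(DATE_SENT / 1000, tz=timezone.utc)

print(title, members, sent_at.isoformat())
```

The queries take the timestamp as a bound parameter (`?`) rather than pasting it into the SQL string, which is the usual habit with sqlite3 even for one-off scripts.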

Let me know if you have any more specific questions.

bepaald commented 8 months ago

I've updated the README to better explain what the XML export is supposed to be. With that, I think this issue can be closed. But feel free to still ask for specific help on the database if you need it. Thanks!