bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

Importing from Android backup to Signal desktop #116

Closed magnus-ISU closed 1 year ago

magnus-ISU commented 1 year ago

So I just made a silly PR to signal desktop, and then like any rational human being, I rm -rf ~/.config/Signaled right after.

I at first thought I might be able to recover the files, but it seems no.

It would be a really nice feature to be able to use the android backups we can generate and put them into signal desktop. I would be partial to trying to do this by forking Signal Desktop also, but as you have knowledge of the database format etc you probably would know better than I do how feasible this would be to add to this project also.

Since an android backup (in theory) contains the full history and also can be merged with desktop data using these tools, I would not be opposed to completely overwriting the desktop database also.

bepaald commented 1 year ago

> So I just made a silly PR to signal desktop, and then like any rational human being, I rm -rf ~/.config/Signaled right after.

Ouch!

I'm sorry for your loss. I usually always try to help people recover their messages, but honestly, I don't think I'm going to be able to help with this very much. Doing an import from Android backup to desktop database would be a huge undertaking (and somewhat outside of the scope of this project). I'm already stretched too thin as it is and almost drowning in the import from desktop to Android backup. So with great pain, I'm afraid I'm going to close this issue, sorry.

If you decide to have a go yourself, you could of course use this tool to extract the backup and try to insert some messages into a desktop database (using sqlcipher). If you're going to try this (good luck to you!) and have any questions, please feel free to contact me. My knowledge of the desktop db is limited (and maybe I haven't looked at it long enough to appreciate it, but I think it's a big mess), but I'm quite familiar with the Android database format.

Also, if you just want your message history on your computer, away from your phone, you might consider using this program's --exporthtml option for that. Not ideal, not the same as having it all in Signal Desktop, but maybe better than nothing.

magnus-ISU commented 1 year ago

Thanks!

I have decided to implement this feature myself then, as the lack of chat history transfer is honestly one of the biggest problems in Signal by my estimation, and signalbackup-tools solves desktop to Android, so Android to desktop is the other direction where it is needed. I forked Signal-Desktop so I don't have to actually implement inserting into the database; I can just use their methods for it.

I do not fully understand the Android database specification. You mentioned that you have a lot of knowledge of that, so I would appreciate your help. For example:

{"level":30,"time":"2023-05-11T19:08:28.148Z","msg":"{\"body\":\"Dang that is speedy. How often do you do that? How much would you say you retain? Ha 72 hours in a day would be... an accomplishment? Torturous? Something that needs to be done?\",\"quote_body\":\"I don't know if I did any particularly Magnusish things today but I am watching youtube right now at 4x speed which might be one\",\"from_recipient_id\":232,\"to_recipient_id\":232}"}

Here the from and to ids are the same, but this is a message from my friend (232) to me (1). It seems that to designates the conversation, regardless of who is sending, and from specifies if it is from me or from them? EDIT: there are also examples where from=232, to=1 and from=1,to=82 so I don't think either specifies anything concretely. So there I guess thread_id is needed along with the thread table?

I intend to get single chats working first as I'm not actually in any group conversations. Then maybe work on groups. I do want to have attachments work though.

It seems like the relevant fields are date_received, date_sent, from_recipient_id, to_recipient_id, body, quote_id, quote_body, link_previews. What do time and thread_id do? Are there other fields which are necessary?

Here's another full example record in the sqlite android database.

{"level":30,"time":"2023-05-11T18:48:09.102Z","msg":"{\"_id\":9,\"date_sent\":1641013421219,\"date_received\":1641077615698,\"date_server\":1641013422363,\"thread_id\":4,\"from_recipient_id\":6,\"from_device_id\":null,\"to_recipient_id\":6,\"type\":10485780,\"body\":\"You know what to do\",\"read\":1,\"ct_l\":\"\",\"exp\":null,\"m_type\":132,\"m_size\":null,\"st\":1,\"tr_id\":null,\"subscription_id\":-1,\"receipt_timestamp\":-1,\"delivery_receipt_count\":0,\"read_receipt_count\":0,\"viewed_receipt_count\":0,\"mismatched_identities\":null,\"network_failures\":null,\"expires_in\":0,\"expire_started\":0,\"notified\":0,\"quote_id\":0,\"quote_author\":null,\"quote_body\":null,\"quote_missing\":0,\"quote_mentions\":null,\"quote_type\":0,\"shared_contacts\":null,\"unidentified\":1,\"link_previews\":null,\"view_once\":0,\"reactions_unread\":0,\"reactions_last_seen\":1641077976869,\"remote_deleted\":0,\"mentions_self\":0,\"notified_timestamp\":1641077617159,\"server_guid\":\"[REDACTED]30f\",\"message_ranges\":null,\"story_type\":0,\"parent_story_id\":0,\"export_state\":null,\"exported\":0,\"scheduled_date\":-1,\"latest_revision_id\":null,\"original_message_id\":null,\"revision_number\":0}"}

This message actually contains an attachment. However none of the fields seem to reference the attachment (Attachment_6_1641077615707.bin). Where is that information?

bepaald commented 1 year ago

To be honest, I'm not really recognizing these examples as entries from the Android backup. These look like JSON objects; where are you getting these? Only the part inside the msg subobject has the fields from a normal message (but missing many of them).
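As a rough illustration of that point, here is a minimal sketch of pulling the message row out of such a line. It assumes the outer object is some logger's wrapper (level, time, msg) and that msg holds the row as JSON text, as in the examples above; the helper name unwrap_message and the shortened sample row are hypothetical.

```python
import json

# Shortened stand-in for the log lines quoted above: the "msg" value is itself
# a JSON string holding (part of) one row of the message table.
log_line = json.dumps({
    "level": 30,
    "time": "2023-05-11T18:48:09.102Z",
    "msg": json.dumps({
        "_id": 9,
        "thread_id": 4,
        "from_recipient_id": 6,
        "to_recipient_id": 6,
        "type": 10485780,
        "body": "You know what to do",
    }),
})


def unwrap_message(line: str) -> dict:
    """Return the message row embedded in one log line (hypothetical helper)."""
    outer = json.loads(line)         # level / time / msg: not database fields
    return json.loads(outer["msg"])  # the actual columns of the message row


row = unwrap_message(log_line)
print(row["_id"], row["thread_id"], row["type"], row["body"])
```

Only the keys of the inner object correspond to columns in the Android message table.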

> Here the from and to ids are the same, but this is a message from my friend (232) to me (1). It seems that to designates the conversation, regardless of who is sending, and from specifies if it is from me or from them? EDIT: there are also examples where from=232, to=1 and from=1,to=82 so I don't think either specifies anything concretely. So there I guess thread_id is needed along with the thread table?

So, up until just a week or two ago, the message table did not have from_recipient_id and to_recipient_id, it just had a single recipient_id. The value of this field would correspond to the _id of a specific recipient in the recipient table. The rules for this are as follows:

For 1-on-1 conversations: recipient_id was the chat partner's _id. Whether a message was to that person or from that person is determined only by the message.type (which seems to be missing from the examples you posted; message types are detailed here).

For group messages: recipient_id on incoming messages was the group member who originated the message. On outgoing messages it was the _id of the group (the group has its own entry in the recipient table; other relevant tables, linked to this via recipient.group_id, are the groups and group_membership tables).

After the recent changes, recipient_id was replaced with to_ and from_. So now, on newly created messages, one of the two fields is the same as before (chat partner or group), and the other is self-id, depending on whether the message was incoming or outgoing. For messages that already existed in the database before this change, the from_ field was updated to self-id only on outgoing messages (existing incoming messages were left untouched).

It seems pretty complicated when I try to explain it. Here is a picture:

--- 1-on-1 messages
                     incoming                                 outgoing
from_recipient_id    chatpartner                              self
to_recipient_id      self (chatpartner on existing msgs)      chatpartner

--- group messages
                     incoming                                 outgoing
from_recipient_id    originator                               self
to_recipient_id      self (originator on existing msgs)       group's _id
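The table can be read as a small rule. The sketch below is one possible way to apply it, not code from this project: resolve_endpoints is a hypothetical helper, the outgoing flag is assumed to have been derived from message.type already, and self_id is the _id of your own entry in the recipient table (1 in the example discussed above).

```python
from typing import Tuple


def resolve_endpoints(from_id: int, to_id: int, self_id: int,
                      outgoing: bool) -> Tuple[int, int]:
    """Return (sender, receiver) recipient ids for one message row.

    Follows the table above: on outgoing messages the sender is always self
    and to_recipient_id is the chat partner or the group. On incoming messages
    from_recipient_id is the author, and the receiver is self even when an
    older row still has the author's id in to_recipient_id.
    """
    if outgoing:
        return self_id, to_id
    return from_id, self_id


# Pre-existing incoming row from the thread: from=232, to=232, self=1.
print(resolve_endpoints(232, 232, self_id=1, outgoing=False))
# Newly created incoming row: from=232, to=1.
print(resolve_endpoints(232, 1, self_id=1, outgoing=False))
# Outgoing row: from=1, to=82.
print(resolve_endpoints(1, 82, self_id=1, outgoing=True))
```

Note that for incoming group messages neither field names the group, so the conversation itself still has to be looked up through thread_id.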

> It seems like the relevant fields are date_received, date_sent, from_recipient_id, to_recipient_id, body, quote_id, quote_body, link_previews. What do time and thread_id do? Are there other fields which are necessary?

thread_id specifies the entry in the thread table that the message belongs to (you can have multiple messages from id '3', but in different (group) threads). So the proper way to match a message to a conversation is to check its thread_id (match it to thread._id), then check that thread's recipient_id (match it to recipient._id) and see who the conversation partner is (a person or a group).
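That lookup can be written as a single join. The following is a minimal sketch against a toy in-memory database: only the columns named in this thread are created, the ids and the group_id string are made-up placeholders, and treating a non-NULL recipient.group_id as "this is a group" is an assumption to verify against a real backup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy tables: only the columns mentioned in this thread, not the real schema.
    CREATE TABLE recipient (_id INTEGER PRIMARY KEY, group_id TEXT);
    CREATE TABLE thread    (_id INTEGER PRIMARY KEY, recipient_id INTEGER);
    CREATE TABLE message   (_id INTEGER PRIMARY KEY, thread_id INTEGER,
                            from_recipient_id INTEGER, to_recipient_id INTEGER);

    INSERT INTO recipient VALUES (1, NULL);                 -- self
    INSERT INTO recipient VALUES (6, NULL);                 -- a person
    INSERT INTO recipient VALUES (40, 'placeholder-group'); -- a group (made up)
    INSERT INTO thread VALUES (4, 6);    -- 1-on-1 thread with recipient 6
    INSERT INTO thread VALUES (5, 40);   -- group thread
    INSERT INTO message VALUES (9, 4, 6, 6);    -- incoming, 1-on-1
    INSERT INTO message VALUES (10, 5, 6, 1);   -- incoming, same author, group
""")

# message.thread_id -> thread._id, then thread.recipient_id -> recipient._id
rows = conn.execute("""
    SELECT m._id, t.recipient_id, r.group_id IS NOT NULL AS is_group
    FROM message AS m
    JOIN thread    AS t ON t._id = m.thread_id
    JOIN recipient AS r ON r._id = t.recipient_id
    ORDER BY m._id
""").fetchall()

for message_id, conversation_id, is_group in rows:
    kind = "group" if is_group else "person"
    print(f"message {message_id} belongs to conversation with {kind} {conversation_id}")
```

Both toy messages come from recipient 6, but only the thread tells them apart: one belongs to the 1-on-1 chat, the other to the group.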

And you will certainly need the type. The type not only shows whether a message was sent or received but a ton of other message-types (profile changes and other status messages). Some of these types indicate the message body is a base64 encoded string that represents some protocol buffer, which you will most probably want to skip at least initially (though I think these are more likely to appear in group-threads which you don't have).
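A minimal sketch of reading the direction out of type follows. The constants are assumptions based on my reading of Signal-Android's message type definitions (the low five bits hold a base type, 20 is the inbox type, and 21 to 26 are the outbox/sending/sent variants); check them against the message type documentation linked above before relying on them. The helper name is_outgoing is hypothetical.

```python
# ASSUMED constants, to be verified against the message type documentation.
BASE_TYPE_MASK = 0x1F                        # low 5 bits carry the base type
BASE_INBOX_TYPE = 20                         # received message
OUTGOING_BASE_TYPES = {21, 22, 23, 24, 25, 26}  # outbox, sending, sent, failed, fallbacks


def base_type(message_type: int) -> int:
    """Strip the flag bits and keep only the base type."""
    return message_type & BASE_TYPE_MASK


def is_outgoing(message_type: int) -> bool:
    """True if the base type is one of the assumed outgoing types."""
    return base_type(message_type) in OUTGOING_BASE_TYPES


# The two type values that appear in this thread.
for t in (10485780, 10485783):
    direction = "outgoing" if is_outgoing(t) else "not outgoing"
    print(t, hex(t), "base type", base_type(t), direction)
```

Under these assumptions the two values seen in this thread decode to base types 20 and 23. Status messages and the protocol-buffer bodies mentioned above are marked by other bits and base types, which this sketch does not attempt to cover.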

There is no time field in the message database, so again, I don't know where your examples are coming from. You can see the message table definition here.

Edit: I see now that in your second example, a full message entry is in the msg sub-object. I don't know where the stuff outside of that (level and time) comes from.

When I query a recent message from my message table it looks like this:

sqlite> .mode line
sqlite> SELECT * FROM message WHERE _id = 80908;
                   _id = 80908
             date_sent = 1683147208622
         date_received = 1683147208623
           date_server = -1
             thread_id = 17
     from_recipient_id = 71
        from_device_id = 1
       to_recipient_id = 32
                  type = 10485783
                  body = Ok
                  read = 1
                  ct_l = 
                   exp = 
                m_type = 128
                m_size = 
                    st = 
                 tr_id = 
       subscription_id = -1
     receipt_timestamp = 1683147207175
delivery_receipt_count = 1
    read_receipt_count = 0
  viewed_receipt_count = 0
 mismatched_identities = 
      network_failures = 
            expires_in = 0
        expire_started = 0
              notified = 0
              quote_id = 0
          quote_author = 0
            quote_body = 
         quote_missing = 0
        quote_mentions = 
            quote_type = 0
       shared_contacts = 
          unidentified = 1
         link_previews = 
             view_once = 0
      reactions_unread = 0
   reactions_last_seen = -1
        remote_deleted = 0
         mentions_self = 0
    notified_timestamp = 0
           server_guid = 
        message_ranges = 
            story_type = 0
       parent_story_id = 0
          export_state = 
              exported = 0
        scheduled_date = -1
    latest_revision_id = 
   original_message_id = 
       revision_number = 0

> This message actually contains an attachment. However none of the fields seem to reference the attachment (Attachment_6_1641077615707.bin). Where is that information?

So, attachments are linked to messages through the part table. To check whether a message has one or more attachments, take its _id and match it against the part table's mid field (in your example case: SELECT * FROM part WHERE mid = 9). This query can return multiple results, since a single message can contain multiple attachments. The important fields from the part table are _id and unique_id (funnily enough, the _id is unique, but the unique_id is not necessarily so). These two numbers should correspond to the numbers in the filename this program writes (Attachment_[part._id]_[part.unique_id].bin).

Well, that was a long post. I hope I explained it somewhat well. There's a ton more fun and interesting things going on in the Android database (depending on how precise you want to make the transfer, you might also want to take a look at the reaction and mention tables).
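As a sketch of that lookup, the snippet below runs the same query against a toy in-memory part table and rebuilds the expected filenames. Only the three columns named above are created; the first row mirrors the example file from this thread, the second row is made up, and attachment_files is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the part table: only the columns discussed above.
conn.execute(
    "CREATE TABLE part (_id INTEGER PRIMARY KEY, mid INTEGER, unique_id INTEGER)"
)
conn.executemany(
    "INSERT INTO part (_id, mid, unique_id) VALUES (?, ?, ?)",
    [
        (6, 9, 1641077615707),   # matches Attachment_6_1641077615707.bin
        (7, 12, 1641077700000),  # made-up row belonging to another message
    ],
)


def attachment_files(db: sqlite3.Connection, message_id: int) -> list:
    """List the extracted filenames expected for one message (hypothetical helper)."""
    rows = db.execute(
        "SELECT _id, unique_id FROM part WHERE mid = ? ORDER BY _id",
        (message_id,),
    )
    # Filename pattern as described: Attachment_[part._id]_[part.unique_id].bin
    return [f"Attachment_{part_id}_{unique_id}.bin" for part_id, unique_id in rows]


print(attachment_files(conn, 9))
print(attachment_files(conn, 999))  # message without attachments: empty list
```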

Have fun! If you need more clarification, or you have any more questions, please let me know, I'll do my best.

palmerj commented 1 year ago

Hi @magnus-ISU did you manage to implement this feature yourself?

magicdoublem commented 1 year ago

> I have decided to implement this feature myself then, as the lack of chat history transfer is honestly one of the biggest problems in Signal by my estimation, and signalbackup-tools solves desktop to Android, so Android to desktop is the other direction where it is needed. I forked Signal-Desktop so I don't have to actually implement inserting into the database; I can just use their methods for it.

This would be a great thing to have, did you get it working?