bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0

--append replaces the first few HTML files with new ones when --split is used #193

Closed: guicaz closed this issue 6 months ago

guicaz commented 6 months ago

Today I tried to take a newer backup of my Signal chats and wanted to append to what I had already exported, up to today's date. The export succeeded and the index.html file was properly updated, but when I looked into the folder for one conversation, the number of files hadn't changed. After a bit I noticed that it had overwritten the first few HTML files with new ones. I haven't tested whether it appends properly without the --split option. Is there an option I missed, or is this a bug? What I expected is that it would read the last .html number and continue from that point on.

This is the command I used:

./signalbackup-tools ../signal-2024-03-08-00-22-45.backup -p <passphrase> --exporthtml ../SignalExportHTML/ --limittodates 1693458000000,1704063599000 --split --append

hugogithubs commented 6 months ago

Hello, a quick side note: I also noticed in the past that, if user IDs changed, it is not possible to append correctly, and new threads with new IDs were created in addition to the old ones (the old ones are still retained, but are of course no longer linked in the index file). Since the program has no knowledge of changed IDs, my issue is probably not solvable.

bepaald commented 6 months ago

@guicaz

This is the expected behavior. I understand from the name of the option what you expected it to do, but that is virtually impossible. Not only would it require this tool to parse HTML (which is difficult, to say the least), it would also cause many other cases of unexpected behavior when different options and date/thread limits are used with subsequent exports.

The reason --append exists and is useful is that the time-consuming part of the export is decrypting and writing the attachment data (often many gigabytes of it). That is what the --append option is for: if an attachment needs to be written, but it already exists (by name and path), it is skipped (both decryption and writing). All HTML is always rewritten, because that is much easier and takes virtually no time or resources. I see no great benefit to literally appending to the existing HTML. I believe this is correctly stated in both the README and the --help output, though I admit the option is poorly named. I have often considered renaming it, but have not come up with anything much better (I'm open to suggestions).
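To illustrate the behavior described above, here is a minimal Python sketch of the idea, not the tool's actual code (signalbackup-tools is not written in Python). The names `write_attachment`, `write_html`, and the `decrypt` callback are hypothetical, made up for this example.

```python
from pathlib import Path
from typing import Callable


def write_attachment(target: Path, decrypt: Callable[[], bytes], append: bool) -> bool:
    """Write one attachment; return True if it was decrypted and written.

    Hypothetical helper: with append=True, a file that already exists at the
    same name and path is skipped, so the expensive decrypt() never runs.
    """
    if append and target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(decrypt())
    return True


def write_html(target: Path, html: str) -> None:
    """HTML is cheap to generate, so it is rewritten on every export."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
```

The point of the sketch is the asymmetry: the existence check guards only the attachment path, while HTML pages are regenerated unconditionally.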

> Today I tried to take a newer backup of my signal chats and wanted to append what I already exported to today's date.

This is simple to do: just don't specify that you only want today's date. Specify the whole range you want to exist in the export (not just the new part) and use --append. If you have an export of date1 through date3 and later you want date4 and date5 as well, you run with --limittodates date1,date5. (Or, if you simply want everything, just don't specify any dates at all.)
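For building such a range, a small Python sketch may help. It assumes the values passed to --limittodates in the command above are Unix timestamps in milliseconds (inferred from their magnitude) separated by a comma; the README is the authority on accepted formats. The helper names `to_epoch_ms` and `full_range` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def full_range(first_wanted: datetime, last_wanted: datetime) -> str:
    """Format 'begin,end' covering the whole export, old part included."""
    return f"{to_epoch_ms(first_wanted)},{to_epoch_ms(last_wanted)}"


# Example dates only: start of the existing export and end of the new part.
start = datetime(2023, 9, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(full_range(start, end))
```

The begin value stays at the start of the original export; only the end value moves forward with each newer backup.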

By the way, I think --split also doesn't do what you think it does. If you have a conversation with 102 messages, and you export with --split 100, you do not get a second page with just 2 messages. Instead it is determined that 2 pages are necessary, and the available messages are then split across both equally: you get two pages with 51 messages each. If you later export a new backup where the conversation has grown to 190 messages, both page_1 and page_2 obviously need to be rewritten (each will contain 95 messages then). A change to this behavior can be easily implemented if needed, but I don't think most users would prefer this.
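The arithmetic above can be sketched in Python as follows. This is an illustration only, not the tool's implementation; the function name `page_sizes` is made up, and how a remainder is distributed when the count does not divide evenly is an assumption (here the earlier pages get one extra message).

```python
import math


def page_sizes(message_count: int, split: int) -> list[int]:
    """Return the number of messages on each page.

    First determine how many pages are needed for a limit of `split`
    messages per page, then spread the messages evenly over those pages.
    """
    if message_count <= 0:
        return []
    pages = math.ceil(message_count / split)
    base, extra = divmod(message_count, pages)
    # Assumption: any remainder goes to the first `extra` pages.
    return [base + 1 if i < extra else base for i in range(pages)]


print(page_sizes(102, 100))  # [51, 51]
print(page_sizes(190, 100))  # [95, 95]
```

Because every page size depends on the total message count, a growing conversation changes the contents of all pages, which is why the existing pages cannot simply be left in place.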

Please let me know if anything is not clear or I can help you with anything else.

@hugogithubs

You are correct, I do not think this is solvable. But also, even if the program somehow knew that "Bob (_id1)" is actually the same person as "Bob (_id2)", it would still never decide that the user must not need "Bob (_id1)" anymore and just delete that entire directory and its contents. I have 'foreign' files in my export directories quite often and would be (and have been, when I used --overwrite by mistake) incredibly annoyed if the program would suddenly delete my files because they were not part of the (latest) export. Now that I think about it, I could probably add a --deleteforeignfiles option to do this if you'd really want it. I don't think that would be too difficult to implement.
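As a rough idea of what detecting 'foreign' files could involve, here is a hedged Python sketch. It only lists candidates and deletes nothing; the function `find_foreign_files` and the notion of a `written` set collected during the export are assumptions for illustration, not part of the tool.

```python
from pathlib import Path


def find_foreign_files(export_dir: Path, written: set[Path]) -> list[Path]:
    """List files under export_dir that the latest export did not write.

    `written` is a hypothetical set of every path the export produced
    or skipped as already present. Nothing is deleted here.
    """
    written_resolved = {p.resolve() for p in written}
    return sorted(
        p
        for p in export_dir.rglob("*")
        if p.is_file() and p.resolve() not in written_resolved
    )
```

An old "Bob (_id1)" directory would show up in this list, and the user could then decide what to do with it.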

guicaz commented 6 months ago

@bepaald

You are right, I did not understand exactly how these options worked. Your message explains a lot. So if I get this right, --append only skips exporting media that already exist, and adds new ones, right? I don't know of a better name; maybe change or add to the description that it "writes newer media without overwriting previous ones". But it is probably my fault for not understanding it right.

I also indeed didn't understand how --split worked, but now I do. Thank you for the explanation.

Thanks for the quick reply, and for the software.

edit: I see it now about --append in the README. It’s my fault for reading/skimming over too quickly. Apologies.