IdealChain / signal-media-exporter

A script to export media files from Signal Desktop.
GNU General Public License v3.0

Export conversations #4

Open nahoj opened 4 years ago

nahoj commented 4 years ago

Hi. I think it would be very useful if the script could export the text contents of conversations in a readable format in addition to the media. It could be HTML or the "SMS Backup & Restore" XML format, like signal-back was supposed to do.

Would you be interested in developing this? How difficult do you think it would be? It looks to me like you did the tricky work of opening the database, and it would just be a matter of exporting the available data in the desired format. If that is the case, I might be interested in contributing it.

IdealChain commented 4 years ago

It should not be too hard to export the messages too. Some care is needed to avoid re-exporting all messages on every run and to append only the new ones instead. If I recall correctly, I did not do it at the time because the messages were stored in LevelDB, not yet in the SQLite database as they are nowadays.
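One way the "append only the new ones" idea could be sketched is with a small state file that remembers the newest exported timestamp. Everything here is an assumption for illustration: the `select_new_messages` helper, the `sent_at` key, and the JSON state file are not part of the existing script or of Signal's schema.

```python
import json
from pathlib import Path


def select_new_messages(messages, state_path):
    """Return messages newer than the last export, oldest first.

    Hypothetical helper: `messages` is a list of dicts with a numeric
    "sent_at" key; `state_path` is a JSON file holding the newest
    timestamp exported so far.
    """
    state_file = Path(state_path)
    if state_file.exists():
        last = json.loads(state_file.read_text(encoding="utf-8"))["last_sent_at"]
    else:
        last = 0  # first run: everything counts as new

    new = sorted(
        (m for m in messages if m["sent_at"] > last),
        key=lambda m: m["sent_at"],
    )
    if new:
        # Remember the newest timestamp so the next run skips these messages.
        state_file.write_text(
            json.dumps({"last_sent_at": new[-1]["sent_at"]}), encoding="utf-8"
        )
    return new
```

A timestamp watermark like this would miss a message that shows up later with a timestamp at or before the stored one, so tracking exported message IDs might be the safer variant.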

Does the XML format of "SMS Backup & Restore" have any advantages? Can it be read by existing tools? Otherwise, HTML with a basic stylesheet and linked media files sounds most useful to me for later reading and searching, but it depends on your intentions.

nahoj commented 4 years ago

The SMS Backup & Restore app seems pretty popular so there are several tools to browse its archives, though they don't seem highly sophisticated (I haven't tried any). But I think some simple HTML would be best for my simple needs (reading/searching as you say). I wouldn't trust myself to make good-looking HTML from scratch but it shouldn't be too difficult to copy an existing style for this.

Would you do it or would you like me to? Or split the work?

What would be most useful to me is one folder per conversation, containing an index.html and all files and media (possibly duplicated across conversations, but that would be a very small number in my case), with options to export only some conversations and to skip the media for some conversations. Media and files would appear inline in the HTML where possible; the rest would be links.
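A rough sketch of that layout, assuming the messages have already been read from the database. The `write_conversation` helper, the message keys (`sender`, `body`, `media`), and the list of image extensions are made up for illustration, and the conversation name is assumed to be safe to use as a directory name.

```python
import html
from pathlib import Path

# Assumed set of extensions that browsers can show inline.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")


def write_conversation(out_dir, name, messages):
    """Write <out_dir>/<name>/index.html for one conversation.

    Hypothetical helper: each message is a dict with "sender", "body" and
    an optional "media" file name relative to the conversation folder.
    """
    conv_dir = Path(out_dir) / name
    conv_dir.mkdir(parents=True, exist_ok=True)

    parts = ["<!DOCTYPE html>", f"<title>{html.escape(name)}</title>"]
    for msg in messages:
        sender = html.escape(msg["sender"])
        body = html.escape(msg.get("body") or "")
        parts.append(f"<p><b>{sender}:</b> {body}</p>")

        media = msg.get("media")
        if media:
            href = html.escape(media, quote=True)
            if media.lower().endswith(IMAGE_EXTS):
                parts.append(f'<img src="{href}" alt="">')  # shown inline
            else:
                parts.append(f'<a href="{href}">{href}</a>')  # link only

    index = conv_dir / "index.html"
    index.write_text("\n".join(parts), encoding="utf-8")
    return index
```

Copying the media files into the conversation folder is left out; relative paths keep each folder self-contained and movable.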

This would be a bit different from your current categorization by contact/number but I wouldn't mind contact subfolders inside conversation folders, if that was acceptable to you.

IdealChain commented 4 years ago

My first idea was to leave the media dir as-is and add a conversations dir in a second pass with links back to used media files. That would have been the simplest extension - but I understand that you want to have every conversation as a self-contained directory, which makes sense.

What do you think of using Python string templates in the config to keep the path customizable?

The setting for the current mode would be "mediaPath": "$sender/signal_$timestamp.$ext", and media files in conversation dirs would be "mediaPath": "$conversation/signal_$timestamp.$ext" or "mediaPath": "$conversation/$sender/signal_$timestamp.$ext".
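For reference, a minimal sketch of how such a setting could be expanded with `string.Template` from the standard library. The `render_media_path` helper and the sample field values are assumptions for illustration only.

```python
from string import Template


def render_media_path(template, **fields):
    """Expand a mediaPath template such as "$sender/signal_$timestamp.$ext".

    Hypothetical helper: raises KeyError if the template names a field
    that was not supplied; unused fields are ignored.
    """
    return Template(template).substitute(**fields)


# Sample values, invented for the example.
fields = {
    "conversation": "Family",
    "sender": "Alice",
    "timestamp": "2020-05-01_120000",
    "ext": "jpg",
}

# Current mode: one directory per sender.
print(render_media_path("$sender/signal_$timestamp.$ext", **fields))
# Conversation mode with sender subfolders.
print(render_media_path("$conversation/$sender/signal_$timestamp.$ext", **fields))
```

`substitute` fails loudly on a misspelled placeholder, which seems preferable for a config setting; `safe_substitute` would leave the unknown placeholder in the path instead.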

The contact names could be taken from the database too instead of requiring configuration. Then the "map" setting would not be needed anymore and there could be include/exclude settings for media as well as conversations.

If you have time and feel like it, please, go ahead! Otherwise I might also find some time for this, but not sure when.

I can recommend sqlitebrowser with sqlcipher support for easy database exploration. If you prefer, the XML format would also be okay; we can always add support for other formats later.

nahoj commented 4 years ago

The string-templates idea sounds nice but I think it adds a bit of complexity and might be unnecessary. It should be simple enough to just have the 2 modes we want, at least as a first step.

I agree HTML might be best, maybe with some tagging for future reprocessing if desired.

I'll have a go at it. I'll try to extract contacts from the database.

nahoj commented 4 years ago

Would you consider switching to GPL3 so that we could reuse material from Signal Desktop (I'm thinking of stylesheets)? We could do without, of course.

LukeCageCodes commented 2 years ago

Hello,

I've got a version of signal-media-exporter with an extra option to export media. However, it just dumps all the tables into a configurable directory with the export date. So it's not incremental, but it works for me. :)

Would that be useful to contribute as a first step?

I'm not sure how to do the linking back to the media files. Basically, the tables have the message key, while the exporter generates the filename. The simplest idea would be to record the message key alongside the filename, maybe in the filename or perhaps in a separate log. Any thoughts on that?
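The "separate log" variant could look roughly like the sketch below: one JSON line per exported file, keyed by message. The function names, the `message_id` and `file` fields, and the JSON-lines format are assumptions, not something the exporter does today.

```python
import json
from pathlib import Path


def record_export(manifest_path, message_id, filename):
    """Append one message-key/filename pair to a JSON-lines manifest."""
    with open(manifest_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"message_id": message_id, "file": filename}) + "\n")


def load_manifest(manifest_path):
    """Return {message_id: [filenames]} read back from the manifest."""
    mapping = {}
    path = Path(manifest_path)
    if not path.exists():
        return mapping
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                # A message can carry several attachments, so keep a list.
                mapping.setdefault(entry["message_id"], []).append(entry["file"])
    return mapping
```

A later HTML pass could then call `load_manifest` and link each message to its files without having to re-derive the generated filenames.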