carderne / signal-export

Export your Signal chats to markdown files with attachments

Feature request for adding export to PDF + a command to convert to PDF for interested people #21

Closed LiroyvH closed 1 year ago

LiroyvH commented 3 years ago

Hi,

Thanks for the great work in reviving the SD-backups! :) The HTML files are great, but I personally don't like the pagination, as it makes searching, reading and browsing a bit annoying (scrolling stops over and over). On top of that, I prefer PDF over HTML (despite media not playing in PDF, haha!). It would be nice if this could be added as a flag, so that running for example "./sigexport.py -pdf outputdir" generates PDFs instead of, or in addition to, the HTML files. :)

However, as such a function is not available right now, I've written a rather dirty workaround for interested people. It works reasonably well; it just still shows some "PREV, NEXT" leftovers from the pagination. It does include all pictures and previews of GIFs and such. (Obviously it cannot play video in PDF.) I haven't yet done anything to remove the PREV/NEXT leftovers (that requires changes to the HTML file), as I was already happy with this. Will fix at a later point. So here we go:

What I did was: (-edit- I forked this project and made all necessary (dirty) changes, in case you're interested: link to fork)

1.) Reverted to the old style.css that has no pagination (click), simply because with pagination it will NOT export all messages to PDF, only the first page. So we need the older version, so that all conversations load at once.
2.) Installed "wkhtmltopdf" (available in HomeBrew for Mac ("brew install wkhtmltopdf") and I believe in Ubuntu's apt repositories as well ("sudo apt-get install wkhtmltopdf")).
3.) Converted the HTML files with the command "wkhtmltopdf --enable-local-file-access index.html Output.pdf".

OPTIONAL step (to skip it, go straight to "automatic conversion"): the conversion above causes a problem, as wkhtmltopdf doesn't appear to be able to render the emoji. This can be solved by adding this to the .html files:

<style>
img.emoji {
   height: 1em;
   width: 1em;
   margin: 0 .05em 0 .1em;
   vertical-align: -0.1em;
}
</style>
<script src="https://twemoji.maxcdn.com/2/twemoji.min.js?11.2"></script>
<script>window.onload = function () { twemoji.parse(document.body);}</script>

You can edit sigexport.py, go to line 300 ("<body>") and add this under it:

                "<style>"
                "img.emoji {"
                "height: 1em;"
                "width: 1em;"
                "margin: 0 .05em 0 .1em;"
                "vertical-align: -0.1em;"
                "}"
                "</style>"
                "<script src='https://twemoji.maxcdn.com/2/twemoji.min.js?11.2'></script>"
                "<script>window.onload = function () { twemoji.parse(document.body);}</script>"

You will have to run sigexport.py again afterwards to ensure it re-generates all .html files with these lines.
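If you would rather not touch sigexport.py, the same snippet could also be patched into the already generated files. This is only a minimal sketch, not something the project ships: `inject_after_body` and `EMOJI_SNIPPET` are names made up for illustration, and it assumes the exported HTML contains a literal `<body>` tag without attributes.

```python
# Hypothetical helper: post-process an exported HTML string instead of
# editing sigexport.py. Assumes a plain "<body>" tag with no attributes.
EMOJI_SNIPPET = (
    "<style>img.emoji {height: 1em; width: 1em; "
    "margin: 0 .05em 0 .1em; vertical-align: -0.1em;}</style>"
    "<script src='https://twemoji.maxcdn.com/2/twemoji.min.js?11.2'></script>"
    "<script>window.onload = function () { twemoji.parse(document.body);}</script>"
)


def inject_after_body(html: str, snippet: str = EMOJI_SNIPPET) -> str:
    """Insert the snippet right after the first <body> tag, at most once."""
    marker = "<body>"
    if snippet in html or marker not in html:
        # Already patched, or no plain <body> tag found: leave untouched.
        return html
    return html.replace(marker, marker + snippet, 1)
```

Running it a second time on the same file leaves the file unchanged, so it should be safe to re-run after a fresh export.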

AUTOMATIC CONVERSION

Now, converting all your conversations to PDF will be rather cumbersome to do manually, as you'd have to run that command one by one in each directory. Let's fix that. If you want to generate PDFs for all your exported Signal conversations, you can run this one-liner in your export directory (where all the folders with your contacts' names are, so NOT inside a conversation directory). This command goes on one single line (and it should when you copy/paste it :)):

    mkdir -p pdf && find . -maxdepth 2 -name '*.html' -exec sh -c 'for f; do wkhtmltopdf --enable-local-file-access "$f" "./pdf/$(basename "$(dirname "$f")").pdf"; done' _ {} +

This:

1.) Creates a folder "pdf" within your export directory if it doesn't exist yet.
2.) Finds all .html files in the subdirectories one level down (so the "index.html" of each conversation).
3.) Tells wkhtmltopdf to convert each of those files to PDF and places the output in the "pdf" folder; the filename is "Contact.pdf". So if your contact is GunnarGunnerson, it will name the PDF "GunnarGunnerson.pdf".
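For those who prefer Python over a shell one-liner, the same three steps could be sketched roughly like this. The function names `plan_conversions` and `convert_all` are made up for illustration, and unlike the `find` command this sketch only picks up files named exactly `index.html`. It assumes `wkhtmltopdf` is installed and on your PATH.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def plan_conversions(export_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each <contact>/index.html with pdf/<contact>.pdf."""
    pdf_dir = export_dir / "pdf"
    pairs = []
    for html in sorted(export_dir.glob("*/index.html")):
        if html.parent == pdf_dir:
            continue  # never treat the output folder as a conversation
        pairs.append((html, pdf_dir / f"{html.parent.name}.pdf"))
    return pairs


def convert_all(export_dir: Path) -> None:
    """Run wkhtmltopdf once per conversation (assumes it is on PATH)."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    (export_dir / "pdf").mkdir(exist_ok=True)
    for html, pdf in plan_conversions(export_dir):
        subprocess.run(
            ["wkhtmltopdf", "--enable-local-file-access", str(html), str(pdf)],
            check=True,
        )
```

Calling `convert_all(Path("."))` from the export directory would then do roughly what the one-liner does; `plan_conversions` can be used on its own to preview which PDFs would be written.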

Again, this only works when using the old style.css that does NOT use pagination. If you do have the one using pagination, it seems to output only the first x messages. One remaining problem is that it doesn't seem to be able to parse all the emoji in existence, so some emoji may show up as a weird mark.

This has been tested on macOS Big Sur. The commands should be exactly the same on other systems.

Thank you for your consideration and enjoy for those who find this useful. :) - Liroy

carderne commented 3 years ago

Hi @liroyvh, thanks for the work you put into this.

Pagination

The reason I added the pagination was that on very large chats with lots of media, Chrome/Firefox etc. would become extremely laggy. I set it to paginate after 100 messages, but I've now made this customizable (see commit 5cdd26515da5ffc4540028709250d2b4ac6ccfb9) so you can run ./sigexport --paginate 1000 for 1000 or ./sigexport --p 0 to disable pagination altogether (although for now the buttons will still appear).
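To illustrate the idea (this is just a rough sketch, not the actual code from that commit), splitting a chat into pages where 0 means "everything on one page" could look like the following. The name `paginate` and its signature are assumptions made for this example.

```python
def paginate(messages, per_page=100):
    """Split messages into pages; per_page <= 0 disables pagination."""
    msgs = list(messages)
    if per_page <= 0:
        # Pagination disabled: a single page holding every message.
        return [msgs]
    return [msgs[i:i + per_page] for i in range(0, len(msgs), per_page)]
```

With 250 messages and the default of 100 this gives three pages (100, 100 and 50 messages), while a value of 0 gives one page with all 250.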

PDF

I tried calling some different scripts from within Python but couldn't find a decent multi-platform solution and ran into lots of edge cases... Will leave your issue up for now until I get time to look into it and see whether it's worthwhile incorporating what you've done into the actual script...

carderne commented 2 years ago

Hi @LiroyvH

Would be great to get the changes from your fork integrated, if you think there are any that could be upstreamed?

The PDF stuff I assume should work with the --p=0 option (see above) that disables pagination? Might be worthwhile getting the HTML->PDF instructions added, if they will work.

It seems you've also made some macOS installation improvements, do you think it would be worth trying to integrate these?

carderne commented 2 years ago

You're obviously free to ignore this, but in general I think it is considered bad manners to "de-fork" your fork of a repo on GitHub. You'll notice that this repo is still "forked" from the upstream, even though at this point it shares maybe 5 lines of code.

LiroyvH commented 2 years ago

> Hi @LiroyvH
>
> Would be great to get the changes from your fork integrated, if you think there are any that could be upstreamed?

Not sure! The last thing I saw was that you thought it would be incompatible in a cross-platform manner and would look into it, 1+ year ago haha. And admittedly I've focused most on macOS because of the ecosystem issue pretty much caused by Signal's developers refusing to add any export option whatsoever to Signal iOS and encouraging full data loss; so we're kinda forced down this road. Whilst I do use Linux, though not really Windows, I haven't tested any of it on other platforms. However, having said that, it should be compatible with at the very least Linux. The tools are open-source implementations; to my knowledge I've used nothing proprietary to achieve the desired goal of generating PDFs whilst retaining emoji, and they should be packaged for Linux as well. Again, no clue on Windows.

I think the largest hurdle was finding a package that would be able to print emoji to PDF from the get-go, but I did find a library that serves that purpose and now it works great: it generates the HTML + attachments as the original did (and keeps them) and then on top of that generates the PDFs with pictures and emoji and outputs them to a PDF dir.

> The PDF stuff I assume should work with the --p=0 option (see above) that disables pagination? Might be worthwhile getting the HTML->PDF instructions added, if they will work.

You mean it should work with the updated style.css? I recall I've undone some of the style.css stuff to undo pagination. So you think the new style.css could be updated and then try an export with --p=0 flag? I could try that sometime.

> It seems you've also made some macOS installation improvements, do you think it would be worth trying to integrate these?

Sure, but it's a bit of a shitfest of dirty patches due to:

1.) pysqlcipher3 is absolutely dead and regularly throws hissy fits when attempting to deploy, but it's the only way.
2.) Apple Silicon requires a different method of running this, because otherwise you get in trouble with point 1, as it will absolutely refuse to compile in any way on aarch64 even when bribed with sacrificing your first born.

But yes, I have patched it all to actually compile and run properly and made a really low-effort bash script (or rather: chained commands in a file, lol) to make it a one-liner for macOS users to run it post HomeBrew deployment. (Unfortunately two lines for Apple Silicon, as we first have to switch the architecture to emulate x86_64.) However, I will readily admit it isn't very cleanly done and the instructions I wrote are very much macOS-centered and a wee bit elaborate due to the differences between the two architectures... But... It works, and it works well.

I'd say ideally it can be made to crap out those PDFs with pictures and emoji from the get-go, but it does need to export attachments separately, as you obviously cannot (reasonably) print videos or Excel sheets to PDF.

LiroyvH commented 2 years ago

@carderne Oh, I see you had left another comment. Yeah, that was because I misinterpreted something in GH's (enterprise) documentation. Ultimately, it turned out it was completely unnecessary. But here we are. As you can see by my profile, I don't use GH a whole lot so mistakes are quite easily made.