eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Disable CSS `@page` directives when converting to PDF #2400

Open madduck opened 11 months ago

madduck commented 11 months ago

HTML&CSS files can use @page directives to instruct the browser to insert breaks between physical pages, place PDF pages on specific formats of physical paper, e.g. A4 or US Letter, and insert header/footer information.

weasyprint honours these directives, which makes it a great tool for generating PDFs from webpages.

However, in the Docspell context, this can have unwanted side effects. For instance, certain versions of Outlook create HTML emails that are bound to physical paper sizes, which in turn depend on the UI locale of the sender. That's stupid in and of itself, but that's not the end of the problem.

When converting email, Docspell inserts the email header information before the existing HTML body, and then passes the resulting concatenation to the PDF converter. If the existing HTML body is enclosed in e.g. a <div> referencing an @page directive, then this will force weasyprint to insert a hard page break after the email header information, which means that in preview mode, the actual email text isn't visible until the 2nd page, and not visible at all in the preview images.

The following is a potential fix. weasyprint takes a --stylesheet argument and mixes the stylesheet in with the rest. If the weasyprint stanza in docspell-joex.conf is extended accordingly to pass such a stylesheet, which then contains:

* {
  page: inherit !important;
}

then all @page directives are effectively ignored.
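
To make the idea concrete, here is a minimal Python sketch of the invocation described above. It writes the override rule to a file and builds a weasyprint command line that passes it via --stylesheet. The file names (no-page.css, mail.html, mail.pdf) and the helper functions are made up for illustration; how the docspell-joex.conf stanza would carry the extra arguments is not shown here.

```python
import tempfile
from pathlib import Path

# The override rule quoted in the issue above.
OVERRIDE_CSS = "* {\n  page: inherit !important;\n}\n"


def write_override_stylesheet(directory: Path) -> Path:
    """Write the override rule to a CSS file and return its path."""
    css_path = directory / "no-page.css"  # hypothetical file name
    css_path.write_text(OVERRIDE_CSS, encoding="utf-8")
    return css_path


def weasyprint_command(html_in: Path, pdf_out: Path, stylesheet: Path) -> list[str]:
    """Build the argument list: weasyprint --stylesheet <css> <input> <output>."""
    return ["weasyprint", "--stylesheet", str(stylesheet), str(html_in), str(pdf_out)]


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    css = write_override_stylesheet(workdir)
    cmd = weasyprint_command(workdir / "mail.html", workdir / "mail.pdf", css)
    print(" ".join(cmd))
    # To actually convert, hand cmd to subprocess.run(cmd, check=True);
    # that requires weasyprint to be installed and mail.html to exist.
```

The sketch only assembles the command, so it can be inspected without weasyprint being installed.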

Maybe such a stylesheet could be provided by Docspell, possibly even by default, and something added to the documentation about it?

eikek commented 11 months ago

I think this would be a nice addition to the library used for this. It would be here. Perhaps it can take an additional argument to add some CSS or a generic head section etc. It can also be the default, as it seems to be more of an email-related thing.
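
As a rough illustration of the "additional argument" idea, the following Python sketch wraps an email body with a header block and places optional extra CSS in the head section. It is not the library's actual code or API: the function name wrap_email_html, its parameters, and the generated markup are all hypothetical, and whether a rule placed in the head behaves exactly like one passed via --stylesheet is an assumption that would need checking against weasyprint.

```python
from html import escape

# Rule from the issue; used here as the default extra CSS (an assumption).
DISABLE_PAGE_CSS = "* { page: inherit !important; }"


def wrap_email_html(headers: dict[str, str], body_html: str,
                    extra_css: str = DISABLE_PAGE_CSS) -> str:
    """Prepend a header block to an email body and add extra CSS to <head>."""
    # One line per header, with names and values HTML-escaped.
    rows = "".join(
        f"<div><strong>{escape(name)}:</strong> {escape(value)}</div>"
        for name, value in headers.items()
    )
    # Emit a <style> element only when some CSS was supplied.
    style = f"<style>{extra_css}</style>" if extra_css else ""
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"{style}</head><body>{rows}{body_html}</body></html>"
    )
```

Passing extra_css="" would switch the override off, which keeps the default behaviour configurable without pulling in a templating dependency.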

madduck commented 11 months ago

Would you consider using a templating system to generate the HTML views? It could defer to other CSS files, and moreover make the HTML view configurable, i.e. I might want to include the message ID…

eikek commented 11 months ago

Yes, I think more customization is good. But I wouldn't want to add another dependency for the existing module. But the signatures can be opened for such extensions and another module can provide something fancy using some other library. Perhaps providing something simple without another dependency is already good enough.

madduck commented 11 months ago

Well, for now it's enough to just inject the CSS.

eikek commented 11 months ago

Since this is only for emails anyways, I think, disabling the page directive by default is fine, isn't it?

madduck commented 11 months ago

Yeah, I think that'll do for now. The templating can be introduced in Docspell 2.0! ;)