Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License

Enable PDF embedding #52

Open: ghost opened this issue 11 years ago

ghost commented 11 years ago

For background, see https://github.com/Kozea/WeasyPrint/issues/51

With the latest version of cairocffi, attempting to embed a PDF produces the following Terminal output under OS X 10.6.8:

$ weasyprint embedded_pdf_test_001.html embedded_pdf_test_001.pdf
Error for image at file:///[path_to]/xetex_test_005-crop.pdf : ValueError(u'Pixbuf error: Unrecognised image file format',)

The resulting output PDF contains the other parts of the input HTML content, but does not contain the embedded PDF.
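
The "Pixbuf error" above suggests that the PDF bytes are handed straight to a raster image decoder (GdkPixbuf), which has no PDF support. As a minimal sketch, assuming a hypothetical pre-check (`looks_like_pdf` and `describe_resource` are illustrative names, not WeasyPrint functions), a PDF can be recognised by its `%PDF-` header and reported explicitly instead of failing with a generic format error:

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if data starts with a PDF header such as b'%PDF-1.4'.

    Hypothetical helper: some readers tolerate junk before the header,
    which this sketch deliberately does not.
    """
    return data.startswith(PDF_MAGIC)


def describe_resource(url: str, data: bytes) -> str:
    # Hypothetical routing: PDFs are flagged explicitly, everything else
    # goes on to the raster decoder as before.
    if looks_like_pdf(data):
        return f"{url}: PDF document (not a raster image; embedding unsupported)"
    return f"{url}: passed to the raster image decoder"
```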

SimonSapin commented 11 years ago

PDF embedding is not supported at the moment, so the output above is the expected result.

We could use Poppler to render PDF files to cairo surfaces, use those as images, and then render them back to PDF. But that’s silly: "real" PDF embedding, as TeX does it, just copies the low-level PDF objects. Unfortunately, I don’t know whether this is possible with cairo.
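
To make "copying low-level PDF objects" concrete: a page from the source file is typically wrapped as a Form XObject (its content stream plus its resources, fonts included), and the host page paints it with the `cm` (transform) and `Do` (paint XObject) operators. The minimal sketch below only generates those two pieces of PDF syntax; the helper names, the `/Fm1` resource name and the `12 0 R` object reference are hypothetical, and a real implementation would also have to copy and renumber every object the source page references.

```python
def _num(value: float) -> str:
    """Format a number compactly for PDF syntax (e.g. 0.5, 72, -5)."""
    # Adding 0.0 turns -0.0 into 0.0 so we never emit "-0".
    return f"{value + 0.0:.4f}".rstrip("0").rstrip(".")


def form_xobject_header(bbox, length, resources_ref):
    """Dictionary for a Form XObject wrapping a copied page content stream.

    bbox: (llx, lly, urx, ury), e.g. taken from the source page's MediaBox.
    length: byte length of the copied content stream.
    resources_ref: reference to the copied resources, e.g. "12 0 R".
    """
    llx, lly, urx, ury = bbox
    return (
        "<< /Type /XObject /Subtype /Form "
        f"/BBox [{_num(llx)} {_num(lly)} {_num(urx)} {_num(ury)}] "
        f"/Resources {resources_ref} /Length {length} >>"
    )


def place_xobject(name, bbox, x, y, width):
    """Content-stream operators drawing the XObject with its lower-left
    corner at (x, y), uniformly scaled to the given width."""
    llx, lly, urx, ury = bbox
    scale = width / (urx - llx)
    # cm maps (u, v) to (scale*u + e, scale*v + f); choose e, f so that
    # the bbox's lower-left corner lands exactly on (x, y).
    e = x - llx * scale
    f = y - lly * scale
    return f"q {_num(scale)} 0 0 {_num(scale)} {_num(e)} {_num(f)} cm /{name} Do Q"
```

For a US Letter page scaled to half width, `place_xobject("Fm1", (0, 0, 612, 792), 72, 100, 306)` gives `q 0.5 0 0 0.5 72 100 cm /Fm1 Do Q`. Because the original content stream and font resources are reused rather than redrawn, text inside the embedded page can remain text in the output.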

ghost commented 11 years ago

Ideally, any selectable (i.e. copy-pastable) text in embedded PDFs should remain selectable in the output PDF.

thmo commented 8 years ago

From https://www.cairographics.org/cookbook/renderpdf/ it actually doesn't sound like such a bad idea to use poppler. It says:

When using a vector backend, the vectors and text in the PDF file are preserved in the output as vectors. There is no unnecessary rasterization.

thmo commented 8 years ago

In a discussion on IRC it was just noted that poppler is GPL-licensed, so cannot be used by weasyprint.

ghost commented 8 years ago

In a discussion on IRC it was just noted that poppler is GPL-licensed, so cannot be used by weasyprint.

This sounds like it could be FUD?

WeasyPrint is licensed under the modified BSD license (aka 3-clause BSD license), which is GPL-compatible.

It's entirely possible I'm missing something important, but please could you explain why you think Poppler's license means that Poppler "cannot be used by WeasyPrint"? Thanks!

liZe commented 8 years ago

This sounds like it could be FUD?

It may be.

Actually, it's really explicit in Poppler's README:

Please note that xpdf, and thus poppler, is licensed under the GPL, not the LGPL. Consequently, any application using poppler must also be licensed under the GPL.

ghost commented 8 years ago

Actually, it's really explicit in Poppler's README:

Please note that xpdf, and thus poppler, is licensed under the GPL, not the LGPL. Consequently, any application using poppler must also be licensed under the GPL.

Interesting. Strictly speaking, this depends upon how the GPL-ed package would be "used". GPL packages and 3-clause BSD packages can be shipped together (e.g. in a distro like Debian) and can specify each other as dependencies in either direction (e.g. via a package manager like Apt).

I see these potential ways forward:

  1. Re-license future versions of WeasyPrint under the GPL (or better yet, the AGPLv3), and use Poppler. I am in favour of this.
  2. Call Poppler from WeasyPrint in a way that doesn't violate the GPL, if that is possible. I suspect it is possible, despite the Poppler README, but I may be wrong.
  3. Don't use Poppler for WeasyPrint.

SimonSapin commented 8 years ago

This sounds like it could be FUD?

Let’s not accuse each other of ill intentions when there’s probably only a "logical shortcut" in a terse message.

https://www.gnu.org/licenses/gpl-faq.html#WhatDoesCompatMean says:

[To say a license is “compatible with the GPL”] means that the other license and the GNU GPL are compatible; you can combine code released under the other license with code released under the GNU GPL in one larger program.

All GNU GPL versions permit such combinations privately; they also permit distribution of such combinations provided the combination is released under the same GNU GPL version. The other license is compatible with the GPL if it permits this too.

Emphasis is mine. This means that Poppler (GPL-licensed) cannot be used in WeasyPrint without effectively changing the license of WeasyPrint to GPL.

Call Poppler from WeasyPrint in a way that doesn't violate the GPL, if that is possible.

As far as I know, whether that would be compliant with the GPL is very open to legal interpretation. So I’d rather not attempt it.

ghost commented 8 years ago

Let’s not accuse each other of ill intentions when there’s probably only a "logical shortcut" in a terse message.

Agreed. I did not intend to impute ill intentions, and apologies if my reply came across that way.

Call Poppler from WeasyPrint in a way that doesn't violate the GPL, if that is possible.

As far as I know, whether that would be compliant with the GPL is very open to legal interpretation. So I’d rather not attempt it.

I understand your caution, but it really might be viable: https://www.gnu.org/licenses/gpl-faq.html#MereAggregation

Anyhow, I'm not an expert on this. The FSF Licensing & Compliance Team exists to help answer precisely this sort of question, so I'd suggest emailing them with your concerns, to see if they can suggest a low-friction, compliant solution. After all, while copyleft is important to the FSF, facilitating the creation, refinement and dissemination of free software (like WeasyPrint) is even more important to them, IIUC :)

(A case in point is that even the GNU project includes some code that is under non-copyleft free software licenses: see Appendix C.)

SimonSapin commented 8 years ago

https://www.gnu.org/licenses/gpl-faq.html#MereAggregation

This sounds like a very grey area:

Where's the line between two separate programs, and one program with two parts? This is a legal question, which ultimately judges will decide. […] But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program.

SimonSapin commented 8 years ago

IMO a cairo surface containing vectors sounds like a "complex internal data structure".

ghost commented 8 years ago

IMO a cairo surface containing vectors sounds like a "complex internal data structure".

I'd be interested to see if the FSF would confirm that. Alternatively, what about:

  1. Re-license future versions of WeasyPrint under the GPL (or better yet, the AGPLv3), and use Poppler.

It looks like WeasyPrint has 29 contributors. Of these, several seem to have made only trivial (i.e. non-copyrightable) commits. So the number of remaining contributors from whom you would need consent to re-license WeasyPrint isn't huge, and might be worth contacting if you're amenable to this option?

SimonSapin commented 8 years ago

I am not interested in changing WeasyPrint’s license.

liZe commented 8 years ago

I am not interested in changing WeasyPrint’s license.

+1.

The poppler devs took the time to explicitly write that the library should only be used in GPL software. Legal or not, I don't care; I think we can at least respect what they wrote.

Now, let's go back to our problem.

Another (far from ideal) solution is to convert the PDF to one or more SVG images. This can be done before calling WeasyPrint, with Inkscape for example. You can then use <img> and even benefit from the size negotiation algorithm. This solution doesn't add extra rasterisation, but the text is not selectable (this may be fixed by Kozea/CairoSVG#80 if anyone is interested).
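
A minimal sketch of that workaround, assuming Inkscape 1.x command-line options (`--pdf-page`, `--export-type`, `--export-filename`; older 0.92 releases used different flags), a page count known in advance, and hypothetical helper names. It builds one conversion command per page and an HTML snippet referencing the resulting SVG files with `<img>`:

```python
import html
import subprocess
from pathlib import Path


def inkscape_svg_argv(pdf_path, page, svg_path):
    # One Inkscape invocation per page; --pdf-page is 1-based.
    # Assumes Inkscape 1.x flags.
    return [
        "inkscape",
        str(pdf_path),
        f"--pdf-page={page}",
        "--export-type=svg",
        f"--export-filename={svg_path}",
    ]


def svg_paths_for(pdf_path, page_count, out_dir):
    # e.g. report.pdf -> out/report-1.svg, out/report-2.svg, ...
    stem = Path(pdf_path).stem
    return [Path(out_dir) / f"{stem}-{n}.svg" for n in range(1, page_count + 1)]


def convert_all(pdf_path, page_count, out_dir, run=subprocess.run):
    # `run` is injectable so the command construction can be tested
    # without Inkscape installed.
    paths = svg_paths_for(pdf_path, page_count, out_dir)
    for page, svg in enumerate(paths, start=1):
        run(inkscape_svg_argv(pdf_path, page, svg), check=True)
    return paths


def img_tags(svg_paths, css_class="embedded-pdf-page"):
    # Relative src values are resolved against WeasyPrint's base_url.
    return "\n".join(
        f'<img class="{css_class}" src="{html.escape(p.as_posix(), quote=True)}" alt="">'
        for p in svg_paths
    )
```

The resulting snippet can then be placed in the source HTML and rendered as usual, e.g. `HTML(string=page_html, base_url=".").write_pdf("output.pdf")` with WeasyPrint's Python API. The SVGs stay vector graphics, but their text is still not selectable.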

SimonSapin commented 8 years ago

Another solution (to this and many other things) would be to ditch cairo and write PDF files directly ourselves. I’ve long wanted WeasyPrint to do that but it would be a lot of work, especially around fonts.
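
To give a sense of what writing PDF directly involves, here is a minimal standard-library sketch (the function name is hypothetical) that emits a one-page PDF with one line of text: numbered objects, a cross-reference table of byte offsets, and a trailer. It sidesteps the hard part mentioned above by using Helvetica, one of the standard 14 fonts that PDF readers have traditionally provided, so nothing is embedded; real documents would need embedded and subset TrueType/OpenType fonts, glyph mapping and a ToUnicode CMap for selectable text, which is also why this sketch accepts ASCII only.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` in 24 pt Helvetica (ASCII only)."""
    # Escape characters that are special inside PDF literal strings.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        # /Length counts the stream bytes only, not the EOL before endstream.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing `minimal_pdf("Hello")` to a `.pdf` file should give a document that ordinary viewers open; everything WeasyPrint currently delegates to cairo (font subsetting, images, links, compression) would have to be added on top of this skeleton.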

khaledhosny commented 8 years ago

You might find https://github.com/simoncozens/libtexpdf interesting.

khaledhosny commented 8 years ago

Oh, you don’t want GPL. Sorry, never mind.