Kozea / WeasyPrint

The awesome document factory
https://weasyprint.org
BSD 3-Clause "New" or "Revised" License

Get rid of external dependencies #841

Open mojimi opened 5 years ago

mojimi commented 5 years ago

I'm just opening this issue to have a discussion on it. And because it doesn't exist 😏

WeasyPrint is amazing and stands apart from every other PDF-generating library because it implements its own CSS engine, but at the moment its dependencies are the biggest downside in my view.

All the libraries it requires can be a hassle when building cross-platform; I've had several issues deploying to AWS services and testing on Windows.

Discussions to consider :

1) Is it feasible to think about removing the dependencies in the near future?
2) Do you guys plan on ever doing it, or would there need to be some type of investment/external incentive?
3) Which of the dependencies would be the hardest to redo in Python?
4) What are the biggest challenges here?

*Btw I'm only talking about external libs like Pango/Cairo/GDK

MindFluid commented 5 years ago

You could set up a docker container that can be easily deployed to other instances.

liZe commented 5 years ago

Hello,

I love removing external dependencies. Using something different from cairo has already been discussed in #342 for example. Really, I'd love to.

I can imagine rewriting cairo, or at least imagine generating PDF files out of simple drawing operations. I would love to, and maybe one day will. But there's one big problem: text.

Drawing text is not difficult. It's not really difficult. It's a nightmare. Well, it's actually never-ending nightmares in a never-ending night. You can even call that "hell" if you want.

So… Maybe one day I'll drop Pango and use HarfBuzz instead (it means rewriting the whole line-breaking algorithm in WeasyPrint, that's already frightening). But I can't even imagine not relying on HarfBuzz. And I'm not the only one:

HarfBuzz is used in Android, Chrome, ChromeOS, Firefox, GNOME, GTK+, KDE, LibreOffice, OpenJDK, PlayStation, Qt, XeTeX, and other places.

Bad news: Pango is not the only library WeasyPrint relies on to render text. It also relies on Fontconfig to find and configure fonts, and FreeType to render TrueType fonts. In case you're wondering: it's really painful too.

TL;DR: Replacing cairo and Pango can be done with a lot of work. Getting rid of all the non-Python external dependencies is nothing more than an illusion.

You could set up a docker container that can be easily deployed to other instances.

Yes. Providing Snap or Flatpak packages could be another solution.

mojimi commented 5 years ago

I guess a more batteries-ready documentation could also be a decent solution.

As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.

pperona commented 5 years ago

Here is an example of what can go wrong:

In [2]: import jinja2                                                           

In [3]: import weasyprint                                                       
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-3-4d0739b75804> in <module>
----> 1 import weasyprint

~/anaconda3/lib/python3.6/site-packages/weasyprint/__init__.py in <module>
    439 
    440 # Work around circular imports.
--> 441 from .css import preprocess_stylesheet  # noqa isort:skip
    442 from .html import (  # noqa isort:skip
    443     HTML5_UA_STYLESHEET, HTML5_PH_STYLESHEET, find_base_url, get_html_metadata)

~/anaconda3/lib/python3.6/site-packages/weasyprint/css/__init__.py in <module>
     28 from ..logger import LOGGER, PROGRESS_LOGGER
     29 from ..urls import URLFetchingError, get_url_attribute, url_join
---> 30 from . import computed_values, media_queries
     31 from .properties import INHERITED, INITIAL_NOT_COMPUTED, INITIAL_VALUES
     32 from .utils import remove_whitespace

~/anaconda3/lib/python3.6/site-packages/weasyprint/css/computed_values.py in <module>
     15 from tinycss2.color3 import parse_color
     16 
---> 17 from .. import text
     18 from ..logger import LOGGER
     19 from ..urls import get_link_attribute

~/anaconda3/lib/python3.6/site-packages/weasyprint/text.py in <module>
     12 import re
     13 
---> 14 import cairocffi as cairo
     15 import cffi
     16 import pyphen

~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in <module>
     37 
     38 
---> 39 cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
     40 
     41 

~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in dlopen(ffi, *names)
     34             except OSError:
     35                 pass
---> 36     raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
     37 
     38 

OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2

In [4]: quit

liZe commented 5 years ago

Here is an example of what can go wrong:

There's no need to convince anyone that things may go wrong when you have external dependencies (moreover when you don't follow the installation guide and try to use Anaconda, but that's another story :wink:). We'd really like to find a solution about this, but without a port of Pango and Cairo (and Fontconfig and Freetype and GDK-Pixbuf and …) in Python, the only solutions we have are a better documentation or packaged distributions of WeasyPrint.

As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.

Yes, I've done my best to have a pretty good documentation for many Linux distributions, @Tontyna has really improved both the code and the documentation about installation on Windows, but we can't cover all the cases needed by users, and we have to rely on everybody's work for that. I'd be really happy to merge pull requests adding more documentation about Docker images, AWS and Anaconda :heart:.
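
As a rough illustration of the kind of check that could go into such documentation, here is a minimal sketch that looks for the shared libraries before importing WeasyPrint. It only uses `ctypes.util.find_library` from the standard library. The helper name `find_missing_libraries` and the list of library names are assumptions made for this example, not something WeasyPrint or CairoCFFI provides, and `find_library` does not necessarily resolve names the same way cffi's `dlopen()` does, so treat the result as a hint only.

```python
from ctypes.util import find_library

# Assumed library names, loosely based on the ones mentioned in this thread.
# Adjust them to whatever your WeasyPrint version actually loads.
CANDIDATE_LIBRARIES = ["cairo", "pango-1.0", "gobject-2.0", "fontconfig"]


def find_missing_libraries(names):
    """Return the names that the system's library lookup cannot locate."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = find_missing_libraries(CANDIDATE_LIBRARIES)
    if missing:
        print("Could not locate:", ", ".join(missing))
    else:
        print("All candidate libraries were located.")
```

If a name shows up as missing here, the `dlopen() failed to load a library` error from the traceback above is a likely outcome of `import weasyprint`.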

pperona commented 5 years ago

Thank you!

stuaxo commented 4 years ago

A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.

liZe commented 4 years ago

A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.

CairoCFFI requires a file generation step during its installation, and I think that the generated file depends on the version of Cairo installed on the system. I'd be happy to discuss this on a separate issue for CairoCFFI.

liZe commented 3 years ago

Cairo is now gone :).

liZe commented 3 years ago

It’s summary time about non-Python dependencies!

We’ve removed direct dependencies:

  • Cairo
  • GDK-Pixbuf

We use these libraries as direct dependencies:

We use these libraries as indirect dependencies (not exhaustive):

Removing Pango is the next step, because it’s too limited to render HTML+CSS text. But it’s really useful for now: it’s used to split lines, with a lot of workarounds because of its limits. Removing it would require using HarfBuzz directly for text shaping (as many browsers do) and handling bidirectional text manually (or using a Python library for that).
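
To give a rough idea of what handling bidirectional text by hand involves, here is a deliberately naive sketch that splits a string into left-to-right and right-to-left runs using only `unicodedata` from the standard library. It is not the Unicode Bidirectional Algorithm (UAX #9) and it is not WeasyPrint code: it ignores embedding levels, digits, mirroring and reordering. The function names are made up for this example.

```python
import unicodedata


def strong_direction(char):
    """Return "L" or "R" for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(char)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
        return "R"
    return None  # digits, spaces, punctuation and other weak or neutral classes


def split_direction_runs(text, base="L"):
    """Split text into (direction, substring) runs.

    Characters without a strong direction join the run that precedes them,
    or take the base direction at the very start of the text.
    """
    runs = []
    for char in text:
        previous = runs[-1][0] if runs else base
        direction = strong_direction(char) or previous
        if runs and previous == direction:
            runs[-1][1].append(char)
        else:
            runs.append((direction, [char]))
    return [(direction, "".join(chars)) for direction, chars in runs]
```

A real implementation would then shape each run separately and reorder the runs for display, which is the part a dedicated library handles far better than this sketch.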

Having only HarfBuzz as a direct dependency could be possible. It’s available through pygobject, which is probably more reliable than our current code. There are also Python bindings with wheels for major OSes, so we can imagine a full WeasyPrint installation using only pip.

Of course: we’ll see that in the future. Not now.

stuaxo commented 3 years ago

This is cool. It would be amazing if the library that you replace Pango with were its own project; there are definitely other Python text/graphics projects that would benefit.

DidierLoiseau commented 2 years ago

@liZe:

We’ve removed direct dependencies:

  • Cairo
  • GDK-Pixbuf

Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.

(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)

liZe commented 2 years ago

Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.

We ask users to install GTK with the installer because it’s the easiest way to get Pango, HarfBuzz and Fontconfig (and others) installed on Windows. So, GTK is technically not needed (and it never has been, as GDK-Pixbuf is separate from GTK), but it’s in the documentation because it’s the easy way to get everything installed.

(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)

I’ve just tested a couple of days ago to install WeasyPrint on a fresh Windows 11 VM, and it was just:

All the Windows problems come from different versions of libraries installed elsewhere on the system, or from different package managers that install broken libraries for some reason. If the problem is different, please open a new issue 😀.
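
One way to look for such conflicting copies is to list every directory on `PATH` that contains a given library file. The sketch below does that with the standard library only. The function name `find_library_copies` is made up for this example, and the DLL file name in the usage line is an assumption about how the library is named on your system.

```python
import os


def find_library_copies(filename, search_path=None):
    """List the directories of a PATH-like string that contain `filename`."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    seen = set()
    for directory in search_path.split(os.pathsep):
        if not directory or directory in seen:
            continue  # skip empty entries and repeated directories
        seen.add(directory)
        if os.path.isfile(os.path.join(directory, filename)):
            matches.append(directory)
    return matches


if __name__ == "__main__":
    # Assumed file name; check which DLLs your installation really ships.
    for directory in find_library_copies("libpango-1.0-0.dll"):
        print(directory)
```

More than one line of output suggests that several installations provide the same library, and the first directory listed is the one most likely to be picked up.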

vaughnkoch commented 1 year ago

Hi, thanks for this beautiful and useful library.

Is there a good way to install Weasyprint and its dependencies to reduce the total image size in Docker? It seems that including Weasyprint adds about 500mb to my Debian Docker image.

liZe commented 1 year ago

Hi @vaughnkoch,

If you don’t care too much about performance and just want to use WeasyPrint as a binary, you can try to test this binary (⚠️ that’s not officially supported yet!)

Otherwise, you can test other distributions that may include fewer optional dependencies. But I’m curious, and 500MB seems to be a lot: which packages do you include in this size, and does it include Python?

vaughnkoch commented 1 year ago

The total size using the recommended installs (e.g. libgtk) was more like 1GB, up from a previous non-weasyprint image size of 460mb, which includes python and many other packages.

However, I was able to get the additional size down to just 80mb (not great but not that much worse), by just using this in my Dockerfile, and adding weasyprint to my Pipfile. Total image size of 460MB.

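# Install only the Pango libraries WeasyPrint needs, then clear the apt cache.
# Note: the Debian package is named libpangoft2-1.0-0.
RUN apt-get update -y && \
    apt-get -y install \
    libpango-1.0-0 \
    libpangoft2-1.0-0 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

liZe commented 1 year ago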

That’s pretty much what’s proposed in the documentation. Installing GTK is only recommended for Windows.

vaughnkoch commented 1 year ago

Ah, I see now. Sorry I missed that. I think I initially saw the 'missing gobject' error (from the lack of a Pango installation), did some googling, and was probably led to install GTK by a StackOverflow entry or similar. Thanks for the confirmation that this is the right way to install WeasyPrint.
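
For anyone who wants to confirm that a slimmed-down image still works, a small smoke test along these lines could be run inside the container. It is only a sketch: the function name `weasyprint_smoke_test` is made up, and it assumes a WeasyPrint version where `HTML(string=...).write_pdf()` returns the PDF as bytes when no target is given.

```python
def weasyprint_smoke_test():
    """Try to import WeasyPrint and render a tiny document.

    Returns an (ok, message) tuple instead of raising, so it can be used
    as a simple check in a build or health script.
    """
    try:
        # A missing Pango typically surfaces here as an OSError.
        from weasyprint import HTML
    except (ImportError, OSError) as exc:
        return False, f"import failed: {exc}"
    try:
        pdf = HTML(string="<p>Hello</p>").write_pdf()
    except Exception as exc:  # rendering problems, e.g. font configuration
        return False, f"rendering failed: {exc}"
    if pdf.startswith(b"%PDF"):
        return True, f"rendered {len(pdf)} bytes"
    return False, "output does not look like a PDF"


if __name__ == "__main__":
    ok, message = weasyprint_smoke_test()
    print("OK" if ok else "FAILED", "-", message)
```

Running it right after the `apt-get` and `pip` steps makes a missing library show up at build time rather than on the first real request.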