Open mojimi opened 5 years ago
You could set up a docker container that can be easily deployed to other instances.
Hello,
I love removing external dependencies. Using something different from cairo has already been discussed in #342 for example. Really, I'd love to.
I can imagine rewriting cairo, or at least imagine generating PDF files out of simple drawing operations. I would love to, and maybe one day will. But there's one big problem: text.
Drawing text is not difficult. It's not really difficult. It's a nightmare. Well, it's actually never-ending nightmares in a never-ending night. You can even call that "hell" if you want.
So… Maybe one day I'll drop Pango and use HarfBuzz instead (it means rewriting the whole line-breaking algorithm in WeasyPrint, that's already frightening). But I can't even imagine not relying on HarfBuzz. And I'm not the only one:
HarfBuzz is used in Android, Chrome, ChromeOS, Firefox, GNOME, GTK+, KDE, LibreOffice, OpenJDK, PlayStation, Qt, XeTeX, and other places.
Bad news: Pango is not the only library WeasyPrint relies on to render text. It also relies on Fontconfig to find and configure fonts, and FreeType to render TrueType fonts. In case you're wondering: it's really painful too.
TL;DR: Replacing cairo and Pango can be done with a lot of work. Getting rid of all the non-Python external dependencies is nothing more than an illusion.
You could set up a docker container that can be easily deployed to other instances.
Yes. Providing Snap or Flatpak packages could be another solution.
I guess a more batteries-ready documentation could also be a decent solution.
As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.
Here is an example of what can go wrong:
In [2]: import jinja2
In [3]: import weasyprint
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-3-4d0739b75804> in <module>
----> 1 import weasyprint
~/anaconda3/lib/python3.6/site-packages/weasyprint/__init__.py in <module>
439
440 # Work around circular imports.
--> 441 from .css import preprocess_stylesheet # noqa isort:skip
442 from .html import ( # noqa isort:skip
443 HTML5_UA_STYLESHEET, HTML5_PH_STYLESHEET, find_base_url, get_html_metadata)
~/anaconda3/lib/python3.6/site-packages/weasyprint/css/__init__.py in <module>
28 from ..logger import LOGGER, PROGRESS_LOGGER
29 from ..urls import URLFetchingError, get_url_attribute, url_join
---> 30 from . import computed_values, media_queries
31 from .properties import INHERITED, INITIAL_NOT_COMPUTED, INITIAL_VALUES
32 from .utils import remove_whitespace
~/anaconda3/lib/python3.6/site-packages/weasyprint/css/computed_values.py in <module>
15 from tinycss2.color3 import parse_color
16
---> 17 from .. import text
18 from ..logger import LOGGER
19 from ..urls import get_link_attribute
~/anaconda3/lib/python3.6/site-packages/weasyprint/text.py in <module>
12 import re
13
---> 14 import cairocffi as cairo
15 import cffi
16 import pyphen
~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in <module>
37
38
---> 39 cairo = dlopen(ffi, 'cairo', 'cairo-2', 'cairo-gobject-2', 'cairo.so.2')
40
41
~/anaconda3/lib/python3.6/site-packages/cairocffi/__init__.py in dlopen(ffi, *names)
34 except OSError:
35 pass
---> 36 raise OSError("dlopen() failed to load a library: %s" % ' / '.join(names))
37
38
OSError: dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2 / cairo.so.2
In [4]: quit
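A quick preflight check can make this kind of failure easier to diagnose before importing WeasyPrint. The sketch below is only an illustration and is not part of WeasyPrint or cairocffi: the helper name `missing_libraries` and the list of library names are assumptions, and `ctypes.util.find_library` follows its own lookup rules, which may differ from the ones cairocffi's `dlopen()` uses.

```python
from ctypes.util import find_library

# Library names are assumptions for illustration; adjust them for your platform.
NATIVE_LIBS = ("cairo", "pango-1.0", "pangocairo-1.0", "gobject-2.0")


def missing_libraries(names=NATIVE_LIBS, finder=find_library):
    """Return the names that the given lookup function cannot resolve."""
    return [name for name in names if finder(name) is None]


# Example use (output depends on the machine, so none is shown here):
#     print(missing_libraries())
```

An empty list only means the lookup found something under each name; it does not guarantee that the versions are compatible.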
Here is an example of what can go wrong:
There's no need to convince anyone that things may go wrong when you have external dependencies (especially when you don't follow the installation guide and try to use Anaconda, but that's another story :wink:). We'd really like to find a solution to this, but without a port of Pango and Cairo (and Fontconfig and FreeType and GDK-Pixbuf and …) to Python, the only solutions we have are better documentation or packaged distributions of WeasyPrint.
As mentioned, ready docker packages, but also AWS Lambda packages (and other PaaS) could be included in the docs as well as deployment steps for each platform. But I understand that's mostly up to the community.
Yes, I've done my best to provide pretty good documentation for many Linux distributions, and @Tontyna has really improved both the code and the documentation about installation on Windows, but we can't cover all the cases needed by users, and we have to rely on everybody's work for that. I'd be really happy to merge pull requests adding more documentation about Docker images, AWS and Anaconda :heart:.
Thank you!
A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.
A manylinux build of CairoCFFI could help a lot here. That's tricky in its own right, but not impossible.
CairoCFFI requires a file generation step during its installation, and I think that the generated file depends on the version of Cairo installed on the system. I'd be happy to discuss this on a separate issue for CairoCFFI.
Cairo is now gone :).
It’s summary time about non-Python dependencies!
We’ve removed direct dependencies:
We use these libraries as direct dependencies:
We use these libraries as indirect dependencies (not exhaustive):
Removing Pango is the next step, because it’s too limited to render HTML+CSS text. But it’s really useful now: it’s used to split lines, with a lot of workarounds because of its limits. Removing it would require using HarfBuzz directly for text shaping (as many browsers do) and handling bidirectional text manually (or using a Python library for that).
Having only HarfBuzz as a direct dependency could be possible. It’s available with pygobject, which is probably more reliable than our current code. There are also Python bindings with wheels for major OSes, so we can imagine a full WeasyPrint installation using only pip.
Of course: we’ll see that in the future. Not now.
This is cool. It would be amazing if the library that you replace Pango with were its own project; there are definitely other Python text/graphics projects that would benefit.
@liZe:
We’ve removed direct dependencies:
- Cairo
- GTK-Pixbuf
Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.
(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)
Does it mean GTK is not needed anymore or is it still needed for Pango? The Windows installation instructions still indicate to install it.
We ask users to install GTK with the installer because it’s the easiest way to get Pango, Harfbuzz and Fontconfig (and others) installed on Windows. So, GTK is technically not needed (and it has never been, GDK-Pixbuf is separated from GTK), but it’s in the documentation because it’s the easy way to get everything installed.
(As a side note, I had a lot of issues with Microsoft Store’s Python & WeasyPrint, I ended up installing it from the official site instead – sorry it was 2 months ago so I don’t remember the exact issues but I think it was related to WeasyPrint’s dependencies)
I’ve just tested a couple of days ago to install WeasyPrint on a fresh Windows 11 VM, and it was just:
pip install weasyprint
All the Windows problems come from different versions of libraries installed elsewhere on the system, or from different package managers that install broken libraries for some reason. If the problem is different, please open a new issue 😀.
Hi, thanks for this beautiful and useful library.
Is there a good way to install WeasyPrint and its dependencies that keeps the total Docker image size down? It seems that including WeasyPrint adds about 500 MB to my Debian Docker image.
Hi @vaughnkoch,
If you don’t care too much about performance and just want to use WeasyPrint as a binary, you can try to test this binary (⚠️ that’s not officially supported yet!)
Otherwise, you can test other distributions that may include fewer optional dependencies. But I’m curious, as 500 MB seems like a lot: which packages are included in this size, and does it include Python?
The total size using the recommended installs (e.g. libgtk) was more like 1GB, up from a previous non-weasyprint image size of 460mb, which includes python and many other packages.
However, I was able to get the additional size down to just 80 MB (not great, but not that much worse) by just using this in my Dockerfile and adding weasyprint to my Pipfile. Total image size of 460 MB.
RUN apt-get update -y && \
    apt-get -y install \
        libpango-1.0-0 \
        libpangoft2-1.0-0 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
That’s pretty much what’s proposed in the documentation. Installing GTK is only recommended for Windows.
Ah, I see now. Sorry I missed that. I think I initially saw the 'missing gobject' error (from the lack of a Pango installation), did some googling, and was probably led to install GTK by a StackOverflow entry or similar. Thanks for the confirmation that this is the right way to install weasyprint.
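When trimming system packages like this, a small smoke test run inside the image can confirm that the remaining libraries are enough to render something. The following is a minimal sketch, not an official check: the function names `default_render` and `smoke_test` are made up for illustration, and it assumes only that `weasyprint.HTML(string=...).write_pdf()` returns the PDF as bytes when no target is given.

```python
def default_render(html):
    """Render an HTML string to PDF bytes with WeasyPrint."""
    # Imported lazily so that a missing native library is reported by
    # smoke_test() instead of crashing at module import time.
    from weasyprint import HTML
    return HTML(string=html).write_pdf()


def smoke_test(render=default_render):
    """Return True if rendering a trivial page yields bytes that look like a PDF."""
    try:
        pdf = render("<p>Hello</p>")
    except (OSError, ImportError) as exc:
        # A missing Pango (or similar) library typically surfaces as OSError.
        print(f"Rendering failed: {exc}")
        return False
    return isinstance(pdf, bytes) and pdf.startswith(b"%PDF")
```

Calling `smoke_test()` as a build step would fail early if a required library was removed; the `render` parameter exists only so the check can be exercised without WeasyPrint installed.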
I'm just opening this issue to have a discussion on it. And because it doesn't exist 😏
WeasyPrint is amazing and stands out from other PDF-creation libraries because it implements its own CSS engine, but at the moment all the dependencies are its biggest downside in my view.
All the libraries it requires can be a hassle when building cross-platform; I've had several issues deploying to AWS services and testing on Windows.
Discussions to consider :
1) Is it feasible to think about removing the dependencies in the near future?
2) Do you guys plan on ever doing it, or would there need to be some type of investment/external incentive?
3) Which of the dependencies would be the hardest to redo in Python?
4) What are the biggest challenges here?
*Btw I'm only talking about external libs like Pango/Cairo/GDK