Psycojoker / prosopopee

a static website generator to make beautiful customizable pictures galleries that tell a story
http://prosopopee.readthedocs.org
GNU General Public License v3.0

Migration to Pillow and huge performance improvements #137

Open QSchulz opened 3 years ago

QSchulz commented 3 years ago

A notable change is that the resize option for images only accepts percentages for now.

Another notable change is that the .copy() function actually also applies the quality setting, unlike the implementation with graphicsmagick.

This has been tested against Pillow from 6.0.0 to 8.1.0 and Pillow-SIMD 7.0.0.post3.

Here are the benchmarks. The setup is the following: ~1400 photos spread among 31 galleries, with everything built from scratch. "Graphicsmagick" means the current implementation in prosopopee. "Built Pillow 8.1.0" can be reproduced by installing libjpeg-turbo and then running pip3 install --no-binary :all: --force-reinstall pillow. Before trying to compile Pillow, please follow https://pillow.readthedocs.io/en/stable/installation.html#building-from-source to make sure all the required packages are installed in your distribution.

| Computer | Graphicsmagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
|---|---|---|---|---|
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 1:37:13.06 | 26:57.71 | 17:43.66 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 5:35:32.57 | 1:44:21.93 | 1:16:32.42 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 1:42:33.79 | 46:10.00 | 26:10.30 | 17:30.49 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 33:01.63 | 6:00.16 | 3:40.09 | 2:16.03 |
| RaspberryPi 4 4GB RAM (Ubuntu Server 20.10) | 3:44:57.00 | 44:43.67 | 33:29.86 | N/A |

Regenerating only one gallery of 71 photos:

| Computer | Graphicsmagick | Pillow 8.1.0 | Built Pillow 8.1.0 | Pillow-SIMD 7.0.0.post3 |
|---|---|---|---|---|
| Intel Q6600 (4c/4t @2.4GHz) 4GB RAM (Fedora Desktop 33) | 3:59.49 | 1:37.47 | 1:13.93 | N/A |
| Intel Atom N2800 (2c/4t @1.86GHz) 2GB RAM (Fedora Server 33) | 14:59.40 | 6:09.36 | 5:04.45 | N/A |
| Intel Celeron G1610T (2c/2t @2.3GHz) 4GB RAM (Fedora Server 33) | 4:33.75 | 2:31.85 | 1:45.11 | 1:18.00 |
| Intel Core i7-8700 (6c/12t @3.2GHz) 32GB RAM (Ubuntu Desktop 20.04.2) | 1:31.09 | 21.179 | 12.373 | 8.665 |
| RaspberryPi 4 4GB RAM (Ubuntu Server 20.10) | 10:26.63 | 2:23.45 | 2:22.50 | N/A |
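
To put the numbers in perspective, here is a small sketch that converts the full-build timings of the Intel Core i7-8700 to seconds and computes the speedup over Graphicsmagick. The helper `to_seconds` and the dictionary are only illustrative; the timings themselves are copied from the benchmark.

```python
def to_seconds(stamp: str) -> float:
    """Convert a '[[H:]M:]S.ss' timing into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Full-build timings for the Intel Core i7-8700, copied from the benchmark.
i7_full_build = {
    "Graphicsmagick": "33:01.63",
    "Pillow 8.1.0": "6:00.16",
    "Built Pillow 8.1.0": "3:40.09",
    "Pillow-SIMD 7.0.0.post3": "2:16.03",
}

baseline = to_seconds(i7_full_build["Graphicsmagick"])
speedups = {
    name: baseline / to_seconds(timing)
    for name, timing in i7_full_build.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

On that machine this works out to roughly 5.5x for the Pillow wheel, 9.0x for Pillow built from source and 14.6x for Pillow-SIMD.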

Currently, thumbnails are generated in a single thread while the galleries are parsed, by calling graphicsmagick once for every thumbnail to be generated. This is suboptimal even though graphicsmagick spreads its workload over all available CPU cores.

Quick and dirty benchmarking showed that multiprocessed Pillow is much more efficient than graphicsmagick for generating thumbnails.

This PR adds support for generating thumbnails with multiprocessed Pillow.

Multiple processes have to be used rather than multiple threads because Python threads are still subject to the Global Interpreter Lock (GIL): they cannot run Python code concurrently, and concurrent execution is exactly what one wants for CPU-intensive tasks such as thumbnail generation.
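
As a rough illustration of the idea (not the code of this PR), the sketch below hands a CPU-bound function to a process pool. `fake_thumbnail_work` is a made-up stand-in for the actual resizing.

```python
from multiprocessing import Pool


def fake_thumbnail_work(seed: int) -> int:
    """Stand-in for CPU-bound resizing: pure-Python arithmetic that holds the GIL."""
    total = 0
    for i in range(200_000):
        total = (total + seed * i) % 1_000_003
    return total


if __name__ == "__main__":
    seeds = list(range(8))
    # Each worker is a separate interpreter with its own GIL, so the tasks
    # can run on several cores at once.
    with Pool() as pool:
        results = pool.map(fake_thumbnail_work, seeds)
    # Same values as a sequential run, only computed in parallel.
    assert results == [fake_thumbnail_work(s) for s in seeds]
    print(results)
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.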

Multiprocessing brings its own set of challenges because most data structures, such as the cache, cannot be shared between processes. All data modified by any of the processes must be of a type handled by multiprocessing.Manager data structures.
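
A minimal sketch of what this means in practice, assuming a cache that maps thumbnail names to some metadata. The `generate` worker and the cache layout are invented for the example; only `Manager().dict()` and `Pool.map` are standard library calls.

```python
from multiprocessing import Manager, Pool


def generate(args):
    """Worker: pretend to build a thumbnail and record it in the shared cache."""
    cache, name = args
    cache[name] = {"done": True}  # with a manager dict this goes through a proxy
    return name


if __name__ == "__main__":
    with Manager() as manager:
        cache = manager.dict()  # picklable proxy, visible to every process
        names = ["a-small.jpg", "a-medium.jpg", "b-small.jpg"]
        with Pool(2) as pool:
            pool.map(generate, [(cache, name) for name in names])
        # All three entries are visible in the parent process.
        print(dict(cache))
```

With a plain dict instead of `manager.dict()`, each worker would only update its own copy and the parent would never see the new entries.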

For best performance, all thumbnails for an image should be generated at once, so that the original image is opened only once. This requires keeping track of the original images and attaching to each of them the thumbnails to be created. This can be done via a factory which is passed to the Jinja templates, so that they can request thumbnails for a given image knowing only the original path, the name of the original image and the parameters of the thumbnails to create.
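
The grouping idea can be sketched as follows. The request format (path plus a width in pixels) and the `process` stand-in are assumptions made for the example, not what the templates actually pass.

```python
from collections import defaultdict

# Hypothetical thumbnail requests as templates might issue them:
# (original image path, requested width in pixels).
requests = [
    ("gallery/a.jpg", 400),
    ("gallery/b.jpg", 400),
    ("gallery/a.jpg", 1200),
    ("gallery/a.jpg", 400),  # duplicate request from another template
]


def group_by_original(reqs):
    """Collect the distinct sizes requested for each original image."""
    grouped = defaultdict(set)
    for path, width in reqs:
        grouped[path].add(width)
    return {path: sorted(widths) for path, widths in grouped.items()}


opened = []


def process(path, widths):
    """Stand-in for: open the original once, then write every thumbnail."""
    opened.append(path)  # one (expensive) open per original
    return [f"{path}@{width}" for width in widths]  # placeholder for saved files


jobs = group_by_original(requests)
outputs = {path: process(path, widths) for path, widths in jobs.items()}
print(opened)
```

Four requests result in only two opens, one per original image.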

The ImageFactory keeps all of those original images in a dictionary keyed by a virtual path made from the original image name and a CRC32 of all the options that apply to its thumbnails. This gives prosopopee the ability to group thumbnails per set of options (e.g. if options are passed in the gallery's settings.yaml).
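
A toy version of such a factory could look like this. How the options are serialized before hashing is an assumption (sorted-key JSON here), and the class below is a simplified stand-in rather than the ImageFactory of this PR.

```python
import json
import zlib


def options_crc32(options: dict) -> int:
    """CRC32 of a canonical (sorted-key) JSON dump of the options."""
    payload = json.dumps(options, sort_keys=True).encode("utf-8")
    return zlib.crc32(payload)


class ToyImageFactory:
    """One entry per (image name, options) combination."""

    def __init__(self):
        self.base_images = {}

    def get(self, name: str, options: dict) -> dict:
        key = f"{name}-{options_crc32(options):08x}"  # the "virtual path"
        # setdefault returns the existing entry when the key is already known.
        return self.base_images.setdefault(
            key, {"name": name, "options": options, "thumbnails": {}}
        )


factory = ToyImageFactory()
first = factory.get("cover.jpg", {"quality": 75, "strip": True})
same = factory.get("cover.jpg", {"strip": True, "quality": 75})
other = factory.get("cover.jpg", {"quality": 90, "strip": True})
print(sorted(factory.base_images))
```

Sorting the keys before hashing makes the checksum independent of the order in which options were written, so `first` and `same` end up being the same object while `other` gets its own entry.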

The original image (or BaseImage) is returned by the ImageFactory and the templates can then request .copy() or .thumbnail() for it.

The thumbnails are kept in a dictionary whose keys are the thumbnail names. Each name is made of the original name plus the thumbnail size and the CRC32 of the original image and the options that apply to it. This way, thumbnails are guaranteed to be unique even if requested multiple times by templates.
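
For illustration only, here is one way such names could be built and deduplicated. The exact naming scheme and what is fed to the CRC32 (the original name and its options here) are assumptions.

```python
import zlib
from pathlib import PurePosixPath


def thumbnail_name(original: str, width: int, height: int, options: dict) -> str:
    """Build '<stem>-<w>x<h>-<crc32><suffix>' (a made-up naming scheme)."""
    path = PurePosixPath(original)
    fingerprint = repr((original, sorted(options.items()))).encode("utf-8")
    crc = zlib.crc32(fingerprint)
    return f"{path.stem}-{width}x{height}-{crc:08x}{path.suffix}"


thumbnails = {}


def request_thumbnail(original, width, height, options):
    """Register a thumbnail once; repeated requests reuse the same entry."""
    name = thumbnail_name(original, width, height, options)
    thumbnails.setdefault(name, {"source": original, "size": (width, height)})
    return name


a = request_thumbnail("cover.jpg", 400, 300, {"quality": 75})
b = request_thumbnail("cover.jpg", 400, 300, {"quality": 75})
c = request_thumbnail("cover.jpg", 800, 600, {"quality": 75})
print(len(thumbnails))
```

The first two requests resolve to the same name, so only two thumbnails are registered for the three requests.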

The size is now read with imagesize.get() only once, the first time the ratio property is read or .copy() is called on the image, so that the performance impact is minimal.
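
The "read once" behaviour can be sketched with `functools.cached_property`. The class below is a simplified stand-in, and `size_reader` is a hypothetical injected callable; a real reader from the imagesize package could be passed instead.

```python
from functools import cached_property


class LazyImage:
    def __init__(self, path, size_reader):
        self.path = path
        self._size_reader = size_reader  # hypothetical callable returning (w, h)

    @cached_property
    def size(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._size_reader(self.path)

    @property
    def ratio(self):
        width, height = self.size
        return width / height


calls = []


def fake_reader(path):
    """Pretend to read the image header and remember that we did."""
    calls.append(path)
    return (1600, 1200)


image = LazyImage("cover.jpg", fake_reader)
ratios = [image.ratio, image.ratio, image.ratio]
print(len(calls))
```

The reader is called a single time no matter how often the ratio is requested, and never if nobody asks for it.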

Since multiprocessing.Pool.map splits iterables into pre-defined chunks which are then assigned to processes, best performance requires that all processes get more or less the same workload, so that no process sits idle while another one is working at 100%. For that, the original images whose thumbnails are all cached are removed from the list of images before it is passed to multiprocessing.Pool.map.
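
A sketch of that filtering step, under the assumption that each image knows which of its thumbnails are already cached. The data layout and the `render` stand-in are invented for the example.

```python
from multiprocessing import Pool


def needs_work(image: dict) -> bool:
    """True if at least one thumbnail of this image is not cached yet."""
    return any(not cached for cached in image["thumbnails"].values())


def render(image: dict) -> str:
    """Stand-in for the real thumbnail generation."""
    return image["name"]


# Hypothetical state: thumbnail name -> already cached?
images = [
    {"name": "a.jpg", "thumbnails": {"a-small": True, "a-large": True}},
    {"name": "b.jpg", "thumbnails": {"b-small": True, "b-large": False}},
    {"name": "c.jpg", "thumbnails": {"c-small": False}},
    {"name": "d.jpg", "thumbnails": {"d-small": True}},
]

# Drop fully cached images first so that every chunk contains real work.
todo = [image for image in images if needs_work(image)]

if __name__ == "__main__":
    with Pool(2) as pool:
        print(pool.map(render, todo))
```

Without the filter, a chunk made mostly of fully cached images would finish almost immediately and leave its process idle while the others are still busy.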

Thanks, Quentin