GrahamDumpleton / mod_wsgi-docker

Docker images for Apache/mod_wsgi.
Apache License 2.0

pip packages not getting cached #25

Open PLPeeters opened 6 years ago

PLPeeters commented 6 years ago

The way things are right now, the pip packages are not getting cached, which often adds unnecessary build time since the packages don't change often. It would be better to run pip install first thing from the Dockerfile instead of in a script, to leverage Docker's caching mechanism, as explained here: https://www.aptible.com/documentation/enclave/tutorials/faq/dockerfile-caching/pip-dockerfile-caching.html

GrahamDumpleton commented 6 years ago

Packages are not cached because doing that results in them bloating out the image, as they would live in a lower layer. Deleting them in a higher layer doesn't free up any space; the image is still just as fat.

For speeding up build times, you are better off relying on creating a Python wheelhouse which contains wheel versions of all packages that you require. The wheelhouse directory can then be injected into a build in some way with packages installed from it, with fallback to PyPI if necessary. The wheelhouse directory is then deleted when done in the same layer to avoid bloating the image.
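
As a rough sketch of the wheelhouse idea (not the mechanism these images use): the wheels are built once outside the image, then made available only for the duration of the pip install step. The example assumes a generic python:3.12-slim base image, a wheelhouse/ directory in the build context, and BuildKit's RUN --mount=type=bind. That mount is a newer Docker feature, used here in place of the copy, install, and delete in one layer approach described above.

```dockerfile
# syntax=docker/dockerfile:1
# Assumption: a generic base image, not the mod_wsgi-docker images.
FROM python:3.12-slim
WORKDIR /app

# Build the wheels beforehand, ideally on the same platform as the image:
#   pip wheel -r requirements.txt -w wheelhouse

COPY requirements.txt .

# wheelhouse/ from the build context is mounted only while this RUN executes,
# so the wheels never end up in an image layer. pip still falls back to PyPI
# for anything that is missing from the wheelhouse.
RUN --mount=type=bind,source=wheelhouse,target=/tmp/wheelhouse \
    pip install --no-cache-dir --find-links=/tmp/wheelhouse -r requirements.txt

# Copy the application code afterwards, taking care not to copy wheelhouse/.
```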

GrahamDumpleton commented 6 years ago

FWIW, Glyph has posted on this topic before at:

and I have also posted about it.

The newer versions of docker images for Python I had been working on incorporated support for using a Python wheelhouse.

PLPeeters commented 6 years ago

I should have chosen a better title; what I mean is that pip packages are not being installed in their own RUN command, which from what I understand effectively prevents Docker from caching the would-be pip install layer. So I don't mean caching the packages in the image itself, which would indeed cause unnecessary bloating.
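
For illustration, the layer-caching pattern being described usually looks something like the following. This is a minimal sketch assuming a generic python:3.12-slim base image and a requirements.txt at the root of the build context; it is not how the mod_wsgi-docker images are structured.

```dockerfile
# Assumption: a generic base image, not the mod_wsgi-docker images.
FROM python:3.12-slim
WORKDIR /app

# Copy only the requirements file first. This layer, and the pip install
# layer after it, are rebuilt only when requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often, so copy it last.
COPY . .
```

With this ordering, a build in which only the application code changed reuses the cached pip install layer.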

I'll check out the wheelhouse idea though, seems like an interesting workaround!

GrahamDumpleton commented 6 years ago

In the general case, the other issue is that a requirements.txt file can list a local directory from which to install a package. This could even be the application code itself, as some people like to create a package from their code. In that case the application code has to already be in the image before pip is run. So the order in which things are done is also based on providing one generic solution that works in all cases.

PLPeeters commented 6 years ago

That does make sense, although if I'm not mistaken you could probably use a build argument that makes the pip install part run after copying the code to the image (or the other way around, depending on what you want to be the default).
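
A sketch of what such a build argument could look like, assuming a generic python:3.12-slim base image. The argument name PIP_STAGE and its values are hypothetical and do not exist in the mod_wsgi-docker images.

```dockerfile
# Assumption: a generic base image, not the mod_wsgi-docker images.
FROM python:3.12-slim
WORKDIR /app

# Hypothetical build argument: "early" installs before the code is copied,
# "late" installs after it (needed when requirements.txt lists a local path).
ARG PIP_STAGE=early

COPY requirements.txt .
RUN if [ "$PIP_STAGE" = "early" ]; then pip install --no-cache-dir -r requirements.txt; fi

COPY . .
RUN if [ "$PIP_STAGE" = "late" ]; then pip install --no-cache-dir -r requirements.txt; fi
```

The late variant would be selected with docker build --build-arg PIP_STAGE=late . while the default keeps the cache-friendly ordering.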

PLPeeters commented 6 years ago

So I tried the wheelhouse approach and I'm running into some issues, so I must have done something wrong somewhere.

I created a .whiskey/wheelhouse directory and ran pip wheel -r ../../requirements.txt from there. I then tried running a build and got the following error:

Sending build context to Docker daemon  58.58MB
Step 1/31 : FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
# Executing 2 build triggers
 ---> Running in 72e029677ad3
 -----> Detected wheelhouse for pip
 -----> Installing dependencies with pip
The command '/bin/sh -c mod_wsgi-docker-build' returned a non-zero code: 137
Docker build failed. Aborting.

Any clues?

GrahamDumpleton commented 6 years ago

When those images were originally written, the concept of build arguments didn't exist in docker.

As to trying to do the wheelhouse, where are you trying to do that? That image probably doesn't have a new enough pip and also likely lacks the wheel package. I don't recollect ever using it to test wheelhouse builds.

Because Docker Inc blocked me from being able to build that image any more on Docker Hub using automated builds, it has been neglected. The intent was to replace it with a newer image, done differently, that could be built using automated builds, but I have had next to no interest from the Python community in all the work I have been doing on creating better Docker images for use with Python, so there has been little incentive.

PLPeeters commented 6 years ago

The commands above were run on my local machine. The image I'm running has pip 9.0.1. I'm not sure what I did, but it suddenly worked. I did rebuild my wheelhouse from inside the image instead of from my local machine in order to have the correct wheels.

PLPeeters commented 6 years ago

Okay so even when I add RUN rm -rf .whiskey/wheelhouse at the top of my Dockerfile, the wheelhouse seems to remain in the image somewhere because it's 100 MB larger than when I don't include the wheelhouse... I checked your scripts though, and nothing seems to copy it anywhere, so I'm a bit confused... Any ideas?
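
One possible explanation, offered as an assumption rather than a confirmed diagnosis: ONBUILD triggers inherited from a base image execute immediately after the FROM line, before any instruction in the derived Dockerfile. The build log above shows two such triggers. If, as seems likely, they copy the build context into the image and then run mod_wsgi-docker-build, the wheelhouse is already stored in a layer by the time the rm runs, and, as noted earlier in the thread, deleting files in a higher layer does not shrink the image.

```dockerfile
FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
# The two inherited ONBUILD triggers have already run at this point.
# Assumption: the first one copies the whole build context, including
# .whiskey/wheelhouse, into its own layer.

# This runs in a new layer on top. It hides the directory from the final
# filesystem, but the layer below still contains the wheels.
RUN rm -rf .whiskey/wheelhouse
```

Running docker history on the built image lists the size each layer contributes, which would show whether the wheels sit in the layer created by the copy trigger.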