PLPeeters opened 6 years ago
Packages are not cached because doing that results in them bloating out the image, as they would live in a lower layer. Deleting them in a higher layer doesn't free up any space; the image is still just as fat.
For speeding up build times, you are better off creating a Python wheelhouse which contains wheel versions of all the packages you require. The wheelhouse directory can then be injected into a build in some way and packages installed from it, with fallback to PyPI if necessary. The wheelhouse directory is then deleted when done, in the same layer, to avoid bloating the image.
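A minimal sketch of that idea, with hypothetical paths (this is not the actual image's mechanism; note also that any COPY/ADD layer that brings a wheelhouse into the image keeps its bytes regardless of later deletion, so the build and cleanup here share one RUN):

```dockerfile
FROM python:2.7
COPY requirements.txt /tmp/requirements.txt
# Build the wheels, install from them (falling back to PyPI via the normal
# index for anything missing), then remove the wheelhouse -- all in ONE
# RUN, so the wheelhouse never persists in any committed layer. Docker's
# layer cache skips this step entirely while requirements.txt is unchanged.
RUN pip wheel -r /tmp/requirements.txt -w /tmp/wheelhouse \
    && pip install --find-links=/tmp/wheelhouse -r /tmp/requirements.txt \
    && rm -rf /tmp/wheelhouse
```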
FWIW, Glyph has posted on this topic before at:
and I have posted about it as well.
The newer versions of docker images for Python I had been working on incorporated support for using a Python wheelhouse.
I should have chosen a better title; what I mean is that pip packages are not being installed in their own RUN
command, which from what I understand effectively prevents Docker from caching the would-be pip install
layer. So I don't mean caching the packages in the image itself, which would indeed cause unnecessary bloating.
I'll check out the wheelhouse idea though, seems like an interesting workaround!
In the general case, the other issue is that a requirements.txt
file can list a local directory from which to install a package. This could even be the application code itself, as some people like to create a package from their code. In that case the application code has to already be in the image before pip
is run. So the order in which things are done is also based on providing one generic solution that works in all cases.
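For instance (a made-up file, not from the image's test suite), a requirements.txt can mix index packages with a local path, and that path must already exist inside the image when pip runs:

```text
requests==2.18.4
# A local directory containing a setup.py; pip builds and installs it,
# so this path has to be present in the image before pip is invoked.
./src/myapp
```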
That does make sense, although if I'm not mistaken you could probably use a build argument that makes the pip install
part run after copying the code to the image (or the other way around, depending on what you want to be the default).
So I tried the wheelhouse approach and I'm running into some issues, so I must have done something wrong somewhere.
I created a .whiskey/wheelhouse
directory and ran pip wheel -r ../../requirements.txt
from there. I then tried running a build and got the following error:
Sending build context to Docker daemon 58.58MB
Step 1/31 : FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
# Executing 2 build triggers
---> Running in 72e029677ad3
-----> Detected wheelhouse for pip
-----> Installing dependencies with pip
The command '/bin/sh -c mod_wsgi-docker-build' returned a non-zero code: 137
Docker build failed. Aborting.
Any clues?
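One hedged hint on the error itself: an exit code of 137 usually means the process received SIGKILL (codes above 128 encode 128 + signal number), which during a docker build is often the kernel's OOM killer stopping a memory-hungry step such as compiling wheels:

```shell
# A process killed with SIGKILL exits with status 128 + 9 = 137.
sh -c 'kill -9 $$'
echo "exit status: $?"   # prints "exit status: 137"
```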
When those images were originally written, the concept of build arguments didn't exist in docker.
As to trying to use the wheelhouse, where are you trying to do that? That image probably doesn't have a new enough pip
and also likely lacks the wheel
package. I don't recollect ever using it to test wheelhouse builds.
Because Docker Inc blocked me from being able to build that image on Docker Hub using automated builds any more, it has been neglected. The intent was to replace it with a newer image, done differently, that could be built using automated builds, but I have had next to no interest from the Python community in all the work I have been doing on creating better Docker images for use with Python, so there has been little incentive.
The commands above were run on my local machine. The image I'm running has pip 9.0.1. I'm not sure what I did, but it suddenly worked. I did rebuild my wheelhouse from inside the image instead of from my local machine in order to have the correct wheels.
Okay so even when I add RUN rm -rf .whiskey/wheelhouse
at the top of my Dockerfile, the wheelhouse seems to remain in the image somewhere, because the image is 100 MB larger than when I don't include the wheelhouse... I checked your scripts though, and nothing seems to copy it anywhere, so I'm a bit confused... Any ideas?
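A plausible explanation, sketched with made-up paths and sizes: image layers are additive, so bytes copied in by one instruction stay in the image even if a later instruction deletes the files. With an onbuild base image, the parent's ONBUILD triggers (which the log above shows executing) run right after FROM, before any RUN in the derived Dockerfile, so the source directory, wheelhouse included, has already been committed as a layer by the time the rm runs:

```dockerfile
# Illustration only (not the actual mod-wsgi-docker Dockerfile).
FROM python:2.7
COPY app /app                         # layer A: adds 100 MB, wheelhouse included
RUN rm -rf /app/.whiskey/wheelhouse   # layer B: hides the files, layer A unchanged
# The image still carries layer A's 100 MB: deleting files in a later
# layer only masks them, it never reclaims space from an earlier layer.
```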
The way things are right now, the pip packages are not getting cached, which often adds unnecessary build time since the packages don't change often. It would be better to run pip install as one of the first steps in the Dockerfile, instead of in a script, to leverage Docker's caching mechanism, as explained here: https://www.aptible.com/documentation/enclave/tutorials/faq/dockerfile-caching/pip-dockerfile-caching.html
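The pattern the linked article describes can be sketched like this (hypothetical paths, not the actual image's scripts):

```dockerfile
FROM python:2.7
# Copy ONLY the requirements file first: this layer, and the pip install
# layer below it, are rebuilt only when requirements.txt itself changes.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Copy the rest of the source afterwards; edits to application code now
# invalidate only this layer, not the cached pip install above.
COPY . /app
```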