Very large install size in Docker

RoryMMMM commented 1 month ago

I'm building an environment to use in an HPC setting. The goal is to build a Docker container with several Python packages and then convert it to a Singularity file for use on the university's Slurm HPC cluster.

One thing I've noticed, which isn't great, is that the install size of repast4py is huge: the Docker image is ~8.2 GB. Here are the image layers:

> docker history 799316543ca4

IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
799316543ca4   9 minutes ago    /bin/sh -c #(nop)  ENV PYTHONPATH=/repast4py…   0B        
91a65220423d   9 minutes ago    /bin/sh -c env CC=mpicxx CXX=mpicxx pip inst…   8.16GB    
1e7b9cd039e3   19 minutes ago   /bin/sh -c pip install -r ./requirements.txt    199MB     
6ccb1de7ca54   21 minutes ago   /bin/sh -c #(nop) COPY file:ab16ddc3a986b259…   283B      
a45824896792   25 minutes ago   /bin/sh -c apt-get update &&     apt-get ins…   340MB     
73b513f59526   2 years ago      /bin/sh -c #(nop)  CMD ["python3"]              0B        
<missing>      2 years ago      /bin/sh -c set -ex;   savedAptMark="$(apt-ma…   9.51MB    
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_SHA256…   0B        
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_URL=ht…   0B        
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PYTHON_SETUPTOOLS_VER…   0B        
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PYTHON_PIP_VERSION=21…   0B        
<missing>      2 years ago      /bin/sh -c cd /usr/local/bin  && ln -s idle3…   32B       
<missing>      2 years ago      /bin/sh -c set -ex   && savedAptMark="$(apt-…   29.5MB    
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PYTHON_VERSION=3.10.0    0B        
<missing>      2 years ago      /bin/sh -c #(nop)  ENV GPG_KEY=A035C8C19219B…   0B        
<missing>      2 years ago      /bin/sh -c set -eux;  apt-get update;  apt-g…   3.11MB    
<missing>      2 years ago      /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B        
<missing>      2 years ago      /bin/sh -c #(nop)  ENV PATH=/usr/local/bin:/…   0B        
<missing>      2 years ago      /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      2 years ago      /bin/sh -c #(nop) ADD file:ece5ff85ca549f0b1…   80.4MB

And the Dockerfile (similar to the one in the Repast git repo):

FROM python:3.10.0-slim 

RUN apt-get update && \
    apt-get install -y  mpich \
        && rm -rf /var/lib/apt/lists/*

# Install the python requirements
COPY ./requirements.txt ./requirements.txt
RUN pip install -r ./requirements.txt

# Install repast4py
RUN env CC=mpicxx CXX=mpicxx pip install repast4py

# Set the PYTHONPATH to include the /repast4py folder which contains the core folder
ENV PYTHONPATH=/repast4py/src

It's clear that the 8.16GB layer comes from the RUN env CC=mpicxx CXX=mpicxx pip install repast4py command.
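For longer histories, the SIZE column can be ranked with a few lines of Python instead of by eye. This is only a sketch: the helper name `size_to_bytes` is made up for illustration, the layer labels are my own shorthand for the truncated commands, and it assumes Docker prints sizes in decimal units (kB, MB, GB).

```python
import re

# Decimal multipliers; assumes Docker's human-readable sizes are base 1000.
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def size_to_bytes(text: str) -> int:
    """Convert a size string such as '8.16GB' or '283B' to a byte count."""
    match = re.fullmatch(r"([\d.]+)\s*([kMGT]?B)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit])


# (label, size) pairs; sizes copied from the docker history output above,
# labels are shorthand for the truncated CREATED BY column.
layers = [
    ("pip install repast4py", "8.16GB"),
    ("pip install -r ./requirements.txt", "199MB"),
    ("apt-get install mpich", "340MB"),
    ("base image ADD", "80.4MB"),
]

# Largest layer first.
ranked = sorted(layers, key=lambda item: size_to_bytes(item[1]), reverse=True)
for label, size in ranked:
    print(f"{size:>8}  {label}")
```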

Using an image this large isn't impossible, but it causes problems with storing it in a size-limited free repo, building it with CI/CD, moving it to nodes in the cluster, etc.

Is there a simple way to reduce the build size? I can't really believe that repast4py needs 8 GB of compiled C code to run.

RoryMMMM commented 1 month ago

After having a look at issue #68, I tried a CPU-only version of Torch, installed via requirements.txt. This moved the large layer to the pip install -r ./requirements.txt layer, which implies that Torch was previously being installed in the RUN env CC=mpicxx CXX=mpicxx pip install repast4py step rather than the requirements.txt step.

So it looks like the problem was Torch all along. I'm not planning on using GPU support for now, and ~1.5 GB is much better:

$ docker history 9a1cb25cd860
IMAGE          CREATED             CREATED BY                                      SIZE      COMMENT
9a1cb25cd860   2 minutes ago       /bin/sh -c #(nop)  ENV PYTHONPATH=/repast4py…   0B        
2e321edc7386   2 minutes ago       /bin/sh -c env CC=mpicxx CXX=mpicxx pip inst…   28.4MB    
64236b6f3e79   3 minutes ago       /bin/sh -c pip install -r ./requirements.txt    1.42GB    
00430ad1e55c   8 minutes ago       /bin/sh -c #(nop) COPY file:e3a9b2a1f65788bd…   418B      
a45824896792   About an hour ago   /bin/sh -c apt-get update &&     apt-get ins…   340MB     
73b513f59526   2 years ago         /bin/sh -c #(nop)  CMD ["python3"]              0B        
<missing>      2 years ago         /bin/sh -c set -ex;   savedAptMark="$(apt-ma…   9.51MB    
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_SHA256…   0B        
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_URL=ht…   0B        
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PYTHON_SETUPTOOLS_VER…   0B        
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PYTHON_PIP_VERSION=21…   0B        
<missing>      2 years ago         /bin/sh -c cd /usr/local/bin  && ln -s idle3…   32B       
<missing>      2 years ago         /bin/sh -c set -ex   && savedAptMark="$(apt-…   29.5MB    
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PYTHON_VERSION=3.10.0    0B        
<missing>      2 years ago         /bin/sh -c #(nop)  ENV GPG_KEY=A035C8C19219B…   0B        
<missing>      2 years ago         /bin/sh -c set -eux;  apt-get update;  apt-g…   3.11MB    
<missing>      2 years ago         /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B        
<missing>      2 years ago         /bin/sh -c #(nop)  ENV PATH=/usr/local/bin:/…   0B        
<missing>      2 years ago         /bin/sh -c #(nop)  CMD ["bash"]                 0B        
<missing>      2 years ago         /bin/sh -c #(nop) ADD file:ece5ff85ca549f0b1…   80.4MB

requirements.txt:

# Using python:3.10.0-slim
# Use CPU Torch for a smaller install
--extra-index-url https://download.pytorch.org/whl/cpu 
torch # https://github.com/Repast/repast4py/issues/68
mpi4py==4.0.0 #https://pypi.org/project/mpi4py/
numpy==1.26.4 #2.1.1
pandas==2.2.3
numba==0.60.0 
coverage==7.6.1 
networkx==3.3 
pyyaml==6.0.2 
Cython==3.0.11 
llvmlite==0.43.0

RoryMMMM commented 1 month ago

I did a quick search in the git repo to see where Torch was being used:

requirements.txt
a bunch of mentions in tests/*.py and examples/*.py
a random seed generator in src/repast4py/random.py
some mentions in geometry.py and value_layer.py
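A text search also picks up comments and docstrings. To list only the files that actually import Torch, the source tree can be parsed with `ast`. This is a sketch, not part of repast4py; `files_importing` is a hypothetical helper, and it does not detect dynamic imports such as `importlib.import_module("torch")`.

```python
import ast
from pathlib import Path


def files_importing(root: str, module: str) -> list[str]:
    """List .py files under `root` that import `module` or one of its submodules."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                # level == 0 means an absolute import, not "from . import x"
                names = [node.module or ""]
            else:
                continue
            if any(n == module or n.startswith(module + ".") for n in names):
                hits.append(str(path))
                break  # one hit per file is enough
    return hits


# Example, run from the repository root:
# print(files_importing("src/repast4py", "torch"))
```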

This might be a silly question, but is Torch integral to Repast4Py?

etatara commented 1 month ago

Thanks for sharing your experience. I created #68 after running into the same issue when creating Docker images. I wasn't able to get the Docker image much smaller than what you are showing. I'll look into further ways to reduce the size of repast4py installs based on individual use-case requirements.

RoryMMMM commented 1 month ago

Thanks for looking into this! Let me know if I can help in any way.

Repast / repast4py

Very large install size in Docker #72