Closed miquelduranfrigola closed 3 months ago
We will work on this with @mjwarren3 who helped us previously with dockerization of Ersilia models.
Hey @miquelduranfrigola I started exploring this today, and wanted to share initial findings / thoughts (primarily from here). Let me know if you have any heads-ups or perspectives on next steps.
First Finding: Base Ersilia Image / Repository is responsible for >50% of the Image Size (510 MB)
git clone https://github.com/ersilia-os/ersilia.git && cd /root/ersilia && pip install -e . # buildkit
Next Step: Explore whether we could trim down the Ersilia base repository to only what is needed to fetch and serve a model
Second Finding: Fetching the model from GitHub is responsible for ~415 MB of space
ersilia -v fetch $MODEL --from_github # buildkit
Next Step: Explore the fetch function within Ersilia to see if it is bringing in any extra packages / content that is not needed to serve the model
My initial feel is that the full Ersilia repository and Fetch functionality might be needed for installation, but we could trim it down after the model is ready to be served
Hello @mjwarren3 this is very useful, thanks!
This would be a great start indeed. I think that, within Ersilia, BentoML (a dependency) might be responsible for a significant part of the space.
One question: what would be the best way to explore this, in practice? Run the docker container in exec mode?
And one more question, @mjwarren3 - how relevant is the base OS image? For example, using `alpine` vs `ubuntu`?
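One possible way to explore this in practice, as a sketch under assumptions: `docker history` reports the size added by each build step, and a shell inside the container can rank directories with `du`. The image name and tag in the comments are guesses based on the model discussed in this thread, and `largest_dirs` is a hypothetical helper, not part of Ersilia.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the N largest entries at depth <= 1 under DIR, in KiB.
# The first line is DIR itself (the total), followed by its largest children.
largest_dirs() {
    dir="$1"
    n="${2:-10}"
    # -k: sizes in KiB; -x: stay on one filesystem; -d 1: one level deep.
    du -k -x -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$n"
}

# Inside a container one could run, for example:
#   largest_dirs /usr 5
# From the host (assumed image name and tag):
#   docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' ersiliaos/eos4wt0:latest
#   docker run --rm -it --entrypoint sh ersiliaos/eos4wt0:latest
```

Overriding the entrypoint with `sh` gives a shell without starting the model server, which is usually enough for inspecting disk usage.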
Hi @miquelduranfrigola @mjwarren3
What is the status of this?
Hi @GemmaTuron no major updates since the notes above ^
Saw that this topic was being discussed in another issue though: https://github.com/ersilia-os/eos7d58/issues/2
Would be great to have someone else take a look if Dhanshree or Hellen wanted to
> And one more question, @mjwarren3 - how relevant is the base OS image? For example, using `alpine` vs `ubuntu`?
The latest Alpine image is ~7 MB and the latest Ubuntu image is ~78 MB, so there is some improvement potential there as well. Since we're trying to cut the image size in half or more, though, the base image won't be the major driver.
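As a rough check on that estimate, here is a back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB) and uses the 2.41 GB size reported for eos4wt0 elsewhere in this thread; `base_image_saving_pct` is a hypothetical helper.

```shell
#!/bin/sh
# Share of the total image size saved by swapping one base image for another.
# Arguments: larger base size, smaller base size, total image size (all in MB).
base_image_saving_pct() {
    awk -v ubuntu="$1" -v alpine="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", (ubuntu - alpine) / total * 100 }'
}

# Ubuntu ~78 MB, Alpine ~7 MB, eos4wt0 image ~2410 MB (figures quoted in this thread).
base_image_saving_pct 78 7 2410   # prints 2.9
```

On these numbers the base image swap alone would account for roughly 3% of the total.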
I also just tried running the Docker image through Dive, and have some initial findings from that.
Overall Metrics
Largest Build Steps
Largest Files / Packages in the Final Image - in Directory Format
- /usr is 2.1 GB
  - /bin/conda is 1.4 GB
    - /envs is 646 MB
      - /eos4wt0 is 454 MB
        - /python3.10/site-packages is 278 MB
          - rdkit is 66 MB, rdkit.libs is 49 MB, numpy is 34 MB, numpy.libs is 38 MB, botocore is 6 MB
      - /eosbase-bentoml is 192 MB
        - /python3.10/site-packages is 135 MB
          - numpy.libs is 38 MB, numpy is 34 MB, botocore is 16 MB **(OVERLAP - packages installed twice)**
    - /pkgs is 378 MB
      - /cache is 101 MB (can we delete this?)
      - /python3.11 is 85 MB
      - /python 3.10.13 is 54 MB **(Two installations of Python? Can we delete 3.11?)**
    - /lib is 289 MB
      - /python3.11 is 95 MB --> site-packages is 45 MB --> pip is 13 MB, setuptools, conda, etc. also in there
      - /libicudata is 52 MB
      - /libpython3.11 is 25 MB
      - a bunch of other libs that are <10 MB
  - /local is 376 MB
    - /lib/Python3.7 is 372 MB
      - /site-packages is 347 MB
        - botocore is 80 MB, rdkit is 63 MB, numpy is 62 MB, pip is 10 MB, SQLAlchemy is 18 MB
  - /lib is 253 MB
    - /x86_64_linux-gnu is 127 MB
      - /libicudata.so.63.1 is 27 MB
      - /perl is 23 MB
SUMMARY / PRIMARY FINDINGS
Based on the findings above, any other thoughts on how we could reduce the overlap? Ideally we end up with just one installation of Python, and we only install Python packages once to reduce overlap.
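To quantify the overlap, one option is to list packages that appear in more than one `site-packages` directory. This is a minimal sketch, assuming it runs inside the container. `duplicate_packages` is a hypothetical helper, and matching by top-level directory name is only a heuristic, since it ignores package versions.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print names of top-level package directories that occur
# in more than one site-packages directory under ROOT.
duplicate_packages() {
    root="$1"
    find "$root" -type d -name site-packages 2>/dev/null |
    while IFS= read -r sp; do
        # Immediate subdirectories only; skip bytecode caches and metadata folders.
        find "$sp" -mindepth 1 -maxdepth 1 -type d \
            ! -name '__pycache__' ! -name '*.dist-info' ! -name '*.egg-info' \
            -exec basename {} \;
    done | sort | uniq -d
}

# Example (inside the container): duplicate_packages /usr
```

Each name printed is a candidate for consolidation, for instance by serving the model from a single environment.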
@miquelduranfrigola Would be great to get your thoughts on the above when you get a chance - looks like some significant overlap of Python packages / installations that could potentially enable us to reduce the size of the Docker Image
Hello @mjwarren3 , this is tremendously useful. Thanks so much. I am looping @DhanshreeA in since she has now joined Ersilia and she is much better than I am at these things.
I completely agree with your suggested approach. Since we are using the images to eventually serve the model, there is a lot of stuff we can remove safely after fetch is done. So, basically, the docker build should:
Point 3 is currently missing.
We can definitely:
- [...] `git` and `git-lfs`
- [...] `pip`
- [...] `serve`
- [...] (`conda clean`) or even delete the package manager if explicit python paths can be specified (not sure how easy that would be)
- [...] `ersilia` dependencies not used for serve. For example, `pyairtable` and `rdkit` (which is installed dynamically), and several others. This can be time consuming. Another option would be to have a light ersilia installation spec, or even an independent package, that strictly does `ersilia serve` only, which is the only functionality we need inside the docker image.
- [...] `~/eos` repository. There is a `dest` and a `repository` folder. While those can seem redundant, in theory, they are not and symbolic links are used. But it would be worth double checking.
- [...] `ubuntu` to `alpine` won't move the needle. However, it may be worth trying if this small modification works swiftly?

(As an aside, I think we want to eventually remove the `bentoml` dependency. This will certainly help, but it will require more work and it will probably apply to future models, not legacy ones.)
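Some of the clean-up ideas above map onto standard commands. The sketch below assumes a Debian/Ubuntu-based image with `conda` and a reasonably recent `pip` on the PATH. `clean_image`, `run`, and the `DRY_RUN` switch are hypothetical names, and by default the script only prints the commands. One caveat: files deleted in a later Dockerfile layer still occupy space in the earlier layers, so steps like these only shrink the image when they run in the same `RUN` instruction that created the files, or when the result is copied into a fresh stage.

```shell
#!/bin/sh
set -eu

DRY_RUN="${DRY_RUN:-1}"   # default: only print what would be executed

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clean_image() {
    run conda clean --all --yes          # remove conda package caches and tarballs
    run pip cache purge                  # drop pip's download/wheel cache
    run apt-get purge -y git git-lfs     # tools only needed while fetching
    run apt-get autoremove -y            # remove dependencies that are now unused
    run rm -rf /var/lib/apt/lists/*      # apt package index files
}

clean_image
```

Running it as-is prints the plan. Setting `DRY_RUN=0` inside the image would actually execute the steps.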
This discussion is super helpful, thank you both. I'm putting it on the top of my priority list for this week!
As discussed earlier today, let's start by assuming that `bentoml` will remain there. In the future, we may achieve smaller images if `bentoml` is removed, but for now, let's not touch it.
Hey folks, updating some metrics here:
After introducing multistage builds in the PR above #1022, models `eos3b5e` and `eos4wt0` got reduced from 2.35 GB to 979 MB and from 2.41 GB to 985 MB, respectively. Will update more as I scale it to other models in the hub.
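For readers unfamiliar with the technique: a multi-stage build runs the heavy installation in one stage and copies only the needed artifacts into a clean final stage, so build-time caches never reach the published image. The sketch below is not the Dockerfile from the PR. It only illustrates the pattern. The `BASE_IMAGE` argument, the copied paths (`/root/eos`, and the conda location suggested by the Dive output earlier in this thread), and the `write_multistage_dockerfile` helper are all assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: write an illustrative multi-stage Dockerfile to the given path.
write_multistage_dockerfile() {
    cat > "$1" <<'EOF'
# BASE_IMAGE is a placeholder for an image that already has ersilia installed.
ARG BASE_IMAGE
FROM ${BASE_IMAGE} AS build
ARG MODEL
# Heavy step: fetch the model; download and package caches stay in this stage.
RUN ersilia -v fetch $MODEL --from_github

FROM ${BASE_IMAGE}
# Copy only what serving is assumed to need (paths are guesses).
COPY --from=build /root/eos /root/eos
COPY --from=build /usr/bin/conda/envs /usr/bin/conda/envs
EOF
}

out="$(mktemp)"
write_multistage_dockerfile "$out"
echo "wrote $out"
# Then, for example (values in angle brackets are placeholders):
#   docker build -f "$out" --build-arg BASE_IMAGE=<base> --build-arg MODEL=eos4wt0 -t <name> .
```

The final stage starts from the base image again, so anything written during `fetch` that is not explicitly copied is left behind.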
This is quite amazing - thanks for the update @DhanshreeA
Another interesting observation and potentially an issue with docker builds: https://github.com/ersilia-os/ersilia/issues/1097
@miquelduranfrigola @GemmaTuron do you think it's safe to close this issue now?
Yes, I think so!
Is your feature request related to a problem? Please describe.
Context
Ersilia's Docker images are stored in DockerHub. For example, Ersilia's image for model eos4wt0 can be found here. All images are public in the ersiliaos DockerHub profile.
- [...] (`latest`) and, often, we provide two `OS/ARCH` architectures, namely `linux/amd64` and `linux/arm64`.
- [...] `eos-template` repository. Consequently, this workflow is replicated in the specific model repository. For example, for eos4wt0 the workflow can be found here.
Problem
Ersilia's Docker images are very large, most often above 2GB. This creates issues in low-resource settings. Most likely, this large volume is not related to the model parameters themselves, but the large amount of (unnecessary) dependencies and tools.
Describe the solution you'd like.
We would like to have a GitHub Action workflow that cleans docker images a posteriori. That is, given a model identifier, this action would fetch the corresponding image, clear unnecessary dependencies, and push it back to DockerHub. I am unsure whether this is feasible or the best approach at all. Any advice would be much appreciated.
More specifically, the steps towards a solution would include:
- [...] `clean_ersilia_docker_image.sh` script. This script would be able to remove unnecessary dependencies in a given Ersilia model. The script could be stored here, for example.
- [...] `clean_docker_images.yml`. This workflow would (a) pull docker images, (b) run the `clean_ersilia_docker_image.sh` and (c) push back the image to DockerHub. This workflow file could be stored here.
Describe alternatives you've considered
Alternatives I've considered include rebuilding Docker images with a slim base image (e.g. Alpine). While this sounds interesting, it would be cumbersome retrospectively.
Additional context.
A good model to start with is the one mentioned above (eos4wt0). This model should be really lightweight.
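The pull, clean, and push steps proposed above could be sketched roughly as follows. This is an outline under assumptions, not a tested workflow: the image naming `ersiliaos/<model>:latest`, the `slim` tag, the mount path for the script, and the `slim_image` helper are hypothetical. It flattens the cleaned container with `docker export` and `docker import`, because `docker commit` would add a layer on top of the existing ones without reclaiming their space. Flattening discards image metadata such as `ENTRYPOINT` and `CMD`, which would need to be re-applied through `docker import --change`.

```shell
#!/bin/sh
set -eu

DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

slim_image() {
    model="$1"
    image="ersiliaos/${model}:latest"   # assumed naming
    slim="ersiliaos/${model}:slim"      # hypothetical tag
    container="slim-${model}"

    run docker pull "$image"
    # Run the clean-up script (mounted from the current directory) in a named container.
    run docker run --name "$container" \
        -v "$PWD/clean_ersilia_docker_image.sh:/tmp/clean.sh:ro" \
        --entrypoint sh "$image" /tmp/clean.sh
    # Flatten the container filesystem into a single-layer image.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ docker export $container | docker import - $slim"
    else
        docker export "$container" | docker import - "$slim"
    fi
    run docker rm "$container"
    run docker push "$slim"
}

slim_image "${1:-eos4wt0}"
```

In a GitHub Actions workflow, a script like this would run after a DockerHub login step, with the model identifier passed as the first argument.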