codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0
1.6k stars · 128 forks

Resulting docker image size (6.36GB) is quite large - is there any opportunity to reduce this? #71

Closed · sammcj closed this 3 weeks ago

sammcj commented 1 month ago

Looking at the built image, the size is being blown out by cudnn (no surprises there):

[screenshot: layer breakdown of the built image]

(This part:)

[screenshot: the cudnn libraries dominating the layer contents]

I'm just wondering whether cudnn needs to be baked into the image, or if perhaps the application only needs a few specific libraries, which might significantly reduce the size?
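For anyone reproducing this, layer sizes can be inspected with plain docker; the image tag below is a placeholder, not the project's published tag:

```sh
# List the layers of the built image and their sizes.
docker history optillm:latest

# For a file-level view of what each layer adds (similar to the screenshots
# above), a layer explorer such as dive works well:
# https://github.com/wagoodman/dive
dive optillm:latest
```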

codelion commented 1 month ago

@sammcj the dependencies include torch to run the cot_decoding and entropy_decoding approaches, which are implemented in PyTorch: https://github.com/codelion/optillm/blob/94fad7846e82cd24f4603a4da7019ba242f40be3/requirements.txt#L7C1-L7C6

You can try commenting out the torch and transformers dependencies in the requirements.txt file and see if that helps, if you are not going to use those approaches.
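For illustration, the edit might look like the sketch below; only the torch entry (line 7, per the link above) is confirmed, and the rest of the file is assumed:

```
# requirements.txt (sketch; other entries omitted)
# torch         # only needed for the cot_decoding / entropy_decoding approaches
# transformers  # only needed alongside torch
```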

sammcj commented 3 weeks ago

Thanks, that indeed reduced the image size from 6.36GB to 950MB!
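(For anyone who still needs the PyTorch-based approaches: CPU-only torch wheels skip the CUDA/cudnn libraries that account for most of the bloat. A minimal Dockerfile sketch follows, where the base image, file layout, and entrypoint are assumptions rather than optillm's actual setup:)

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .

# Install torch from PyTorch's CPU-only wheel index first, so the later
# requirements install does not pull in the CUDA-enabled build from PyPI.
# This assumes the torch pin in requirements.txt is satisfiable by a CPU wheel.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir -r requirements.txt

COPY . .
ENTRYPOINT ["python", "optillm.py"]
```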

FYI / for context: the reason I'm trying to cut this down is that I'm looking at embedding Optillm within a Lambda I've written that provides an OpenAI-compatible API in front of LLMs running on Amazon Bedrock (or technically, any LLMs running on AWS).

If I get the Optillm integration working nicely, I'll be sure to give a shout-out to the project and share the link here :)

codelion commented 3 weeks ago

Thanks for trying out optillm @sammcj! Let me know if you need any more help with your setup.