codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0
1.6k stars · 128 forks

Resulting docker image size (6.36GB) is quite large - is there any opportunity to reduce this? #71

Closed · sammcj closed this 3 weeks ago

sammcj commented 1 month ago

Looking at the built image, the size is being blown out by cudnn (no surprises there):

[screenshot: layer breakdown of the built image]

(This part:)

[screenshot: the cudnn libraries dominating the layer contents]

I'm just wondering whether cudnn needs to be baked into the image, or if perhaps the application only needs a few specific libraries, which might significantly reduce the size?
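For anyone reproducing this, layer sizes can be inspected with plain docker; the image tag below is a placeholder, not the project's published tag:

```sh
# List the layers of the built image and their sizes.
docker history optillm:latest

# For a file-level view of what each layer adds (similar to the screenshots
# above), a layer explorer such as dive works well:
# https://github.com/wagoodman/dive
dive optillm:latest
```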

codelion commented 1 month ago

@sammcj the dependencies include torch to run the cot_decoding and entropy_decoding approaches, which are implemented in PyTorch: https://github.com/codelion/optillm/blob/94fad7846e82cd24f4603a4da7019ba242f40be3/requirements.txt#L7C1-L7C6

You can try commenting out the torch and transformers dependencies in the requirements.txt file and see if that helps, if you are not going to use those approaches.
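For illustration, the edit might look like the sketch below; only the torch entry (line 7, per the link above) is confirmed, and the rest of the file is assumed:

```
# requirements.txt (sketch; other entries omitted)
# torch         # only needed for the cot_decoding / entropy_decoding approaches
# transformers  # only needed alongside torch
```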

sammcj commented 3 weeks ago

Thanks, that indeed reduced the image size from 6.36GB to 950MB!
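(For anyone who still needs the PyTorch-based approaches: CPU-only torch wheels skip the CUDA/cudnn libraries that account for most of the bloat. A minimal Dockerfile sketch follows, where the base image, file layout, and entrypoint are assumptions rather than optillm's actual setup:)

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .

# Install torch from PyTorch's CPU-only wheel index first, so the later
# requirements install does not pull in the CUDA-enabled build from PyPI.
# This assumes the torch pin in requirements.txt is satisfiable by a CPU wheel.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir -r requirements.txt

COPY . .
ENTRYPOINT ["python", "optillm.py"]
```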

FYI / for context: the reason I'm trying to cut this down is that I'm looking at embedding Optillm within a Lambda I've written that provides an OpenAI-compatible API in front of LLMs running on Amazon Bedrock (or technically, any LLMs running on AWS).

If I get the Optillm integration working nicely, I'll be sure to give a shout-out to the project and share the link here :)

codelion commented 3 weeks ago

Thanks for trying out optillm @sammcj! Let me know if you need any more help with your setup.