codelion / optillm

Optimizing inference proxy for LLMs
Apache License 2.0
1.22k stars · 108 forks

Resulting docker image size (6.36GB) is quite large - is there any opportunity to reduce this? #71

Open sammcj opened 3 hours ago

sammcj commented 3 hours ago

Looking at the layers of the built image, it's clear the size is being blown out by cudnn (no surprises there):

[screenshot: image layer size breakdown]

(This part):

[screenshot: the cudnn libraries within that layer]

I'm just wondering if cudnn needs to be baked into the image, or if the application only needs a few specific libraries from it, which could significantly reduce the size?

codelion commented 2 hours ago

@sammcj the dependencies include torch to run the cot_decoding and entropy_decoding approaches, which are implemented in PyTorch - https://github.com/codelion/optillm/blob/94fad7846e82cd24f4603a4da7019ba242f40be3/requirements.txt#L7C1-L7C6
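If GPU inference isn't required for those approaches, another option worth trying (a sketch only, not something confirmed to work for optillm) is to install PyTorch from its official CPU-only wheel index before installing the rest of the requirements, since the CPU wheels don't bundle the multi-GB CUDA/cuDNN shared libraries:

```dockerfile
# Sketch of a Dockerfile fragment, assuming CPU inference is acceptable:
# the CPU-only wheel index (download.pytorch.org/whl/cpu) ships torch
# without the CUDA/cuDNN libraries that dominate the image size.
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
 && pip install --no-cache-dir -r requirements.txt
```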

You can try commenting out the torch and transformers dependencies in the requirements.txt file and see if that helps, if you are not going to use those approaches.
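As a concrete sketch of that suggestion (assuming `torch` and `transformers` appear as plain package names, possibly with version pins, in requirements.txt - the helper name and exact pin formats here are illustrative, not from the repo), the lines could be commented out programmatically before building:

```python
# Sketch: comment out heavy ML dependencies in a requirements file
# before building a slimmer Docker image. HEAVY_DEPS and the pin
# formats handled here are assumptions for illustration.
HEAVY_DEPS = {"torch", "transformers"}

def slim_requirements(text: str) -> str:
    """Return requirements text with heavy deps commented out."""
    out = []
    for line in text.splitlines():
        # Extract the bare package name, ignoring pins like torch==2.4.0
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name in HEAVY_DEPS:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out)

print(slim_requirements("numpy\ntorch==2.4.0\ntransformers\nflask"))
# -> numpy
#    # torch==2.4.0
#    # transformers
#    flask
```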