Prots opened this issue 1 month ago
This is because many users asked us to install the TensorRT-LLM Python packages in the Docker image (before 24.03, the image contained only the backend, not the TRT-LLM Python packages), along with the related installation dependencies.
Hi, would you consider releasing a smaller, serving-only image like 24.03? Normally the TRT-LLM Python SDK is only used in dev environments, and it can easily be installed with pip. We don't see the need to ship it in the Docker image, since the package and its requirements take up too much space and bandwidth.
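For reference, a dev-environment install along those lines is a one-liner; the extra index URL below is NVIDIA's Python package index, and whether a version pin is needed depends on the backend version you deploy against (see the compatibility concern raised later in this thread):

```shell
# Sketch: install the TRT-LLM Python SDK only in the dev environment,
# instead of baking it into the serving image.
pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com
```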
When I pull the image locally, the size shows as twice as big as expected.
Ah, this is because of the multi-platform image.
We will discuss that. Because installing tensorrt-llm via pip often led to version mismatches before TRT-LLM supported backward compatibility, we decided to install the tensorrt-llm Python package in the tritonserver Docker image starting with the 24.04 release, which leads to a larger image size.
@byshiue is it possible to install Triton + the tensorrt_llm backend using a multi-stage build? E.g., take the py3-min image and copy all the needed libs from the trtllm-python-py3 image.
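A multi-stage build along those lines might look like the sketch below. The base tags are the ones discussed in this thread, but the copied paths are assumptions; you would need to inspect the full image to confirm which directories the backend actually requires at runtime.

```dockerfile
# Hypothetical multi-stage sketch: start from the minimal image and
# copy only what the TRT-LLM backend needs from the full image.
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 AS full

FROM nvcr.io/nvidia/tritonserver:24.04-py3-min
# Triton server binaries plus the tensorrtllm backend directory
COPY --from=full /opt/tritonserver /opt/tritonserver
# TRT-LLM Python packages (path is an assumption -- verify against
# the source image, e.g. with `pip show tensorrt_llm` inside it)
COPY --from=full /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
```

Whether this works depends on shared-library dependencies (CUDA, NCCL, MPI) being present in the min image, so the resulting image would need to be smoke-tested before relying on it.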
Another question about the multi-platform image: I used the 24.04-trtllm-python-py3 image as a base, added a quantized Llama 3 model of 8.5 GB, and got a resulting image of about 42 GB.
FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
COPY TensorRT-LLM/Meta-Llama-3-8B-Instruct-tokenizer /TensorRT-LLM/Meta-Llama-3-8B-Instruct-tokenizer
COPY tensorrtllm_backend/all_models/inflight_batcher_llm /tensorrtllm_backend/all_models/inflight_batcher_llm
CMD tritonserver --model-repository /tensorrtllm_backend/all_models/inflight_batcher_llm
The build command is:
docker build -f Dockerfile --platform linux/amd64 --no-cache -t triton_llama3_int8_sq:0.0.2 .
So I set --platform linux/amd64, but Docker still seems to take the whole image for both platforms (linux/amd64 and linux/arm64). Why does that happen?
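One way to check what the tag's manifest actually contains, and to make the platform explicit when pulling (commands sketched against the standard Docker CLI, using the tag from this thread):

```shell
# List the platforms included in the multi-arch manifest
docker manifest inspect nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3

# Pull only the amd64 variant explicitly. Note that a plain
# `docker pull` already fetches just the variant matching the host
# platform -- the doubled size shown on the registry page is the
# sum over both platforms, not what lands on disk.
docker pull --platform linux/amd64 nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
```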
Hello, I'm using 24.03-trtllm-python-py3 with an image size of 8.38 GB, which is not small but OK. I was going to migrate to a newer version like 24.04 or 24.05, but the size drastically increased to 18.46 GB or even 18.48 GB. So my question is: what caused the extra ~10 GB, and how can I create a smaller image with Triton + TensorRT-LLM? Thanks