NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Error when building Docker #12

Closed: chuong98 closed this issue 6 months ago

chuong98 commented 6 months ago

Hi, when I run docker/build.sh, the build fails at this line: RUN pip install "nvidia-modelopt[all]~=$MODELOPT_VERSION" -U --extra-index-url https://urm.nvidia.com/artifactory/api/pypi/nv-shared-pypi/simple --extra-index-url https://gitlab-master.nvidia.com/api/v4/projects/95421/packages/pypi/simple

It returns this warning:

#10 436.7 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7e65ac97d0f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /api/v4/projects/95421/packages/pypi/simple/annotated-types/
#10 436.8 Collecting annotated-types>=0.4.0

I checked both of the links; neither host can be resolved.
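(As an aside, not from the original report: a quick way to check whether those hosts resolve at all is something like the following.)

getent hosts urm.nvidia.com             # prints nothing outside the NVIDIA network
getent hosts gitlab-master.nvidia.com   # also nothing, matching the "Name or service not known" error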

kevalmorabia97 commented 6 months ago

Thanks for catching this. Yes, that's a typo. Can you remove both of these extra index URLs, add --extra-index-url https://pypi.nvidia.com instead, and re-run the command? We will update the Dockerfiles soon.
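For reference, the edited install line would then look roughly like this (a sketch based on the suggestion above, not the final Dockerfile):

RUN pip install "nvidia-modelopt[all]~=$MODELOPT_VERSION" -U --extra-index-url https://pypi.nvidia.com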

chuong98 commented 6 months ago

Thanks. After fixing the index URL there are no more warnings, but there is another error at this step: RUN ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so

Error:

#13 [ 9/14] RUN ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so
#13 0.368 ln: failed to create symbolic link '/usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so': No such file or directory
#13 ERROR: process "/bin/sh -c ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so" did not complete successfully: exit code: 1
------
 > [ 9/14] RUN ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so:
0.368 ln: failed to create symbolic link '/usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so': No such file or directory
------
Dockerfile:30
--------------------
  28 |     # TensorRT plugins.
  29 |     ENV TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
  30 | >>> RUN ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so
  31 |     
  32 |     COPY plugins examples/plugins
cjluo-omniml commented 6 months ago

Is the Python version inside the Docker image 3.10? If not, you may need to debug a bit and find the TensorRT pip package installation path inside the container.
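A minimal way to check, sketched here with generic shell commands run inside the container (none of these are from the Dockerfile itself):

python3 --version                          # confirm the interpreter really is 3.10
pip list 2>/dev/null | grep -i tensorrt    # list any TensorRT-related pip packages
find / -name "libnvinfer.so*" 2>/dev/null  # locate the TensorRT libraries, if any are present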

chuong98 commented 6 months ago

Yes, it is Python 3.10, according to this line: RUN apt-get update && apt-get -y install python3.10 python3-pip python-is-python3 openmpi-bin libopenmpi-dev wget git git-lfs unzip

I noticed that tensorrt_libs is not installed by any command in the Dockerfile. Is it already available from the base image nvidia/cuda:12.3.2-devel-ubuntu22.04?
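One way to answer that (a sketch, assuming Docker is available on the host) is to run the bare base image and look for the directory the symlink step expects:

docker run --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 ls /usr/local/lib/python3.10/dist-packages/tensorrt_libs
# Expected to fail with "No such file or directory", which would indicate the base image does not ship it.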

chuong98 commented 6 months ago

OK, it turns out the error was caused by my commenting out the line RUN pip install tensorrt-llm~=0.9 -U. I didn't plan to work with LLM models, so I had commented it out. After adding it back, I can build the Docker image.
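Putting the pieces together, the relevant Dockerfile steps as quoted across this thread (the real file may differ slightly) are roughly:

# tensorrt-llm pulls in the tensorrt_libs pip package that the symlink step below relies on,
# so this line needs to stay even if you do not plan to work with LLM models.
RUN pip install tensorrt-llm~=0.9 -U

# TensorRT plugins.
ENV TRT_LIBPATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
RUN ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.9 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so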