NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.71k stars 996 forks source link

error: make -C docker release_build : Command 'git submodule update --init --recursive' returned non-zero exit status 128 #2479

Open xddun opened 1 day ago

xddun commented 1 day ago

System Info

env:

ubuntu22 RTX3090 Linux euler-MS-7D30 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I wanted to build an image, but unexpectedly encountered an error. My process was as follows in 4steps:

  1. git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. cd TensorRT-LLM
  3. git lfs pull
  4. make -C docker release_build

error log:

TensorRT-LLM$ make -C docker release_build
make: Entering directory '/data/xiedong/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=24.10-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --build-arg TRT_LLM_VER="0.16.0.dev2024111900" \
         \
         --build-arg GIT_COMMIT="535c9cc6730f5ac999e4b1cb621402b58138f819" \
         --target release \
        --file Dockerfile.multi \
        --tag tensorrt_llm/release:latest \
        ..
[+] Building 3.5s (33/44)                                                                                                                            docker:default
 => [internal] load build definition from Dockerfile.multi                                                                                                     0.0s
 => => transferring dockerfile: 3.98kB                                                                                                                         0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)                                                                                 0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 57)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 75)                                                                                0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.10-py3                                                                                              2.4s
 => [internal] load .dockerignore                                                                                                                              0.1s
 => => transferring context: 257B                                                                                                                              0.0s
 => [internal] load build context                                                                                                                              0.2s
 => => transferring context: 342.26kB                                                                                                                          0.1s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:24.10-py3@sha256:36555b43d382425a4281ecfbcb41de2f95fb542ca8e531c5486be10df8026f9d                                   0.0s
 => CACHED [devel  1/16] COPY docker/common/install_base.sh install_base.sh                                                                                    0.0s
 => CACHED [devel  2/16] RUN bash ./install_base.sh && rm install_base.sh                                                                                      0.0s
 => CACHED [devel  3/16] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                  0.0s
 => CACHED [devel  4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                    0.0s
 => CACHED [devel  5/16] COPY docker/common/install_ccache.sh install_ccache.sh                                                                                0.0s
 => CACHED [devel  6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh                                                                                  0.0s
 => CACHED [devel  7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh                                                                    0.0s
 => CACHED [devel  8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh                                                                      0.0s
 => CACHED [devel  9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                            0.0s
 => CACHED [devel 10/16] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCC  0.0s
 => CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                        0.0s
 => CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                          0.0s
 => CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                0.0s
 => CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                  0.0s
 => CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                              0.0s
 => CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                           0.0s
 => CACHED [release  1/13] RUN mkdir -p /root/.cache/pip                                                                                                       0.0s
 => CACHED [release  2/13] WORKDIR /app/tensorrt_llm                                                                                                           0.0s
 => CACHED [wheel  1/10] WORKDIR /src/tensorrt_llm                                                                                                             0.0s
 => CACHED [wheel  2/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  3/10] COPY cpp cpp                                                                                                                          0.0s
 => CACHED [wheel  4/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  5/10] COPY scripts scripts                                                                                                                  0.0s
 => CACHED [wheel  6/10] COPY tensorrt_llm tensorrt_llm                                                                                                        0.0s
 => CACHED [wheel  7/10] COPY 3rdparty 3rdparty                                                                                                                0.0s
 => CACHED [wheel  8/10] COPY .gitmodules setup.py requirements.txt requirements-dev.txt ./                                                                    0.0s
 => CACHED [wheel  9/10] RUN mkdir -p /root/.cache/pip /root/.cache/ccache                                                                                     0.0s
 => ERROR [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --cle  0.6s
------
 > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks:
0.460 fatal: not a git repository (or any of the parent directories): .git
0.460 Traceback (most recent call last):
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 434, in <module>
0.460     main(**vars(args))
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 107, in main
0.460     build_run('git submodule update --init --recursive')
0.460   File "/usr/lib/python3.10/subprocess.py", line 526, in run
0.469     raise CalledProcessError(retcode, process.args,
0.469 subprocess.CalledProcessError: Command 'git submodule update --init --recursive' returned non-zero exit status 128.
------
Dockerfile.multi:72
--------------------
  71 |     ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
  72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
  73 | >>>     python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  74 |
--------------------
ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
make: *** [Makefile:64: release_build] Error 1
make: Leaving directory '/data/xiedong/TensorRT-LLM/docker'

Who can help?

Is it possible to provide a pre-configured image with the environment already set up? Compiling the image is really challenging!

Information

Tasks

Reproduction

env:

ubuntu22 RTX3090 Linux euler-MS-7D30 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I wanted to build an image, but unexpectedly encountered an error. My process was as follows in 4steps:

  1. git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. cd TensorRT-LLM
  3. git lfs pull
  4. make -C docker release_build

error log:

TensorRT-LLM$ make -C docker release_build
make: Entering directory '/data/xiedong/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=24.10-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --build-arg TRT_LLM_VER="0.16.0.dev2024111900" \
         \
         --build-arg GIT_COMMIT="535c9cc6730f5ac999e4b1cb621402b58138f819" \
         --target release \
        --file Dockerfile.multi \
        --tag tensorrt_llm/release:latest \
        ..
[+] Building 3.5s (33/44)                                                                                                                            docker:default
 => [internal] load build definition from Dockerfile.multi                                                                                                     0.0s
 => => transferring dockerfile: 3.98kB                                                                                                                         0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)                                                                                 0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 57)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 75)                                                                                0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.10-py3                                                                                              2.4s
 => [internal] load .dockerignore                                                                                                                              0.1s
 => => transferring context: 257B                                                                                                                              0.0s
 => [internal] load build context                                                                                                                              0.2s
 => => transferring context: 342.26kB                                                                                                                          0.1s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:24.10-py3@sha256:36555b43d382425a4281ecfbcb41de2f95fb542ca8e531c5486be10df8026f9d                                   0.0s
 => CACHED [devel  1/16] COPY docker/common/install_base.sh install_base.sh                                                                                    0.0s
 => CACHED [devel  2/16] RUN bash ./install_base.sh && rm install_base.sh                                                                                      0.0s
 => CACHED [devel  3/16] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                  0.0s
 => CACHED [devel  4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                    0.0s
 => CACHED [devel  5/16] COPY docker/common/install_ccache.sh install_ccache.sh                                                                                0.0s
 => CACHED [devel  6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh                                                                                  0.0s
 => CACHED [devel  7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh                                                                    0.0s
 => CACHED [devel  8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh                                                                      0.0s
 => CACHED [devel  9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                            0.0s
 => CACHED [devel 10/16] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCC  0.0s
 => CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                        0.0s
 => CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                          0.0s
 => CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                0.0s
 => CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                  0.0s
 => CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                              0.0s
 => CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                           0.0s
 => CACHED [release  1/13] RUN mkdir -p /root/.cache/pip                                                                                                       0.0s
 => CACHED [release  2/13] WORKDIR /app/tensorrt_llm                                                                                                           0.0s
 => CACHED [wheel  1/10] WORKDIR /src/tensorrt_llm                                                                                                             0.0s
 => CACHED [wheel  2/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  3/10] COPY cpp cpp                                                                                                                          0.0s
 => CACHED [wheel  4/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  5/10] COPY scripts scripts                                                                                                                  0.0s
 => CACHED [wheel  6/10] COPY tensorrt_llm tensorrt_llm                                                                                                        0.0s
 => CACHED [wheel  7/10] COPY 3rdparty 3rdparty                                                                                                                0.0s
 => CACHED [wheel  8/10] COPY .gitmodules setup.py requirements.txt requirements-dev.txt ./                                                                    0.0s
 => CACHED [wheel  9/10] RUN mkdir -p /root/.cache/pip /root/.cache/ccache                                                                                     0.0s
 => ERROR [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --cle  0.6s
------
 > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks:
0.460 fatal: not a git repository (or any of the parent directories): .git
0.460 Traceback (most recent call last):
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 434, in <module>
0.460     main(**vars(args))
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 107, in main
0.460     build_run('git submodule update --init --recursive')
0.460   File "/usr/lib/python3.10/subprocess.py", line 526, in run
0.469     raise CalledProcessError(retcode, process.args,
0.469 subprocess.CalledProcessError: Command 'git submodule update --init --recursive' returned non-zero exit status 128.
------
Dockerfile.multi:72
--------------------
  71 |     ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
  72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
  73 | >>>     python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  74 |
--------------------
ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
make: *** [Makefile:64: release_build] Error 1
make: Leaving directory '/data/xiedong/TensorRT-LLM/docker'

Expected behavior

#

actual behavior

#

additional notes

#

hello-11 commented 3 hours ago

@xddun You can follow this guide.

xddun commented 3 hours ago

I follow this page, it works:

https://www.dong-blog.fun/post/1863

By the way, I have an additional question. I noticed that the interface for accessing this Triton deployment is quite stiff. Is there a question-and-answer interface similar to OpenAI's available?

When I access it this way, the model's responses seem to be completing my sentences rather than the usual question-and-answer format.

# curl -X POST http://101.136.8.66:8000/v2/models/ensemble/generate -d '{"text_input": "Who are you?", "max_tokens": 200, "bad_words": "", "stop_words": ""}'

{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_index":0,"sequence_start":false,"text_output":"Who are you? Where do you come from? Where are you going? These are the questions that philosophers ponder. For businesses, these three questions are equally important. Where a business comes from determines its genes; where a business is going determines its strategy; and who a business is determines its culture. Corporate culture is the soul of a business and its intrinsic driving force for development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development.\nCorporate culture is the soul of a business and its intrinsic driving force for development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development.\nCorporate culture is the intrinsic driving force for a business's development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development. Corporate culture"}