gloritygithub11 opened this issue 6 months ago
It is caused by a mismatch of the TRT version. Have you rebuilt the docker image when upgrading TensorRT-LLM to 0.10.0? TensorRT-LLM 0.10.0 uses TensorRT 10, while older TensorRT-LLM releases use TensorRT 9.
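For reference, a quick way to confirm which versions the loading process will actually use (a minimal sketch; it assumes both packages are installed in the environment that deserializes the engine):

```python
import tensorrt as trt
import tensorrt_llm

# TensorRT-LLM 0.10.x pairs with TensorRT 10.x; older releases used TensorRT 9.x.
print("TensorRT:", trt.__version__)
print("TensorRT-LLM:", tensorrt_llm.__version__)
```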
@byshiue yes, I've rebuilt the docker image. You can see in the list below that TensorRT is already 10.0.1: libnvinfer_plugin_tensorrt_llm.so.10.0.1
ll /usr/local/tensorrt/lib/
total 3.5G
lrwxrwxrwx 1 root root 20 Apr 15 23:25 libnvinfer.so -> libnvinfer.so.10.0.1
lrwxrwxrwx 1 root root 20 Apr 15 23:25 libnvinfer.so.10 -> libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 224M Apr 15 23:25 libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 1.3G Apr 15 23:26 libnvinfer_builder_resource.so.10.0.1
lrwxrwxrwx 1 root root 29 Apr 15 23:22 libnvinfer_dispatch.so -> libnvinfer_dispatch.so.10.0.1
lrwxrwxrwx 1 root root 29 Apr 15 23:22 libnvinfer_dispatch.so.10 -> libnvinfer_dispatch.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:22 libnvinfer_dispatch.so.10.0.1
-rw-r--r-- 1 root root 751K Apr 15 23:22 libnvinfer_dispatch_static.a
lrwxrwxrwx 1 root root 25 Apr 15 23:22 libnvinfer_lean.so -> libnvinfer_lean.so.10.0.1
lrwxrwxrwx 1 root root 25 Apr 15 23:22 libnvinfer_lean.so.10 -> libnvinfer_lean.so.10.0.1
-rwxr-xr-x 1 root root 33M Apr 15 23:22 libnvinfer_lean.so.10.0.1
-rw-r--r-- 1 root root 243M Apr 15 23:22 libnvinfer_lean_static.a
lrwxrwxrwx 1 root root 27 Apr 15 23:26 libnvinfer_plugin.so -> libnvinfer_plugin.so.10.0.1
lrwxrwxrwx 1 root root 27 Apr 15 23:26 libnvinfer_plugin.so.10 -> libnvinfer_plugin.so.10.0.1
-rwxr-xr-x 1 root root 33M Apr 15 23:26 libnvinfer_plugin.so.10.0.1
-rw-r--r-- 1 root root 37M Apr 15 23:26 libnvinfer_plugin_static.a
-rw-r--r-- 1 root root 1.7G Apr 15 23:26 libnvinfer_static.a
lrwxrwxrwx 1 root root 30 Apr 15 23:26 libnvinfer_vc_plugin.so -> libnvinfer_vc_plugin.so.10.0.1
lrwxrwxrwx 1 root root 30 Apr 15 23:26 libnvinfer_vc_plugin.so.10 -> libnvinfer_vc_plugin.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:26 libnvinfer_vc_plugin.so.10.0.1
-rw-r--r-- 1 root root 442K Apr 15 23:26 libnvinfer_vc_plugin_static.a
lrwxrwxrwx 1 root root 21 Apr 15 23:26 libnvonnxparser.so -> libnvonnxparser.so.10
lrwxrwxrwx 1 root root 25 Apr 15 23:26 libnvonnxparser.so.10 -> libnvonnxparser.so.10.0.1
-rwxr-xr-x 1 root root 3.4M Apr 15 23:22 libnvonnxparser.so.10.0.1
-rw-r--r-- 1 root root 19M Apr 15 23:22 libnvonnxparser_static.a
-rw-r--r-- 1 root root 675K Apr 15 23:26 libonnx_proto.a
drwxr-xr-x 2 root root 168 Apr 15 23:26 stubs
How did you build the docker image and tensorrt_llm?
With the following Dockerfile:
# Use an official NVIDIA CUDA image as a parent image
FROM nvidia/cuda:12.4.1-devel-ubuntu20.04

# Set the working directory
WORKDIR /app

# Install software-properties-common to add repositories
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common

# Add deadsnakes PPA for newer Python versions
RUN add-apt-repository ppa:deadsnakes/ppa

# Install necessary packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3.10 \
    python3.10-distutils \
    python3-pip \
    openmpi-bin \
    libopenmpi-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get update \
    && apt-get install -y python3.10-venv \
    && python3.10 -m venv venv_dev

RUN apt-get update \
    && apt-get install -y python3.10-dev

RUN . venv_dev/bin/activate \
    && python3 -m pip install -U pip \
    && pip3 install tensorrt_llm --pre --extra-index-url https://pypi.nvidia.com --timeout 3600

RUN apt-get install -y wget \
    && wget https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.sh \
    && chmod +x cmake-3.29.2-linux-x86_64.sh \
    && ./cmake-3.29.2-linux-x86_64.sh --skip-license --prefix=/usr/local

RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git tensorrt-llm \
    && cd tensorrt-llm \
    && ENV=/root/.bashrc bash docker/common/install_tensorrt.sh

RUN apt-get install -y vim git-lfs

RUN export PYTHONPATH=/app/tensorrt-llm/3rdparty/cutlass/python:$PYTHONPATH \
    && . /app/venv_dev/bin/activate \
    && cd tensorrt-llm \
    && git lfs install \
    && git lfs pull \
    && python scripts/build_wheel.py -c -D"TRT_INCLUDE_DIR=/usr/local/tensorrt/include" -D"TRT_LIB_DIR=/usr/local/tensorrt/lib"

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run a shell command when the container launches
CMD ["bash", "-c", "echo Hello World!"]
It seems you are not using the official Dockerfile. Could you give it a try?
@byshiue
I followed the steps in https://nvidia.github.io/TensorRT-LLM/installation/linux.html to create a new docker env and get a similar error:
Process 0 loading engine from /root/models/tmp/trt_engines/Meta-Llama-3-8B-Instruct/fp16/1-gpu-tp1/rank0.engine
[05/24/2024-08:20:11] [TRT] [I] Loaded engine size: 15323 MiB
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
Following are the modules related to tensorrt:
root@e8fbc031fb35:~/TensorRT-LLM/examples/llama# pip list | grep tensorrt
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.11.0.dev2024052100
PS: I didn't find TensorRT under /usr/local/tensorrt/lib. Is it located somewhere else, or are additional steps needed?
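A side note on the error itself: the tensorrt_llm suffix in the missing plugin name (Gemmtensorrt_llm) is the plugin namespace, so getPluginCreator failing usually means the TensorRT-LLM plugin library was never registered in the process that deserializes the engine. In the releases discussed here, importing the Python package is what normally triggers that registration (treat the exact mechanism as an assumption); a minimal sketch:

```python
import tensorrt as trt
import tensorrt_llm  # importing the package loads and registers the TRT-LLM plugins

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
with open("rank0.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
```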
Could you try following the guide here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation/build-from-source-linux.md#option-1-build-tensorrt-llm-in-one-step?
I built a new docker image with:
make release_build CUDA_ARCHS="80-real"
The image builds successfully, and I can use it to convert and build the engine with the following commands:
python ../llama/convert_checkpoint.py --model_dir /mnt/memory/Meta-Llama-3-8B-Instruct --output_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp --dtype float16 --use_weight_only --weight_only_precision int4 --load_model_on_cpu
trtllm-build \
    --checkpoint_dir /mnt/memory/tmp/trt_models/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
    --output_dir /mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024
Test loading the engine:
import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load a serialized TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp/rank{rank}.engine'
load_engine(engine_path)
I get the following error:
[05/31/2024-00:22:08] [TRT] [I] Loaded engine size: 5342 MiB
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[05/31/2024-00:22:09] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
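If importing the whole package is not an option, the plugin library can also be loaded and initialized by hand. A sketch, assuming libnvinfer_plugin_tensorrt_llm.so is on the dynamic loader path and exposes the initTrtLlmPlugins entry point (this mirrors what tensorrt_llm.plugin does internally, but treat the details as assumptions):

```python
import ctypes

# Load the TRT-LLM plugin library globally so TensorRT can see its plugin creators.
handle = ctypes.CDLL("libnvinfer_plugin_tensorrt_llm.so", mode=ctypes.RTLD_GLOBAL)

# Register the plugins under the "tensorrt_llm" namespace (assumed entry point).
handle.initTrtLlmPlugins(None, "tensorrt_llm".encode("utf-8"))
```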
Could you share the TRT versions of the Python and C++ sides via:
$ pip list | grep tensorrt
tensorrt 10.0.1
tensorrt-llm 0.11.0.dev2024052800
torch-tensorrt 2.3.0a0
$ cat /usr/local/tensorrt/include/NvInferVersion.h | grep version
//! Defines the TensorRT version
#define NV_TENSORRT_MAJOR 10 //!< TensorRT major version.
#define NV_TENSORRT_MINOR 0 //!< TensorRT minor version.
#define NV_TENSORRT_PATCH 1 //!< TensorRT patch version.
#define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version.
#define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version.
#define NV_TENSORRT_LWS_PATCH 0 //!< TensorRT LWS patch version.
Could you add your trt_llm root folder to the PYTHONPATH environment variable and try again?
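One way to check which copy of the package a given PYTHONPATH entry actually resolves to (a sketch using the /app paths from this thread):

```python
import sys
sys.path.insert(0, "/app/tensorrt-llm-src")  # candidate repo root containing tensorrt_llm/

import tensorrt_llm
print(tensorrt_llm.__file__)     # shows whether the source tree or an installed wheel wins
print(tensorrt_llm.__version__)
```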
Do you mean /app/tensorrt_llm/? It looks like there's no Python-related content in that folder:
ll /app/tensorrt_llm/
total 12
drwxr-xr-x 1 root root 40 May 29 13:01 ./
drwxr-xr-x 1 root root 26 May 29 12:19 ../
-rw-rw-r-- 1 root root 5412 May 29 02:20 README.md
drwxr-xr-x 1 root root 17 Apr 12 08:53 benchmarks/
drwxr-xr-x 3 root root 108 Apr 9 06:19 docs/
drwxrwxrwx 1 root root 4096 May 29 12:05 examples/
drwxr-xr-x 3 root root 26 Apr 9 06:19 include/
lrwxrwxrwx 1 root root 57 May 29 13:01 lib -> /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/
I checked out the source code into /app/tensorrt-llm-src and tried to load again; I get the same error:
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:07:58] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:02] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:35] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:08:35] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:08:59] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:00] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
root@tensorrt-llm-build-xxd-03-lmz92:/app# export PYTHONPATH=/app/tensorrt-llm-src/tensorrt_llm
root@tensorrt-llm-build-xxd-03-lmz92:/app# python3 test.py
[06/07/2024-11:09:17] [TRT] [I] Loaded engine size: 5342 MiB
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 3: getPluginCreator could not find plugin: WeightOnlyQuantMatmultensorrt_llm version: 1
[06/07/2024-11:09:17] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
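For what it's worth, the examples in the repo load engines through the TensorRT-LLM runtime rather than a bare trt.Runtime, which guarantees the plugins are registered before deserialization. A sketch, under the assumption that this release's ModelRunner accepts an engine directory:

```python
from tensorrt_llm.runtime import ModelRunner

# Loads rank0.engine together with its config.json from the engine directory.
runner = ModelRunner.from_dir(
    engine_dir="/mnt/memory/tmp/trt_engines/Meta-Llama-3-8B-Instruct/w4a16/1-gpu-tp",
    rank=0,
)
```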
Did you resolve this?
I mean setting PYTHONPATH=tensorrt_llm_backend/tensorrt_llm after building tensorrt_llm in the docker image.
@byshiue what do you mean by tensorrt_llm_backend?
tensorrt_llm_backend means the root path of the repo you cloned from https://github.com/triton-inference-server/tensorrtllm_backend
@byshiue As in my reply above, /app/tensorrt_llm/ does not contain the full repo content, so I checked out the code into /app/tensorrt-llm-src. I tried both paths and get the same error.
Could you try using this docker image directly: nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3?
Hi @gloritygithub11, do you still have any further issues or questions? If not, we'll close this soon.
System Info
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.0.dev2024050700
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Build with the following script; it builds successfully.
Load the engine and get an error.
Expected behavior
The engine loads successfully.
Actual behavior
Fails to load the engine.
Additional notes
Plugin dir
List plugins with a script and get the following plugin list.