No module named 'tensorrt' #1839

Closed by tapansstardog 2 weeks ago

tapansstardog commented 3 months ago

Hi team,

I am trying to build llama engine files using the nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 container and got the error below:

Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 9, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 25, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

The tensorrt-related libraries were already installed:

tensorrt                  10.0.1
tensorrt-cu12             10.1.0
tensorrt-cu12-bindings    10.1.0
tensorrt-cu12-libs        10.1.0
tensorrt-llm              0.10.0
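The listing above shows tensorrt at 10.0.1 next to the tensorrt-cu12 packages at 10.1.0, so it can help to confirm what the interpreter itself sees. A minimal sketch using only the standard library; the package names are copied from the list above, and the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata

# Distribution names copied from the listing in this comment.
PACKAGES = [
    "tensorrt",
    "tensorrt-cu12",
    "tensorrt-cu12-bindings",
    "tensorrt-cu12-libs",
    "tensorrt-llm",
]


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible to this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:26s}{version or 'not installed'}")
```

Running this with the same python binary that raises the ModuleNotFoundError shows whether the packages are installed for that interpreter or for a different one.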

Then I ran the below command referring to this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/1791

pip install tensorrt==10.0.1 --force-reinstall

While installing tensorrt I got this error: torch-tensorrt 2.3.0a0 requires tensorrt<8.7,>=8.6, but you have tensorrt 10.0.1 which is incompatible.

I ignored this because tensorrt was installed successfully.

Now I am getting an undefined symbol error:

File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 9, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
    from . import graph_rewriting as gw
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
    from .network import Network
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 26, in <module>
    from tensorrt_llm.module import Module
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
    from ._common import default_net
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 31, in <module>
    from ._utils import str_dtype_to_trt
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 29, in <module>
    from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
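One plausible reading of the missing symbol: in the Itanium C++ name mangling, `Ss` abbreviates the original libstdc++ `std::string`, whereas the newer libstdc++ ABI places the type in `std::__cxx11`. The trailing `RKSs` therefore suggests the bindings expect a PyTorch built with the older string ABI, and the PyTorch in the container may export only the `__cxx11` variant. The sketch below is a rough heuristic, not a demangler, and `std_string_abi` is a hypothetical helper name:

```python
def std_string_abi(mangled: str) -> str:
    """Heuristically classify which libstdc++ std::string ABI a mangled symbol uses."""
    if "__cxx11" in mangled:
        # std::__cxx11::basic_string appears in the mangled name.
        return "cxx11"
    if "Ss" in mangled:
        # 'Ss' is the Itanium abbreviation for the original std::string.
        # Substring matching can give false positives on unrelated names.
        return "pre-cxx11"
    return "unknown"


symbol = "_ZN3c106detail14torchCheckFailEPKcS2_jRKSs"
print(std_string_abi(symbol))  # prints "pre-cxx11"
```

If PyTorch imports cleanly, `torch.compiled_with_cxx11_abi()` reports which ABI the installed build uses, which can be compared against the result above.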

Any suggestions?

nv-guomingz commented 3 months ago

Please use nvcr.io/nvidia/pytorch:24.05-py3 instead of nvcr.io/nvidia/tritonserver:24.05-py3 for engine building.

tapansstardog commented 3 months ago

Apologies for the typo; I am using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3. I believe I do not need to install examples/llama/requirements.txt. Am I right? I no longer get the above error if I skip installing the packages in requirements.txt.

nv-guomingz commented 3 months ago

The examples/llama/requirements.txt file ensures that the llama model can be run successfully. That means you need to not only convert the checkpoint but also build the engine and run inference.

I suggest you follow the doc instructions and use nvcr.io/nvidia/pytorch:24.05-py3 for engine building. Our latest commit (9691e12bce7ae1c126c435a049eb516eb119486c) relies on this image.

tapansstardog commented 3 months ago

Thanks. One quick question:

In convert_checkpoint.py, I am trying to convert an HF model (codellama/CodeLlama-34b-hf) to checkpoint files. I am passing the HF model name instead of the downloaded safetensors files.

I am getting:

Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 464, in <module>
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 456, in main
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 371, in convert_and_save_hf
    hf_model = preload_model(
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 326, in preload_model
    for f in os.listdir(model_dir)]) and use_safetensors
FileNotFoundError: [Errno 2] No such file or directory: 'codellama/CodeLlama-34b-hf'

Here use_safetensors is set to True. Can't I set it to False and directly run this piece of code?


Just before that code, the method returns None when use_safetensors=True and does nothing else.
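The traceback shows `os.listdir(model_dir)` being called on the Hub ID itself, which fails because `codellama/CodeLlama-34b-hf` is not a local directory. A minimal sketch of a guard that avoids the exception; `local_safetensors_files` is a hypothetical helper and not code from convert_checkpoint.py:

```python
import os


def local_safetensors_files(model_dir: str) -> list:
    """List *.safetensors files if model_dir is a local directory, else return []."""
    if not os.path.isdir(model_dir):
        # A Hub ID such as "codellama/CodeLlama-34b-hf" is not a local path,
        # so there is nothing to list.
        return []
    return sorted(f for f in os.listdir(model_dir) if f.endswith(".safetensors"))
```

An empty result can then be treated as "no local safetensors available". Another option is to download the model first (for example with `huggingface_hub.snapshot_download`) and pass the resulting directory as `--model_dir`.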

nv-guomingz commented 3 months ago

Please try the latest code https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/convert_checkpoint.py instead of the outdated version.

Ri0S commented 3 months ago

Removing torch and tensorrt before installing tensorrt_llm worked for me.

pip uninstall torch
pip uninstall tensorrt
pip install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com

tapansstardog commented 3 months ago

Thanks @nv-guomingz. I was able to move forward and tried creating engine files for codellama-34b.

Steps followed:

python convert_checkpoint.py --model_dir codellama/CodeLlama-34b-hf   --output_dir  chkpoint_files/   --dtype float16

trtllm-build --checkpoint_dir  chkpoint_files/    --output_dir  engine_files/  --gemm_plugin float16 --gpt_attention_plugin float16 --gemm_plugin float16  --tp_size 8 --pp_size 1  --auto_parallel 8 --remove_input_padding enable  --context_fmha enable --max_input_len 4096 --max_output_len 1024 --max_batch_size 8 --paged_kv_cache enable --use_context_fmha_for_generation enable --use_paged_context_fmha enable
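The build command above passes `--gemm_plugin float16` twice. That is probably harmless here because both occurrences carry the same value, but repeated flags are easy to miss in long command lines. A small sketch that reports them; `repeated_flags` is a hypothetical helper and the command string is shortened from the one above:

```python
import shlex
from collections import Counter


def repeated_flags(command: str) -> list:
    """Return the --flags that occur more than once in a shell command string."""
    counts = Counter(tok for tok in shlex.split(command) if tok.startswith("--"))
    return sorted(flag for flag, n in counts.items() if n > 1)


cmd = (
    "trtllm-build --checkpoint_dir chkpoint_files/ --output_dir engine_files/ "
    "--gemm_plugin float16 --gpt_attention_plugin float16 --gemm_plugin float16 "
    "--tp_size 8 --pp_size 1"
)
print(repeated_flags(cmd))  # prints ['--gemm_plugin']
```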

Here is what I got:

[06/27/2024-08:38:03] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemGetInfo(&free, &total): unknown error (/home/jenkins/agent/workspace/LLM/release-0.10/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/common/cudaUtils.h:319)
1       0x7fd9ebc2829e void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 94
2       0x7fd9ebc5fd4a tensorrt_llm::kernels::FusedMHARunnerV2::FusedMHARunnerV2(tensorrt_llm::kernels::Data_type, bool, int, int, float) + 1546
3       0x7fd9a3d3e594 tensorrt_llm::plugins::GPTAttentionPluginCommon::initialize() + 420
4       0x7fd9a3d69125 tensorrt_llm::plugins::GPTAttentionPlugin* tensorrt_llm::plugins::GPTAttentionPluginCommon::cloneImpl<tensorrt_llm::plugins::GPTAttentionPlugin>() const + 693
5       0x7fdb662c9ac4 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xba1ac4) [0x7fdb662c9ac4]
6       0x7fdb662c06a5 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb986a5) [0x7fdb662c06a5]
7       0x7fdb662c284a /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb9a84a) [0x7fdb662c284a]
8       0x7fdb6624014c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb1814c) [0x7fdb6624014c]
9       0x7fdb66240c22 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb18c22) [0x7fdb66240c22]
10      0x7fdb6652e4ee /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xe064ee) [0x7fdb6652e4ee]
11      0x7fdb6617f2ac /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa572ac) [0x7fdb6617f2ac]
12      0x7fdb66184501 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5c501) [0x7fdb66184501]
13      0x7fdb66184f0b /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5cf0b) [0x7fdb66184f0b]
14      0x7fdb0f6a7458 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa7458) [0x7fdb0f6a7458]
15      0x7fdb0f6458f3 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x458f3) [0x7fdb0f6458f3]

The Docker container hung after this error.

The same steps worked with trtllm-build 0.9.0.

nv-guomingz commented 3 months ago

What GPU are you using to build the engine?
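One way to collect that information from inside the container is to query `nvidia-smi`. A minimal sketch, assuming `nvidia-smi` is on the PATH; `describe_gpus` is a hypothetical helper, and it returns an empty list rather than raising when no GPU can be queried:

```python
import shutil
import subprocess


def describe_gpus() -> list:
    """Return one 'name, free memory, total memory' line per visible GPU, or []."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.free,memory.total",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=30,  # avoid blocking forever if the driver is unresponsive
        )
    except (subprocess.TimeoutExpired, OSError):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for line in describe_gpus() or ["no GPU information available"]:
        print(line)
```

An empty result before the build starts would point at the container's GPU access rather than at trtllm-build itself, which is relevant given that the failure above is in `cudaMemGetInfo`.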

tapansstardog commented 3 months ago


tapansstardog commented 2 weeks ago

The issue was resolved with the 24.07 trt-llm image.