NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

No module named 'tensorrt' #1839

Closed: tapansstardog closed this issue 2 weeks ago

tapansstardog commented 3 months ago

Hi team,

I am trying to build llama engine files using the nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 container and got the error below:

Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 9, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 25, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

The TensorRT-related libraries were already installed:

tensorrt                  10.0.1
tensorrt-cu12             10.1.0
tensorrt-cu12-bindings    10.1.0
tensorrt-cu12-libs        10.1.0
tensorrt-llm              0.10.0
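
Note that the versions above already disagree: tensorrt is pinned at 10.0.1 while the tensorrt-cu12 packages are at 10.1.0. Not from the thread, but a quick way to spot such skew inside the container:

# List every TensorRT-related package and its version
pip list 2>/dev/null | grep -i tensorrt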

Then, referring to issue https://github.com/NVIDIA/TensorRT-LLM/issues/1791, I ran the command below:

pip install tensorrt==10.0.1 --force-reinstall

While installing tensorrt I got this error:

torch-tensorrt 2.3.0a0 requires tensorrt<8.7,>=8.6, but you have tensorrt 10.0.1 which is incompatible.

I ignored this since tensorrt itself installed successfully.

Now I am getting an undefined symbol error:

File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 9, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
    from . import graph_rewriting as gw
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
    from .network import Network
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 26, in <module>
    from tensorrt_llm.module import Module
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
    from ._common import default_net
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 31, in <module>
    from ._utils import str_dtype_to_trt
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 29, in <module>
    from tensorrt_llm.bindings.BuildInfo import ENABLE_MULTI_DEVICE
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs

Any suggestions?
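
As an aside not from the thread: the undefined symbol is a mangled C++ name from torch's c10 library, which points at a torch ABI mismatch rather than a missing file. It can be demangled to confirm:

# Sketch: demangle the missing symbol with c++filt (part of binutils)
echo '_ZN3c106detail14torchCheckFailEPKcS2_jRKSs' | c++filt
# prints: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)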

nv-guomingz commented 3 months ago

Please use nvcr.io/nvidia/pytorch:24.05-py3 instead of nvcr.io/nvidia/tritonserver:24.05-py3 for engine building.
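
For reference, a minimal sketch of starting that build container; the volume mount and flags here are assumptions, not from the thread:

# Launch the recommended build image with GPU access and the current
# directory mounted into the container (paths are illustrative)
docker run --rm -it --gpus all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:24.05-py3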

tapansstardog commented 3 months ago

Apologies for the typo; I am using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3. I believe I do not need to install examples/llama/requirements.txt, am I right? I no longer get the above error if I skip installing the packages in requirements.txt.

nv-guomingz commented 3 months ago

The examples/llama/requirements.txt ensures that we can run the llama model successfully: you not only need to convert the checkpoint, but also build the engine and run inference.

I suggest you follow the doc instructions and use nvcr.io/nvidia/pytorch:24.05-py3 for engine building. Our latest commit (9691e12bce7ae1c126c435a049eb516eb119486c) relies on this image.
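
A minimal sketch of pinning a checkout to that commit; the hash is from the comment above, the rest is assumed:

# Clone the repo and check out the commit the comment refers to
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git checkout 9691e12bce7ae1c126c435a049eb516eb119486c
# Install the per-example dependencies mentioned earlier in the thread
pip install -r examples/llama/requirements.txt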

tapansstardog commented 3 months ago

Thanks. One quick question:

In convert_checkpoint.py, I am trying to convert an HF model (codellama/CodeLlama-34b-hf) to checkpoint files. I am passing the HF model name instead of a path to downloaded safetensors files.

I am getting:

Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 464, in <module>
    main()
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 456, in main
    convert_and_save_hf(args)
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 371, in convert_and_save_hf
    hf_model = preload_model(
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 326, in preload_model
    for f in os.listdir(model_dir)]) and use_safetensors
FileNotFoundError: [Errno 2] No such file or directory: 'codellama/CodeLlama-34b-hf'

Here use_safetensors is set to True. Can't I set it to False and run this piece of code directly?

https://github.com/NVIDIA/TensorRT-LLM/blob/v0.10.0/examples/llama/convert_checkpoint.py#L329C3-L329C35

Just before that code, the method returns None when use_safetensors=True and does nothing.
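
One workaround not suggested in the thread: since the FileNotFoundError comes from listing the model directory, downloading the weights to a local directory first gives the script a real path. A sketch, assuming a recent huggingface_hub is installed:

# Download the HF weights locally, then point --model_dir at that directory
huggingface-cli download codellama/CodeLlama-34b-hf --local-dir ./CodeLlama-34b-hf
python convert_checkpoint.py --model_dir ./CodeLlama-34b-hf --output_dir chkpoint_files/ --dtype float16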

nv-guomingz commented 3 months ago

Please try the latest code (https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/convert_checkpoint.py) instead of the outdated version.

Ri0S commented 3 months ago

Removing torch and tensorrt before installing tensorrt_llm worked for me.

pip uninstall torch
pip uninstall tensorrt
pip install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
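
A quick sanity check, not part of the original comment, to confirm the reinstalled stack imports cleanly inside the container:

# All three imports must resolve against the same torch/CUDA build
python -c "import torch, tensorrt, tensorrt_llm; print(torch.__version__, tensorrt.__version__, tensorrt_llm.__version__)"
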
tapansstardog commented 3 months ago

Thanks @nv-guomingz. I was able to move forward and tried creating engine files for codellama-34b.

Steps followed:

python convert_checkpoint.py --model_dir codellama/CodeLlama-34b-hf   --output_dir  chkpoint_files/   --dtype float16

trtllm-build --checkpoint_dir chkpoint_files/ --output_dir engine_files/ \
  --gemm_plugin float16 --gpt_attention_plugin float16 \
  --tp_size 8 --pp_size 1 --auto_parallel 8 \
  --remove_input_padding enable --context_fmha enable \
  --max_input_len 4096 --max_output_len 1024 --max_batch_size 8 \
  --paged_kv_cache enable --use_context_fmha_for_generation enable \
  --use_paged_context_fmha enable

Here is what I got:

[06/27/2024-08:38:03] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemGetInfo(&free, &total): unknown error (/home/jenkins/agent/workspace/LLM/release-0.10/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/common/cudaUtils.h:319)
1       0x7fd9ebc2829e void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 94
2       0x7fd9ebc5fd4a tensorrt_llm::kernels::FusedMHARunnerV2::FusedMHARunnerV2(tensorrt_llm::kernels::Data_type, bool, int, int, float) + 1546
3       0x7fd9a3d3e594 tensorrt_llm::plugins::GPTAttentionPluginCommon::initialize() + 420
4       0x7fd9a3d69125 tensorrt_llm::plugins::GPTAttentionPlugin* tensorrt_llm::plugins::GPTAttentionPluginCommon::cloneImpl<tensorrt_llm::plugins::GPTAttentionPlugin>() const + 693
5       0x7fdb662c9ac4 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xba1ac4) [0x7fdb662c9ac4]
6       0x7fdb662c06a5 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb986a5) [0x7fdb662c06a5]
7       0x7fdb662c284a /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb9a84a) [0x7fdb662c284a]
8       0x7fdb6624014c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb1814c) [0x7fdb6624014c]
9       0x7fdb66240c22 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb18c22) [0x7fdb66240c22]
10      0x7fdb6652e4ee /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xe064ee) [0x7fdb6652e4ee]
11      0x7fdb6617f2ac /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa572ac) [0x7fdb6617f2ac]
12      0x7fdb66184501 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5c501) [0x7fdb66184501]
13      0x7fdb66184f0b /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5cf0b) [0x7fdb66184f0b]
14      0x7fdb0f6a7458 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa7458) [0x7fdb0f6a7458]
15      0x7fdb0f6458f3 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x458f3) [0x7fdb0f6458f3]

The Docker container hung after this error.

The same steps worked with trtllm-build 0.9.0.

nv-guomingz commented 3 months ago

What GPU are you using to build the engine?

tapansstardog commented 3 months ago

L40S
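
Worth noting, though not raised in the thread: the build command above passes --tp_size 8, which implies eight visible GPUs; a quick check of what the container actually sees:

# List every GPU visible to the container
nvidia-smi --query-gpu=index,name,memory.total --format=csv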

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

tapansstardog commented 2 weeks ago

The issue got resolved with the 24.07 TRT-LLM image.