NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

ModuleNotFoundError: No module named 'tensorrt_llm.bindings' #2416

Open · DeekshithaDPrakash opened this issue 1 day ago

DeekshithaDPrakash commented 1 day ago

I have successfully built and started a Docker container for tensorrt_llm, and ran convert_checkpoint.py as well as trtllm-build as follows:

  1. docker run -it --net host --shm-size=4g --name triton_llm --ulimit memlock=-1 --ulimit stack=67108864 --gpus '"device=1"' -v ~/shared_folder/TensorRT:/opt/tritonserver/TensorRT nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3

  2. python3 ${CONVERT_CHKPT_SCRIPT} --model_dir ${LLAMA_MODEL} --output_dir ${UNIFIED_CKPT_PATH} --dtype float16

  3. trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
         --remove_input_padding enable \
         --gpt_attention_plugin float16 \
         --context_fmha enable \
         --gemm_plugin float16 \
         --output_dir ${ENGINE_DIR} \
         --paged_kv_cache enable \
         --max_batch_size 8
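
If the build succeeds, it is worth confirming that the engine artifacts were actually written before testing; a quick sanity check (the file names below are the usual output of a single-GPU trtllm-build, inferred rather than taken from this thread):

    ls ${ENGINE_DIR}
    # a successful single-GPU build typically produces:
    #   config.json    - build configuration read by the runtime
    #   rank0.engine   - the serialized TensorRT engine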

Now I am trying to test the engine with run.py from the examples directory:

    python3 /opt/tritonserver/TensorRT/TensorRT-LLM/examples/run.py \
        --engine_dir=${ENGINE_DIR} \
        --max_output_len 128 \
        --tokenizer_dir /opt/tritonserver/TensorRT/model/llama-2-7b \
        --input_text "What is ML" \
        --streaming \
        --streaming_interval 2 \
        --temperature 0.7 \
        --top_k 3 \
        --top_p 0.9

I am facing 2 issues; issue 2 is the error in the title:

    ModuleNotFoundError: No module named 'tensorrt_llm.bindings'

For issue 2, I am also not able to find a bindings folder inside the tensorrt_llm package, and I am not sure what is wrong. If convert_checkpoint.py from the examples directory works fine without causing issue 1, why is run.py throwing this error?
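
One way to narrow issue 2 down (a generic diagnostic sketch, not commands taken from this thread) is to check which tensorrt_llm installation the interpreter is actually importing, and whether its compiled bindings load at all:

    pip3 show tensorrt_llm    # the "Location:" line shows where the wheel is installed
    python3 -c "import tensorrt_llm; print(tensorrt_llm.__file__)"
    python3 -c "import tensorrt_llm.bindings"    # reproduces the error if the compiled bindings are missing

If tensorrt_llm.__file__ points into a source checkout (for example a cloned TensorRT-LLM repo mounted into the container) rather than into site-packages, the pure-Python source tree is shadowing the installed wheel; the source tree contains no compiled bindings module, which is one common cause of exactly this ModuleNotFoundError. Running run.py from a directory outside the checkout avoids the shadowing.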

hello-11 commented 15 hours ago

@DeekshithaDPrakash, you can try adding the tensorrt_llm install path to the PYTHONPATH environment variable so that Python can locate the module.
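
A minimal sketch of that suggestion, assuming the wheel lives in the image's default dist-packages directory (the path below is an assumption; substitute the "Location:" reported by pip3 show tensorrt_llm):

    # add the installed package location to Python's module search path
    export PYTHONPATH=/usr/local/lib/python3.10/dist-packages:${PYTHONPATH}
    python3 -c "import tensorrt_llm.bindings; print('bindings import OK')"

Note that PYTHONPATH entries are searched after the script's own directory, so if a local tensorrt_llm source tree is shadowing the installed wheel, the script still needs to be run from outside that tree.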