TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
I have successfully built and started a Docker container for TensorRT-LLM, and ran convert_checkpoint.py as well as trtllm-build as follows:
docker run -it --net host --shm-size=4g --name triton_llm --ulimit memlock=-1 --ulimit stack=67108864 --gpus '"device=1"' -v ~/shared_folder/TensorRT:/opt/tritonserver/TensorRT nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
python3 ${CONVERT_CHKPT_SCRIPT} --model_dir ${LLAMA_MODEL} --output_dir ${UNIFIED_CKPT_PATH} --dtype float16
trtllm-build --checkpoint_dir ${UNIFIED_CKPT_PATH} \
    --remove_input_padding enable \
    --gpt_attention_plugin float16 \
    --context_fmha enable \
    --gemm_plugin float16 \
    --output_dir ${ENGINE_DIR} \
    --paged_kv_cache enable \
    --max_batch_size 8
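As a sanity check (a generic one, not specific to this setup; exact artifact names may vary between TensorRT-LLM versions), the build output and the installed wheel version can be inspected like this:

# List the engine artifacts produced by trtllm-build
# (typically config.json plus one rank*.engine file per GPU rank).
ls -lh ${ENGINE_DIR}

# Print the version of the tensorrt_llm wheel installed in the container.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"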
Now I was trying to test the engine using run.py from the examples directory as:
python3 /opt/tritonserver/TensorRT/TensorRT-LLM/examples/run.py --engine_dir=${ENGINE_DIR} --max_output_len 128 --tokenizer_dir /opt/tritonserver/TensorRT/model/llama-2-7b --input_text "What is ML" --streaming --streaming_interval 2 --temperature 0.7 --top_k 3 --top_p 0.9
I am facing 2 issues:
Issue 1: ImportError: cannot import name 'supports_inflight_batching' from 'tensorrt_llm._utils'
File "/opt/tritonserver/TensorRT_LLM_RB/TensorRT-LLM/examples/run.py", line 25, in
from utils import (DEFAULT_HF_MODEL_DIRS, DEFAULT_PROMPT_TEMPLATES,
File "/opt/tritonserver/TensorRT_LLM_RB/TensorRT-LLM/examples/utils.py", line 26, in
from tensorrt_llm._utils import supports_inflight_batching # noqa
ImportError: cannot import name 'supports_inflight_batching' from 'tensorrt_llm._utils' (/usr/local/lib/python3.10/dist-
packages/tensorrt_llm/_utils.py)
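A quick way to confirm whether the installed wheel exports that helper at all (a generic one-liner; if it prints False, the examples checkout is simply newer than the installed tensorrt_llm wheel):

# Prints True only if the installed wheel defines supports_inflight_batching.
python3 -c "import tensorrt_llm._utils as u; print(hasattr(u, 'supports_inflight_batching'))"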
I tried to fix this by copying the tensorrt_llm folder into the examples folder (roughly as sketched below), and it resolved this issue.
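Roughly what I did (a hypothetical reconstruction from memory; treat the paths as illustrative):

# Copy the repo's Python package sources next to run.py so they get
# imported instead of the installed wheel (reconstructed workaround).
cd /opt/tritonserver/TensorRT_LLM_RB/TensorRT-LLM
cp -r tensorrt_llm examples/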
Issue 2: After fixing issue 1, a new error occurred when I ran run.py again:
File "/opt/tritonserver/TensorRT_LLM_RB/TensorRT-LLM/examples/tensorrt_llm/_utils.py", line 31, in <module>
    from tensorrt_llm.bindings import GptJsonConfig
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
For issue 2, I am also not able to find a bindings folder in tensorrt_llm, and I am not sure what is wrong.
If convert_checkpoint.py from the examples directory works fine without causing issue 1, why is run.py throwing this error?
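For reference, a generic way to check which copy of the package a script imports, and whether the compiled bindings module is visible from there (run these from the same directory as the failing script):

# Show where the tensorrt_llm package resolves from; if this prints a path
# under examples/, the copied source tree is shadowing the installed wheel.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__file__)"

# Check whether the compiled bindings extension is importable from that copy;
# this prints None when no bindings module can be found.
python3 -c "import importlib.util; print(importlib.util.find_spec('tensorrt_llm.bindings'))"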