Please use nvcr.io/nvidia/pytorch:24.05-py3 instead of nvcr.io/nvidia/tritonserver:24.05-py3 for engine building.
Apologies for the typo; I am using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3. I believe I do not need to install examples/llama/requirements.txt. Am I right? I no longer get the above error if I skip installing the packages in requirements.txt.
The examples/llama/requirements.txt ensures that we can run the llama model successfully. That means you not only need to convert the checkpoint but also need to build the engine and run inference.
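For reference, the usual install step looks like the following (the working directory is an assumption based on the repo paths in this thread):

```bash
# Install the llama example dependencies; assumes you are at the
# root of the TensorRT-LLM checkout (adjust the path otherwise).
pip install -r examples/llama/requirements.txt
```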
I suggest you follow the doc instructions and use nvcr.io/nvidia/pytorch:24.05-py3 for engine building. Our latest commit (9691e12bce7ae1c126c435a049eb516eb119486c) relies on this image.
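A minimal sketch of launching that container for engine building (the mount path and flags are assumptions, not from the thread):

```bash
# Start the recommended image with GPU access and the current
# directory mounted; adjust the volume to wherever your model lives.
docker run --rm -it --gpus all \
  -v "$(pwd)":/workspace \
  nvcr.io/nvidia/pytorch:24.05-py3 bash
```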
Thanks. One quick question: in convert_checkpoint.py, I am trying to convert an HF model (codellama/CodeLlama-34b-hf) to checkpoint files. I am passing the HF name instead of downloaded safetensors files. I am getting:
```
Traceback (most recent call last):
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 464, in <module>
    main()
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 456, in main
    convert_and_save_hf(args)
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 371, in convert_and_save_hf
    hf_model = preload_model(
  File "/tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 326, in preload_model
    for f in os.listdir(model_dir)]) and use_safetensors
FileNotFoundError: [Errno 2] No such file or directory: 'codellama/CodeLlama-34b-hf'
```
Here use_safetensors is set to True. Can't I set it to False and run this piece of code directly? Just before this code, the method returns None when use_safetensors=True and does nothing else.
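Note that the failure itself is independent of the use_safetensors flag: os.listdir() only works on a local directory, and an HF repo id like codellama/CodeLlama-34b-hf is not one. One workaround sketch (assuming huggingface_hub is installed; not suggested in the thread) is to download the weights locally first and point --model_dir at that path:

```bash
# Download the HF repo to the local cache and print the local path;
# pass the printed path to convert_checkpoint.py via --model_dir.
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('codellama/CodeLlama-34b-hf'))"
```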
Please try the latest code at https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/convert_checkpoint.py instead of the outdated version.
Removing torch and tensorrt before installing tensorrt_llm worked for me.

```bash
pip uninstall torch
pip uninstall tensorrt
pip install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
```
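A quick sanity check after the reinstall (this verification step is an assumption, not part of the original comment):

```bash
# Confirm the freshly installed package imports cleanly and report its version.
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```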
Thanks @nv-guomingz. I was able to move forward and tried creating engine files for codellama-34b.
Steps followed:
```bash
python convert_checkpoint.py --model_dir codellama/CodeLlama-34b-hf \
    --output_dir chkpoint_files/ --dtype float16

trtllm-build --checkpoint_dir chkpoint_files/ --output_dir engine_files/ \
    --gemm_plugin float16 --gpt_attention_plugin float16 \
    --tp_size 8 --pp_size 1 --auto_parallel 8 \
    --remove_input_padding enable --context_fmha enable \
    --max_input_len 4096 --max_output_len 1024 --max_batch_size 8 \
    --paged_kv_cache enable --use_context_fmha_for_generation enable \
    --use_paged_context_fmha enable
```
Here is what I got:

```
[06/27/2024-08:38:03] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] CUDA runtime error in cudaMemGetInfo(&free, &total): unknown error (/home/jenkins/agent/workspace/LLM/release-0.10/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/common/cudaUtils.h:319)
1       0x7fd9ebc2829e void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 94
2       0x7fd9ebc5fd4a tensorrt_llm::kernels::FusedMHARunnerV2::FusedMHARunnerV2(tensorrt_llm::kernels::Data_type, bool, int, int, float) + 1546
3       0x7fd9a3d3e594 tensorrt_llm::plugins::GPTAttentionPluginCommon::initialize() + 420
4       0x7fd9a3d69125 tensorrt_llm::plugins::GPTAttentionPlugin* tensorrt_llm::plugins::GPTAttentionPluginCommon::cloneImpl<tensorrt_llm::plugins::GPTAttentionPlugin>() const + 693
5       0x7fdb662c9ac4 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xba1ac4) [0x7fdb662c9ac4]
6       0x7fdb662c06a5 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb986a5) [0x7fdb662c06a5]
7       0x7fdb662c284a /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb9a84a) [0x7fdb662c284a]
8       0x7fdb6624014c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb1814c) [0x7fdb6624014c]
9       0x7fdb66240c22 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xb18c22) [0x7fdb66240c22]
10      0x7fdb6652e4ee /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xe064ee) [0x7fdb6652e4ee]
11      0x7fdb6617f2ac /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa572ac) [0x7fdb6617f2ac]
12      0x7fdb66184501 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5c501) [0x7fdb66184501]
13      0x7fdb66184f0b /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa5cf0b) [0x7fdb66184f0b]
14      0x7fdb0f6a7458 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa7458) [0x7fdb0f6a7458]
15      0x7fdb0f6458f3 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x458f3) [0x7fdb0f6458f3]
```
The Docker container hung after this error.
The same steps worked with trtllm-build 0.9.0.
What GPU are you using for engine building?
L40S
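As a general diagnostic for an "unknown error" from the CUDA runtime inside a container, a quick sanity check like the following can help (a sketch, not advice given in the thread; such errors often point at a driver/runtime mismatch or missing GPU passthrough):

```bash
# Verify the container actually sees the GPU and the CUDA runtime works.
nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```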
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
The issue got resolved with the 24.07 trt-llm image.
Hi team,
I am trying to build llama engine files using the nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 container and was getting the below error. The tensorrt-related libraries were already installed.
Then I ran the below command, referring to this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/1791

```bash
pip install tensorrt==10.0.1 --force-reinstall
```
While installing tensorrt, I got this error:

```
torch-tensorrt 2.3.0a0 requires tensorrt<8.7,>=8.6, but you have tensorrt 10.0.1 which is incompatible.
```
I ignored this since tensorrt was successfully installed. Now I am getting an undefined symbol error.
Any suggestions?
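Undefined-symbol errors after a force-reinstall usually mean mismatched TensorRT libraries are being loaded at import time. A quick diagnostic sketch (these checks are assumptions, not steps from the thread):

```bash
# List every TensorRT-related package to spot version mismatches,
# e.g. torch-tensorrt pinned to 8.6 alongside tensorrt 10.0.1.
pip list | grep -i tensorrt
# Confirm which version the Python bindings actually load.
python -c "import tensorrt; print(tensorrt.__version__)"
```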