[10/29/2024-17:08:19] [TRT-LLM] [W] Found pynvml==11.5.3 and cuda driver version 470.182.03. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
229it [00:02, 93.96it/s]
Traceback (most recent call last):
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 303, in <module>
    main()
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 295, in main
    convert_and_save_hf(args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 251, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 258, in execute
    f(args, rank)
  File "/home/tensorrtllm_backend/tensorrt_llm/examples/qwen/convert_checkpoint.py", line 241, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 427, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 357, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 278, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 391, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0x7f778f992050>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 453, in __del__
    self.release()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 450, in release
    release_gc()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 471, in release_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 901, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
System Info

TensorRT-LLM 0.13.0 on Python 3.10, running inside the Docker container (CUDA driver 470.182.03 and pynvml 11.5.3, per the warning in the log above).

Who can help?

No response

Information

The official example scripts (examples/qwen/convert_checkpoint.py)

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I ran the Qwen checkpoint conversion script inside the Docker container; an illustrative invocation is sketched below.

command:

model file:

exception stack: see the log and traceback at the top of this report.
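For reference only: the exact command and model files used are not reproduced in this report, so the following is a hypothetical single-GPU invocation of examples/qwen/convert_checkpoint.py with placeholder paths, not the reporter's actual setup.

    python3 convert_checkpoint.py \
        --model_dir /path/to/qwen-hf-model \
        --output_dir ./tllm_checkpoint_1gpu_fp16 \
        --dtype float16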
Expected behavior

The checkpoint conversion completes successfully.
actual behavior

The conversion fails with AttributeError: 'NoneType' object has no attribute 'to' (see the traceback above).
additional notes

TensorRT-LLM 0.13.0
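For context on the traceback: Linear.postprocess calls .to() directly on the weights handed to it by the model weights loader, so a weight that arrives as None (for example, a checkpoint tensor the loader could not match) fails exactly this way. The snippet below is a hypothetical, minimal stand-in for that call path, not the actual TensorRT-LLM code; it only reproduces the same error class.

    import torch

    # Hypothetical stand-in (NOT the actual tensorrt_llm/layers/linear.py code):
    # the dtype cast is applied without checking for None, so an unmatched or
    # missing checkpoint weight surfaces as a bare AttributeError.
    def postprocess_sketch(weights, dtype=torch.float16):
        return weights.to(dtype)

    postprocess_sketch(torch.zeros(2, 2))  # works: returns a float16 tensor
    postprocess_sketch(None)               # AttributeError: 'NoneType' object has no attribute 'to'

If that reading is right, the most likely cause is a mismatch between the Hugging Face checkpoint being converted (weight names, sharding, or quantization format) and what the 0.13.0 Qwen weight loader expects. The second traceback (DeferredCudaCallError during __del__) looks like a follow-on error raised while cleaning up after the first failure.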