[Open] yspch2022 opened this issue 5 days ago
@yspch2022 Could you use the latest version of Trt-LLM?
@hello-11 I am using the latest Triton container, nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3, which ships TensorRT-LLM 0.14.0, but I still hit the same issue described above with 0.14.0.
@yspch2022 Could you use the latest version of Trt-LLM?
@hello-11 Yes. For Llama 3.2 3B I tried version 0.14.0 first and then 0.13.0, but neither worked. With 0.14.0, even Llama 3 8B conversion failed, similar to this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/2452
Hello, I failed to convert Llama 3.2 3B to TRT-LLM when I ran convert_checkpoint.py (similar to this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/2339). I would like to know whether Llama 3.2 3B model conversion is currently unsupported, and if so, when it will be supported.
Environment: Windows & Ubuntu 22.04, TensorRT-LLM 0.13.0
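For reference, the failing run was an invocation of the example conversion script along the lines of the sketch below; the checkpoint and output paths are placeholders, not the reporter's original command:

# Minimal sketch of the failing invocation (paths are hypothetical)
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-3.2-3B \
    --output_dir ./tllm_checkpoint_3b \
    --dtype bfloat16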
Error message:
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
201it [00:08, 24.13it/s]
Traceback (most recent call last):
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 503, in <module>
    main()
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 495, in main
    convert_and_save_hf(args)
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 437, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 444, in execute
    f(args, rank)
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 423, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\models\llama\model.py", line 358, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\models\model_weights_loader.py", line 357, in generate_tllm_weights
    self.load(tllm_key,
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\models\model_weights_loader.py", line 278, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\layers\linear.py", line 391, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0x000001C175AF3640>
Traceback (most recent call last):
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\models\modeling_utils.py", line 453, in __del__
    self.release()
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\models\modeling_utils.py", line 450, in release
    release_gc()
  File "D:\envs\test_llama-v4\lib\site-packages\tensorrt_llm\_utils.py", line 471, in release_gc
    torch.cuda.ipc_collect()
  File "D:\envs\test_llama-v4\lib\site-packages\torch\cuda\__init__.py", line 904, in ipc_collect
    _lazy_init()
  File "D:\envs\test_llama-v4\lib\site-packages\torch\cuda\__init__.py", line 333, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
CUDA call was originally invoked at:
  File "D:\test_llama\TensorRT-LLM-0.13.0\examples\llama\convert_checkpoint.py", line 8, in <module>
    from transformers import AutoConfig
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\transformers\__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "<frozen importlib._bootstrap>", line 1078, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\transformers\dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\transformers\utils\__init__.py", line 27, in <module>
    from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\transformers\utils\chat_template_utils.py", line 39, in <module>
    from torch import Tensor
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\torch\__init__.py", line 1694, in <module>
    _C._initExtension(_manager_path())
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\envs\test_llama-v4\lib\site-packages\torch\cuda\__init__.py", line 1470, in <module>
    _lazy_call(_register_triton_kernels)
  File "D:\envs\test_llama-v4\lib\site-packages\torch\cuda\__init__.py", line 256, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))
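For anyone reading the trace: the primary failure is the AttributeError from linear.py's postprocess, which calls .to() on a weight tensor that was never found; the DeferredCudaCallError afterwards is only a secondary error raised during interpreter cleanup. Below is a minimal, self-contained sketch of that failure mode, not TensorRT-LLM code. The checkpoint dict and key names are illustrative, and the guess that the missing tensor is lm_head.weight (because Llama 3.2 3B ties its word embeddings, so the HF checkpoint stores no separate lm_head tensor) is an assumption, not something the trace confirms:

import torch

# Hypothetical stand-in for the HF checkpoint contents; the real loader
# reads safetensors. "lm_head.weight" is absent from checkpoints saved
# with tie_word_embeddings=True (assumed here for Llama 3.2 3B).
checkpoint = {
    "model.embed_tokens.weight": torch.zeros(8, 4),
}

def postprocess(key: str, dtype: torch.dtype) -> torch.Tensor:
    weights = checkpoint.get(key)  # returns None for a missing key
    # Without a None check, a missing tensor surfaces exactly as in the
    # report: AttributeError: 'NoneType' object has no attribute 'to'
    return weights.to(dtype)

postprocess("lm_head.weight", torch.bfloat16)

If that guess is right, a loader that handles tied embeddings (or a guard that raises a descriptive "missing weight" error instead of calling .to() on None) would be the fix; that remains speculation until confirmed by the maintainers.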