Open DeekshithaDPrakash opened 2 days ago
+1 Facing the same issue
Hi @byshiue, can you help with this?
For stable branch, llama 3.2 is supported since release 0.15.
If you want to run test now, you need to use main branch to deploy.
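For reference, a minimal sketch of deploying from the main branch, assuming the pre-release wheels on NVIDIA's PyPI index are reachable from inside the container and that the examples/ scripts are taken from the same checkout:
pip3 install --upgrade --pre tensorrt-llm --extra-index-url https://pypi.nvidia.com
git clone https://github.com/NVIDIA/TensorRT-LLM.git  # run examples/llama/convert_checkpoint.py from this checkout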
@byshiue Thank you for your response. I truly appreciate your guidance.
I will test the following and update here soon.
I tested with the stable branches of both TensorRT-LLM and tensorrtllm-backend, across several tensorrt-llm versions:
The error still appears when convert_checkpoint.py is executed.
Command:
python3 ${CONVERT_CHKPT_SCRIPT} --model_dir ${LLAMA_MODEL} --output_dir ${UNIFIED_CKPT_PATH} --dtype float16
Upgraded tensorrt-llm versions within the docker container using: pip install -U tensorrt-llm==version_no
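Before running the conversion it may be worth confirming that the installed wheel and the checked-out examples come from the same release; a quick sanity check, using the repo path from the traceback below:
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
cd /opt/tritonserver/TensorRT_LLM_KARI/TensorRT-LLM && git log -1 --oneline  # the examples/ scripts should match the installed version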
Error:
/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libpng16.so.16: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
[TensorRT-LLM] TensorRT-LLM version: 0.14.0
Traceback (most recent call last):
File "/opt/tritonserver/TensorRT_LLM_KARI/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 16, in <module>
from tensorrt_llm.models.convert_utils import infer_dtype
ImportError: cannot import name 'infer_dtype' from 'tensorrt_llm.models.convert_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/convert_utils.py)
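This ImportError suggests convert_checkpoint.py comes from a newer revision than the installed 0.14.0 wheel, which does not yet export infer_dtype. A hedged workaround, assuming the repo has a release tag matching the wheel, is to pin the examples checkout to that tag:
cd /opt/tritonserver/TensorRT_LLM_KARI/TensorRT-LLM
git checkout v0.14.0  # assumed tag name; pick the tag that matches the pip-installed wheel
python3 -c "from tensorrt_llm.models import convert_utils; print(hasattr(convert_utils, 'infer_dtype'))"  # expected False on the 0.14.0 wheel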
Error:
Traceback (most recent call last):
File "/opt/tritonserver/TensorRT_LLM_KARI/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 10, in <module>
import tensorrt_llm
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
import tensorrt_llm.functional as functional
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
from . import graph_rewriting as gw
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 12, in <module>
from .network import Network
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py", line 27, in <module>
from tensorrt_llm.module import Module
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 17, in <module>
from ._common import default_net
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 37, in <module>
from ._utils import str_dtype_to_trt
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 31, in <module>
from tensorrt_llm.bindings import GptJsonConfig
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKSs
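The undefined symbol here demangles to c10::detail::torchCheckFail(...), a symbol from PyTorch's c10 library, which usually means the torch installed in the container no longer matches the build the tensorrt_llm bindings were compiled against (for example, pip pulled in a different torch during the upgrade). A quick check of what is actually installed:
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
pip3 show tensorrt-llm torch | grep -E 'Name|Version'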
Error:
[TensorRT-LLM] TensorRT-LLM version: 0.15.0.dev2024102900
Traceback (most recent call last):
File "/opt/tritonserver/TensorRT_LLM/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 16, in <module>
from tensorrt_llm.models.convert_utils import infer_dtype
ImportError: cannot import name 'infer_dtype' from 'tensorrt_llm.models.convert_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/convert_utils.py)
A clear solution has not been found yet.
System Info
GPU: A100
OS: Ubuntu 22.04.4 LTS
Command:
The upgraded transformers version (4.45.2) gives the following error,
The same error occurs with other TensorRT-LLM versions, such as 0.14, in tritonserver:24.08-trtllm-python-py3, tritonserver:24.09-trtllm-python-py3, and tritonserver:24.10-trtllm-python-py3.
I'm now assuming this is a bug, as multiple users are facing the same issue: #2467 #2339 #2320
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Steps to reproduce the behavior:
Expected behavior
convert_checkpoint.py runs without errors and creates two files inside the checkpoint folder.
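For reference (an assumption based on a single-GPU, tp_size=1 conversion), the checkpoint folder would typically contain:
ls ${UNIFIED_CKPT_PATH}  # config.json  rank0.safetensors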
Actual behavior
A lower transformers version gives a rope_scaling error, while a higher version (>=4.45.1), as required for Llama 3.2, gives a CUDA error: torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
Additional notes
I think there is surely a version mismatch.
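One way to narrow down the transformers side of the mismatch: Llama 3.2 checkpoints carry a rope_scaling block that older transformers releases cannot parse, so it may help to confirm the installed version and that the config loads at all (a diagnostic sketch, reusing the ${LLAMA_MODEL} path from the command above):
python3 -c "import transformers; print(transformers.__version__)"
python3 -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('${LLAMA_MODEL}').rope_scaling)"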