Open eurus-ch opened 8 months ago
Please ensure that you build and run TensorRT-LLM in the same environment. Alternatively, you can try building TensorRT-LLM in a Docker container by executing this command:
make -C docker release_build
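One way to check the "same environment" advice is to record a small fingerprint in the build environment and again in the run environment, then compare the two. The sketch below is only an illustration: the helper names `env_fingerprint` and `diff_fingerprints` are hypothetical, and the package list is an assumption based on the packages that appear in this thread.

```python
import json
import platform
from importlib import metadata

# Packages mentioned in this thread; adjust the list for your own setup.
PACKAGES = ("torch", "tensorrt", "tensorrt_llm", "pynvml")


def env_fingerprint(packages=PACKAGES):
    """Return the Python version and the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


def diff_fingerprints(build_env, run_env):
    """Return {key: (build_value, run_value)} for every key that differs."""
    keys = sorted(set(build_env) | set(run_env))
    return {
        key: (build_env.get(key), run_env.get(key))
        for key in keys
        if build_env.get(key) != run_env.get(key)
    }


if __name__ == "__main__":
    # Run this in both environments and compare the printed JSON.
    print(json.dumps(env_fingerprint(), indent=2))
```

An empty result from `diff_fingerprints` only means the listed packages match; it does not prove the two environments are identical.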
Thank you!
With tensorrt-llm 0.6.1, the error changes to this:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMemoryInfo_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/TensorRT-LLM/./examples/llama/build.py", line 906, in <module>
    build(0, args)
  File "/TensorRT-LLM/./examples/llama/build.py", line 850, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/TensorRT-LLM/./examples/llama/build.py", line 609, in build_rank_engine
    profiler.print_memory_usage(f'Rank {rank} Engine build starts')
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 197, in print_memory_usage
    alloc_device_mem, _, _ = device_memory_info(device=device)
  File "/TensorRT-LLM/tensorrt_llm/profiler.py", line 148, in device_memory_info
    mem_info = _device_get_memory_info_fn(handle)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 2438, in nvmlDeviceGetMemoryInfo
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Thank you, but I'm developing inside a Docker container, and building another Docker image from within it seems to be restricted, so...
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.

I solved this error by manually installing PyTorch 2.1.0, with a command like this:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
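To confirm that the pinned versions from the pip command above are what actually ended up installed, the installed metadata can be compared against the pins without importing torch. This is a minimal sketch: the helper names `base_version` and `check_pins` are hypothetical, and it assumes that a local build tag such as `+cu121` should be ignored when comparing.

```python
from importlib import metadata

# Pins taken from the pip command above.
PINS = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def base_version(version):
    """Drop a local build tag such as '+cu121' ('2.1.0+cu121' -> '2.1.0')."""
    return version.split("+", 1)[0]


def check_pins(pins=PINS):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or base_version(installed) != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print(check_pins() or "all pins match")
```

Matching version numbers are a necessary check, not a sufficient one: the undefined-symbol error can still appear if the TensorRT-LLM library was built against a differently built PyTorch.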
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
I too faced this issue. This was the fix: https://github.com/NVIDIA/k8s-device-plugin/issues/331#issuecomment-1859143566
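The NVML traceback in this thread fails because `libnvidia-ml.so.1` does not export `nvmlDeviceGetMemoryInfo_v2`, which the installed pynvml tries to look up. A quick way to confirm that on a given machine is to load the library with ctypes and probe for the symbol. This is a sketch, not the linked fix itself; the helper name `library_exports` is hypothetical.

```python
import ctypes


def library_exports(symbol, lib_name="libnvidia-ml.so.1"):
    """Return True or False if `lib_name` loads and does or does not export
    `symbol`; return None if the library itself cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # ctypes resolves symbols lazily on attribute access and raises
    # AttributeError for a missing one, which hasattr() turns into False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    for name in ("nvmlDeviceGetMemoryInfo", "nvmlDeviceGetMemoryInfo_v2"):
        print(name, library_exports(name))
```

If the unversioned function is present but the `_v2` one is not, the driver library is probably older than what the installed pynvml expects, so either side of that pair may need to change.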
Thanks
I'm still facing the same issue.
I'm still getting the same error. My environment is:
attrs 23.2.0
av 10.0.0
bcrypt 4.1.2
braceexpand 0.1.7
certifi 2020.6.20
cffi 1.16.0
chardet 4.0.0
charset-normalizer 3.3.2
coloredlogs 15.0.1
cryptography 42.0.5
ctranslate2 3.24.0
dbus-python 1.2.16
distro 1.9.0
distro-info 1.0+deb11u1
docker 7.0.0
docker-compose 1.29.2
dockerpty 0.4.1
docopt 0.6.2
einops 0.7.0
encodec 0.1.1
fastcore 1.5.29
faster-whisper 0.9.0
fastprogress 1.0.3
ffmpeg-python 0.2.0
filelock 3.13.3
flatbuffers 24.3.25
fsspec 2024.3.1
future 1.0.0
httplib2 0.18.1
huggingface-hub 0.17.3
humanfriendly 10.0
HyperPyYAML 1.2.2
idna 2.10
Jinja2 3.1.3
joblib 1.3.2
jsonschema 3.2.0
kaldialign 0.9.1
llvmlite 0.42.0
MarkupSafe 2.1.5
more-itertools 10.2.0
mpmath 1.3.0
networkx 3.2.1
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.1.105
nvidia-pyindex 1.0.9
onnxruntime 1.16.0
openai-whisper 20231117
packaging 24.0
paramiko 3.4.0
pillow 10.2.0
pip 20.3.4
protobuf 5.26.1
pycparser 2.22
pycurl 7.43.0.6
PyGObject 3.38.0
PyNaCl 1.5.0
pyrsistent 0.20.0
PySimpleSOAP 1.16.2
python-apt 2.2.1
python-debian 0.1.39
python-debianbts 3.1.0
python-dotenv 0.21.1
python-snappy 0.5.3
PyYAML 5.4.1
regex 2023.12.25
reportbug 7.10.3+deb11u1
requests 2.31.0
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
scipy 1.12.0
sentencepiece 0.2.0
setuptools 52.0.0
six 1.16.0
soundfile 0.12.1
speechbrain 0.5.16
sympy 1.12
tensorrt 8.6.1.post1
tensorrt-bindings 8.6.1
tensorrt-libs 8.6.1
texttable 1.7.0
tiktoken 0.3.3
tokenizers 0.14.1
torch 2.1.0+cu121
torchaudio 2.1.0+cu121
torchvision 0.16.0+cu121
tqdm 4.66.2
triton 2.1.0
typing-extensions 4.10.0
unattended-upgrades 0.1
urllib3 1.26.5
vocos 0.1.0
websocket-client 0.59.0
websockets 12.0
wheel 0.34.2
WhisperSpeech 0.8
Development hardware: Google Cloud
Error message:
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 58, in _init
    torch.classes.load_library(ft_decoder_lib)
  File "/usr/local/lib/python3.10/dist-packages/torch/_classes.py", line 51, in load_library
    torch.ops.load_library(path)
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/WhisperFusion/main.py", line 11, in <module>
    from whisper_live.trt_server import TranscriptionServer
  File "/root/WhisperFusion/whisper_live/trt_server.py", line 17, in <module>
    from whisper_live.trt_transcriber import WhisperTRTLLM
  File "/root/WhisperFusion/whisper_live/trt_transcriber.py", line 16, in <module>
    import tensorrt_llm
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 64, in <module>
    _init(log_level="error")
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_common.py", line 61, in _init
    raise ImportError(str(e) + msg)
ImportError: /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev
FATAL: Decoding operators failed to load. This may be caused by the incompatibility between PyTorch and TensorRT-LLM. Please rebuild and install TensorRT-LLM.
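The undefined symbols in these errors are mangled C++ names, and decoding just their leading namespace components shows which library they belong to. The sketch below handles only the `_ZN<length><identifier>...` prefix of the Itanium C++ ABI mangling plus the constructor and destructor codes; the helper name `nested_name` is hypothetical, and a full demangler such as `c++filt` covers the cases it skips.

```python
import re

# Constructor and destructor codes from the Itanium C++ ABI.
SPECIAL = {
    "C1": "complete object constructor",
    "C2": "base object constructor",
    "D0": "deleting destructor",
    "D1": "complete object destructor",
    "D2": "base object destructor",
}


def nested_name(symbol):
    """Decode the '_ZN<len><id>...' prefix of an Itanium-mangled symbol.

    Returns (components, kind), where kind is a constructor or destructor
    description, or None. Template arguments, substitutions, and parameter
    types are not handled.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos, parts = 3, []
    while True:
        match = re.match(r"\d+", symbol[pos:])
        if not match:
            break
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return parts, SPECIAL.get(symbol[pos:pos + 2])


if __name__ == "__main__":
    parts, kind = nested_name("_ZN3c1017RegisterOperatorsD1Ev")
    print("::".join(parts), kind)
```

For the symbol in the traceback above this gives `c10::RegisterOperators` with a destructor code. `c10` is a PyTorch namespace, which is consistent with the error message's hint that `libth_common.so` and the installed PyTorch do not match.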
Hi,
While trying to run this:
we ran into this FATAL error, a strange undefined symbol:
Before that, we built the wheel through:
And the software versions are:
Do you have any clue how to solve this? Many thanks!