NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

bloom 560M can not build #658

Open Lenan22 opened 11 months ago

Lenan22 commented 11 months ago

python build.py --model_dir ./bloom/560M/ --dtype float16 --use_gemm_plugin float16 --use_gpt_attention_plugin float16 --output_dir ./bloom/560M/trt_engines/fp16/1-gpu/

Open MPI's OFI driver detected multiple equidistant NICs from the current process, but had insufficient information to ensure MPI processes fairly pick a NIC for use. This may negatively impact performance. A more modern PMIx server is necessary to resolve this issue.

[12/14/2023-14:28:39] [TRT-LLM] [I] Serially build TensorRT engines.
[12/14/2023-14:28:39] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 166, GPU 649 (MiB)
[12/14/2023-14:28:41] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +482, GPU +80, now: CPU 784, GPU 729 (MiB)
[12/14/2023-14:28:41] [TRT-LLM] [W] Invalid timing cache, using freshly created one
Traceback (most recent call last):
  File "/home/tiger/.local/lib/python3.10/site-packages/pynvml/nvml.py", line 850, in _nvmlGetFunctionPointer
    _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMemoryInfo_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/bn/xiaxin/wulenan/workspace/TensorRT-LLM/examples/bloom/build.py", line 556, in <module>
    build(0, args)
  File "/mnt/bn/xiaxin/wulenan/workspace/TensorRT-LLM/examples/bloom/build.py", line 524, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/mnt/bn/xiaxin/wulenan/workspace/TensorRT-LLM/examples/bloom/build.py", line 337, in build_rank_engine
    profiler.print_memory_usage(f'Rank {rank} Engine build starts')
  File "/home/tiger/.local/lib/python3.10/site-packages/tensorrt_llm/profiler.py", line 197, in print_memory_usage
    alloc_device_mem, _, _ = device_memory_info(device=device)
  File "/home/tiger/.local/lib/python3.10/site-packages/tensorrt_llm/profiler.py", line 148, in device_memory_info
    mem_info = _device_get_memory_info_fn(handle)
  File "/home/tiger/.local/lib/python3.10/site-packages/pynvml/nvml.py", line 2438, in nvmlDeviceGetMemoryInfo
    fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
  File "/home/tiger/.local/lib/python3.10/site-packages/pynvml/nvml.py", line 853, in _nvmlGetFunctionPointer
    raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
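
For anyone debugging this, a minimal check (not part of the original report) to confirm whether the local libnvidia-ml.so.1 actually exports the symbol that pynvml 11.5.0 tries to look up:

import ctypes

# Load the driver's NVML library and probe for both entry points.
# Accessing a missing symbol on a CDLL raises AttributeError, so hasattr
# returns False when the driver does not export it.
lib = ctypes.CDLL("libnvidia-ml.so.1")
print("nvmlDeviceGetMemoryInfo   :", hasattr(lib, "nvmlDeviceGetMemoryInfo"))
print("nvmlDeviceGetMemoryInfo_v2:", hasattr(lib, "nvmlDeviceGetMemoryInfo_v2"))

If the _v2 line prints False, the failure above is coming from the driver, not from TensorRT-LLM itself.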

jdemouth-nvidia commented 10 months ago

What system are you running on? Which OS?

yz-tang commented 9 months ago

@jdemouth-nvidia I encountered the same problem. I am using the image nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3. I'm running on x86 Ubuntu.

nullxjx commented 9 months ago

@jdemouth-nvidia I encountered the same problem too. I am using the image nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3.

My system info:

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

My GPU info:

Wed Jan 24 15:08:06 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:00:08.0 Off |                    0 |
|  0%   48C    P0    61W / 150W |   3926MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10          On   | 00000000:00:09.0 Off |                    0 |
|  0%   47C    P0    62W / 150W |   3254MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

nullxjx commented 9 months ago

nvmlDeviceGetMemoryInfo_v2

@yz-tang I fixed this by downgrading pynvml from 11.5.0 to 11.4.0. My tensorrt_llm version is 0.8.0.dev2024011601. Using pynvml 11.4.0 may produce the warning 'Found pynvml==11.4.0. Please use pynvml>=11.5.0 to get accurate memory usage'; just ignore it.
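
A quick sketch of what the downgrade relies on, assuming pynvml==11.4.0 is installed (pip install "pynvml==11.4.0"):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# pynvml 11.4.0 resolves the original nvmlDeviceGetMemoryInfo entry point,
# which older 470.x drivers still export, so this call succeeds where the
# 11.5.0 lookup of nvmlDeviceGetMemoryInfo_v2 raised NVMLError_FunctionNotFound.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("total:", mem.total, "free:", mem.free, "used:", mem.used)
pynvml.nvmlShutdown()

If this prints memory numbers, the TensorRT-LLM profiler's device_memory_info call should work as well.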

yjjiang11 commented 9 months ago

I got the same error while using NV driver 470.199.02. NV driver 535.54.03 works.
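
Based only on the driver versions reported in this thread (470.x failing, 535.54.03 working), a hedged pre-flight check before building could look like this sketch:

import pynvml

pynvml.nvmlInit()
version = pynvml.nvmlSystemGetDriverVersion()
if isinstance(version, bytes):  # older pynvml returns bytes, newer returns str
    version = version.decode()
print("NVIDIA driver:", version)
# 535.x is reported working above and 470.x failing; the exact minimum driver
# that exports nvmlDeviceGetMemoryInfo_v2 is not confirmed in this thread.
if int(version.split(".")[0]) < 535:
    print("Older driver detected; consider upgrading the driver or pinning pynvml==11.4.0.")
pynvml.nvmlShutdown()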