TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
When I convert chatglm2-6b on an A10, I get the error below:
Traceback (most recent call last):
  File "/code/tensorrt-llm/tensorrt-llm/TensorRT-LLM/examples/chatglm/build.py", line 895, in <module>
    run_build()
  File "/code/tensorrt-llm/tensorrt-llm/TensorRT-LLM/examples/chatglm/build.py", line 887, in run_build
    build(0, args)
  File "/code/tensorrt-llm/tensorrt-llm/TensorRT-LLM/examples/chatglm/build.py", line 827, in build
    engine = build_rank_engine(
  File "/code/tensorrt-llm/tensorrt-llm/TensorRT-LLM/examples/chatglm/build.py", line 592, in build_rank_engine
    profiler.print_memory_usage(f'Rank {rank} Engine build starts')
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/profiler.py", line 269, in print_memory_usage
    _default_memory_monitor.print_memory_usage(tag=tag, unit=unit)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/profiler.py", line 226, in print_memory_usage
    alloc_device_mem, _, _ = self.device_memory_info(device=device)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/profiler.py", line 175, in device_memory_info
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
  File "/usr/local/lib/python3.10/dist-packages/pynvml/nvml.py", line 1651, in nvmlDeviceGetHandleByIndex
    c_index = c_uint(index)
TypeError: 'NoneType' object cannot be interpreted as an integer
The build command is: python build.py -m chatglm2_6b --output_dir /trtModel/
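For reference, the last two frames show what goes wrong: `pynvml.nvmlDeviceGetHandleByIndex` is called with `index=None`, and pynvml internally wraps the index in a ctypes `c_uint`, which rejects `None`. A minimal sketch of just that failing conversion (the ctypes mechanism, not the TensorRT-LLM code path that produced the `None`):

```python
from ctypes import c_uint

# pynvml's nvmlDeviceGetHandleByIndex() does `c_index = c_uint(index)`.
# When the caller passes index=None, ctypes raises the TypeError seen
# at the bottom of the traceback above.
try:
    c_uint(None)
except TypeError as err:
    print(err)
```

So the root cause is not NVML itself but that the profiler's device index resolves to `None` on this setup. As a quick sanity check (an assumption, not a confirmed fix), it may be worth verifying in the same container that an integer index works, e.g. `pynvml.nvmlInit()` followed by `pynvml.nvmlDeviceGetHandleByIndex(0)`, and that the GPU is actually visible (`nvidia-smi`, `CUDA_VISIBLE_DEVICES`).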