TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
I am getting an error when using TensorRT-LLM/examples/gptneox/build.py to build the TensorRT engine:
  line 314, in build_rank_engine
    assert hf_gpt is not None, f'Could not load weights from hf_gpt model as it is not loaded yet.'
AssertionError: Could not load weights from hf_gpt model as it is not loaded yet.
It seems like hf_gpt is loaded correctly in parse_arguments. However, inside the worker process (within the build function), hf_gpt is None.
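Here is a minimal sketch of what I think is happening (hypothetical names, not the actual build.py code): when multiprocessing uses the "spawn" start method, the child process re-imports the module instead of inheriting the parent's memory, so a global that the parent assigned at runtime falls back to its module-level default (None) in the child:

```python
# Hypothetical reproduction of the symptom, assuming the build is
# launched in a spawned subprocess; names mirror build.py but the
# code is a simplified stand-in, not the real implementation.
import multiprocessing as mp

hf_gpt = None  # module-level default, like the global in build.py


def parse_arguments():
    # Stand-in for loading the HF checkpoint in the parent process.
    global hf_gpt
    hf_gpt = "loaded-hf-model"


def build_rank_engine(queue):
    # Runs in the child: with "spawn", the module was re-imported,
    # so hf_gpt is the module-level default again, not the value
    # the parent assigned after import.
    queue.put(hf_gpt)


def demo():
    parse_arguments()
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=build_rank_engine, args=(queue,))
    proc.start()
    child_view = queue.get()  # read before join to avoid blocking
    proc.join()
    return hf_gpt, child_view


if __name__ == "__main__":
    parent_view, child_view = demo()
    print("parent sees:", parent_view)  # "loaded-hf-model"
    print("child sees:", child_view)    # None
```

With the "fork" start method the child would inherit the parent's memory and see the loaded value, which is why this kind of bug only shows up under "spawn".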
I am using TensorRT-LLM 0.7.0, and this is the command I am using to build: