TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
The trtllm-build entrypoint ignores the user-provided --model_cls_file and --model_cls_name. Although the model class is resolved and imported, no corresponding entry is added to MODEL_MAP, as seen here.
I've verified that adding the line MODEL_MAP[args.model_cls_name] = model_cls immediately after the line referenced above fixes the issue.
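To illustrate the proposed fix in isolation, here is a minimal, self-contained sketch. It does not reproduce the actual trtllm-build source: MODEL_MAP below is a plain dict standing in for the real mapping, and load_model_cls and resolve_model_cls are hypothetical helper names introduced only for this example. The register flag exists purely to contrast the behavior before and after the one-line change.

```python
import importlib.util

# Stand-in for TensorRT-LLM's MODEL_MAP (name -> model class).
# Assumption: the real map is a dict-like object keyed by a string.
MODEL_MAP = {}


def load_model_cls(model_cls_file, model_cls_name):
    """Hypothetical helper: import a class by name from a Python file."""
    spec = importlib.util.spec_from_file_location("custom_model", model_cls_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, model_cls_name)


def resolve_model_cls(model_cls_file, model_cls_name, register=True):
    """Resolve the user-provided class and optionally register it.

    register=False mimics the reported behavior: the class is imported
    but never added to MODEL_MAP, so a later lookup raises KeyError.
    """
    model_cls = load_model_cls(model_cls_file, model_cls_name)
    if register:
        # The one-line fix proposed in this issue.
        MODEL_MAP[model_cls_name] = model_cls
    return model_cls
```

With register=False, a later MODEL_MAP[model_cls_name] lookup fails with a KeyError, which matches the failure described here. With the registration line in place, the lookup returns the user-provided class.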
Who can help?
@ncomly-nvidia , @byshiue
Information
[x] The official example scripts
[ ] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[x] My own task or dataset (give details below)
Reproduction
Run trtllm-build with any model architecture that is not supported by default, passing --model_cls_file and --model_cls_name. The build fails with a KeyError.
Expected behavior
trtllm-build picks up my model class, runs to completion, and produces the engine files.
Actual behavior
Raises a KeyError.
Additional notes
None, fix proposed in problem description.