NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
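For orientation, end-to-end usage through that Python API looks roughly like the sketch below. The specific names (ModelConfig, generate) and the engine path are assumptions based on the 0.9.x-era hlapi module that appears later in this issue, not an authoritative example.

```python
# Minimal sketch of the high-level API flow; names and paths are illustrative,
# not confirmed by this issue beyond LLM(config) in the traceback below.
from tensorrt_llm.hlapi.llm import LLM, ModelConfig

# Point the config at a directory containing a previously built engine.
config = ModelConfig(model_dir="/path/to/engine_dir")  # ModelConfig is assumed
llm = LLM(config)

# Run inference; generate() and its return shape are assumptions.
for output in llm.generate(["What is TensorRT-LLM?"]):
    print(output)
```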
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

High Level API doesn't work #1151

Open nivibilla opened 6 months ago

nivibilla commented 6 months ago

System Info

NVIDIA A10 GPU, Databricks

Who can help?

No response

Reproduction

Followed the steps in /examples/llama to build the engine. Inference works; I tested it with ../run.py.

However, the high-level Python API does not work when initializing from the built engine.

Using TensorRT-LLM version 0.9.0.dev2024022000.
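For concreteness, a hedged reconstruction of the failing call: only `llm = LLM(config)` is visible in the traceback below, so the config construction and engine path here are assumptions.

```python
from tensorrt_llm.hlapi.llm import LLM, ModelConfig  # module path from the traceback

config = ModelConfig(model_dir="/dbfs/llama/engine_dir")  # illustrative path
llm = LLM(config)  # raises KeyError: 'plugin_config' on 0.9.0.dev2024022000
```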

Expected behavior

The high-level API should load the engine config and initialize the LLM.

Actual behavior

The config does not load; LLM(config) fails with a KeyError (see traceback below).

Additional notes

```
Loading Model: [1/2]    Load TensorRT-LLM engine

KeyError: 'plugin_config'
File <command-475253652531641>, line 1
----> 1 llm = LLM(config)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-9ac07104-33c3-4c8d-8e2d-bcba65bd05eb/lib/python3.10/site-packages/tensorrt_llm/hlapi/llm.py:602, in _ModelInfo.from_builder_config_json(cls, config)
    599 @classmethod
    600 def from_builder_config_json(cls, config: dict):
    601     # The Dict format is { 'builder_config':..., 'plugin_config':...}
--> 602     dtype = config['plugin_config']['gpt_attention_plugin']
    603     return cls(dtype=dtype, architecture=config['builder_config']['name'])
```
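Until a fix lands, one possible workaround is to patch the engine's config.json so a top-level 'plugin_config' key exists where from_builder_config_json looks for it. This is a hedged sketch: the 'build_config' nesting assumed below refers to the newer engine-config layout and is not confirmed anywhere in this thread.

```python
# Hedged workaround sketch: rewrite the engine's config.json so a top-level
# 'plugin_config' key exists. The nested location ('build_config') is assumed.
import json

config_path = "/path/to/engine_dir/config.json"  # illustrative path

with open(config_path) as f:
    cfg = json.load(f)

if "plugin_config" not in cfg:
    nested = cfg.get("build_config", {}).get("plugin_config")
    if nested is not None:
        cfg["plugin_config"] = nested  # hoist to where the hlapi expects it
        with open(config_path, "w") as f:
            json.dump(cfg, f, indent=2)
```

Note that from_builder_config_json also reads config['builder_config']['name'], so this patch alone may not be enough; waiting for the upstream fix mentioned below is the safer route.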
QiJune commented 6 months ago

@Superjomn Could you please have a look? Thanks.

Superjomn commented 6 months ago

This is a known issue and has been resolved in the internal codebase. Please wait for next Tuesday's release; the bugfix will be synced to the main branch.