System Info
NVIDIA A10 GPU, Databricks
Who can help?
No response
Information

Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Followed the steps in /examples/llama to build the engine. Inference does work when tested with ../run.py.
However, initialising from the built engine using the high-level Python API does not work.
Version in use:
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024022000
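For context, the build and smoke test roughly follow the steps in the examples/llama README. This is a sketch, not the exact commands from this report: the model paths below are placeholders, and the flags may differ between TensorRT-LLM versions.

```shell
# Convert the Hugging Face checkpoint to a TensorRT-LLM checkpoint
# (paths are placeholders; flags follow the examples/llama README)
python convert_checkpoint.py --model_dir ./llama-hf \
    --output_dir ./tllm_ckpt --dtype float16

# Build the TensorRT engine from the converted checkpoint
trtllm-build --checkpoint_dir ./tllm_ckpt --output_dir ./engine_dir

# Direct inference with the example runner works (run from examples/llama):
python ../run.py --engine_dir ./engine_dir --tokenizer_dir ./llama-hf \
    --max_output_len 64 --input_text "Hello"
```

The failure described below occurs only when pointing the high-level Python API at the same engine directory, not with ../run.py.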
Expected behavior
The high-level Python API should load the config from the built engine and run inference.
actual behavior
The config does not load; initialising from the built engine fails.
additional notes