NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Change **kwargs to **default_kwargs to enable trust_remote_code, use_fast, etc. #1863

Open levidehaan opened 5 days ago

levidehaan commented 5 days ago

https://github.com/NVIDIA/TensorRT-LLM/blob/9691e12bce7ae1c126c435a049eb516eb119486c/tensorrt_llm/hlapi/tokenizer.py#L63
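For context, the linked helper accepts `**kwargs` but, per the report, never forwards them to `AutoTokenizer.from_pretrained`. A minimal sketch of what the title proposes, assuming the wrapper looks roughly like this (the class name, defaults, and structure here are illustrative stand-ins, not the actual TensorRT-LLM source):

```python
from transformers import AutoTokenizer


class TransformersTokenizer:
    """Hypothetical stand-in for the wrapper in hlapi/tokenizer.py."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    @classmethod
    def from_pretrained(cls, pretrained_model_dir: str, **kwargs):
        # Library defaults live in default_kwargs; user-supplied kwargs
        # (trust_remote_code, use_fast, ...) override them instead of
        # being silently dropped.
        default_kwargs = dict(legacy=False,
                              padding_side="left",
                              truncation_side="left")
        default_kwargs.update(kwargs)
        tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir,
                                                  **default_kwargs)
        return cls(tokenizer)
```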

nv-guomingz commented 5 days ago

@Superjomn Would you please take a look at this question?

Superjomn commented 5 days ago

Good suggestion, thanks.

We will support enable_trust_remote_code in the future, when we broaden the model coverage from Llama to other models. For now, you can pass an external tokenizer into the LLM instance instead.
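A sketch of that workaround: build the tokenizer yourself with whatever kwargs the model needs and hand it to the LLM. The `model` and `tokenizer` constructor parameters are assumed from the comment above, not verified against the current API:

```python
from transformers import AutoTokenizer
from tensorrt_llm.hlapi import LLM  # import path per the linked module

model_dir = "my-org/model-with-custom-code"  # hypothetical model id

# Build the tokenizer externally with the kwargs that from_pretrained
# currently drops (trust_remote_code, use_fast, ...).
tokenizer = AutoTokenizer.from_pretrained(model_dir,
                                          trust_remote_code=True,
                                          use_fast=True)

# Pass the pre-built tokenizer into the LLM instance, as suggested above;
# the exact constructor signature is an assumption.
llm = LLM(model=model_dir, tokenizer=tokenizer)
```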

levidehaan commented 4 days ago

Roger that. I was pointing out that you're not using the extra kwargs; I enabled them and stopped getting errors when loading models that needed some of those settings. It might be safe to enable them by default, or maybe gate it behind a flag? (A sketch of the flag idea is below.)
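For illustration, the opt-in flag variant could look like this; the parameter name mirrors the Hugging Face option, and nothing here is an existing TensorRT-LLM flag:

```python
from transformers import AutoTokenizer


def load_tokenizer(model_dir: str,
                   trust_remote_code: bool = False,
                   **kwargs):
    # Explicit opt-in: custom model code is only executed when the caller
    # asks for it, so the default behavior stays unchanged.
    return AutoTokenizer.from_pretrained(model_dir,
                                         trust_remote_code=trust_remote_code,
                                         **kwargs)
```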