NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
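
As a rough illustration of the Python API described above, a minimal generation script with the high-level `LLM` entry point might look like the following. This is a sketch based on the quick-start pattern in recent releases; the model name and sampling settings are placeholders, not a tested configuration.

```python
# Minimal sketch of the TensorRT-LLM high-level Python API (recent releases).
# The model name and sampling settings below are placeholders.
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads a cached) TensorRT engine for the given Hugging Face model.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")

sampling_params = SamplingParams(max_tokens=64, temperature=0.8)

# Batched generation; each output carries the generated text for one prompt.
for output in llm.generate(["What is TensorRT-LLM?"], sampling_params):
    print(output.outputs[0].text)
```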

Can I use TensorRT-LLM to deploy Qwen1.5? #1119

Open hljjjmssyh opened 6 months ago

Tlntin commented 6 months ago

This may be the link you need.

xinbingzhe commented 6 months ago

I think the Qwen1.5 model architecture is the same as LLaMA's. Have you tried it?
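
If so, the usual two-step example workflow should apply: convert the Hugging Face checkpoint, then compile the engine with `trtllm-build`. A hedged sketch below; the paths, flags, and exact example directory depend on your TensorRT-LLM version, so check the `examples/qwen` README before running it.

```python
# Sketch of the convert-then-build flow from the TensorRT-LLM examples.
# Paths and flags are illustrative; verify them against the examples/qwen
# README for your installed version.
import subprocess

# Step 1: convert the Hugging Face checkpoint to the TensorRT-LLM format.
subprocess.run(
    [
        "python", "examples/qwen/convert_checkpoint.py",
        "--model_dir", "Qwen1.5-7B-Chat",   # local HF checkpoint (placeholder)
        "--output_dir", "ckpt_qwen1.5_7b",
        "--dtype", "float16",
    ],
    check=True,
)

# Step 2: compile the converted checkpoint into a TensorRT engine.
subprocess.run(
    [
        "trtllm-build",
        "--checkpoint_dir", "ckpt_qwen1.5_7b",
        "--output_dir", "engine_qwen1.5_7b",
        "--gemm_plugin", "float16",
    ],
    check=True,
)
```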

Franc-Z commented 4 months ago

For TensorRT-LLM 0.9.0, you can refer to https://github.com/Franc-Z/QWen1.5_TensorRT-LLM
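
For completeness, once an engine is built, running it from Python looks roughly like this. This is a sketch assuming the `ModelRunner` runtime API from the 0.9-era releases and local engine/tokenizer paths; adjust the names for your setup.

```python
# Rough sketch of running a built engine with the Python runtime.
# Engine and tokenizer paths are placeholders; see examples/run.py in the
# repo for the full, version-accurate invocation.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained("Qwen1.5-7B-Chat")
runner = ModelRunner.from_dir("engine_qwen1.5_7b")

input_ids = tokenizer("What is TensorRT-LLM?", return_tensors="pt").input_ids

# generate() takes a batch of token-id tensors and returns output ids
# shaped [batch, beams, seq_len].
output_ids = runner.generate(
    [input_ids[0]],                 # one 1-D tensor per request
    max_new_tokens=64,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0][0], skip_special_tokens=True))
```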