System Info
GPU: A100-PCIe-40GB; TensorRT-LLM version: 0.12.0
Who can help?
@sunnyqgg
Information
Tasks
[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction
The inference results obtained with trt-llm-0.12.0 at fp16 and fp32 precision differ significantly from those of the original Qwen-VL implementation (https://github.com/QwenLM/Qwen-VL).

trt-llm-0.12.0 with fp16:
trt-llm-0.12.0 with fp32:
https://github.com/QwenLM/Qwen-VL:
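For comparison, the reference outputs can be regenerated with the stock Hugging Face pipeline from the Qwen-VL README. This is a minimal sketch, not the exact script behind the results above; the image path and prompt are placeholders, and a fixed seed is set so reruns are repeatable:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(1234)  # fixed seed so the reference run is repeatable

# Load the original Qwen-VL checkpoint; trust_remote_code pulls in the
# custom multimodal tokenizer/model code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL", device_map="cuda", trust_remote_code=True
).eval()

# Placeholder image and prompt -- substitute the actual inputs used above.
query = tokenizer.from_list_format([
    {"image": "demo.jpeg"},
    {"text": "Generate the caption in English with grounding:"},
])
inputs = tokenizer(query, return_tensors="pt").to(model.device)

pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=False))
```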
Expected behavior
The inference results should be consistent with the reference implementation at https://github.com/QwenLM/Qwen-VL.
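One caveat worth stating (an assumption on my part, not something from the report above): if the checkpoint's generation config enables sampling, outputs vary run to run even within one framework, so the comparison is only meaningful with decoding pinned to greedy on both sides. A sketch of such a check, reusing `model`, `tokenizer`, and `inputs` from the snippet above, with a hypothetical `trtllm_text` placeholder for the engine's decoded output:

```python
# Force greedy decoding so the Hugging Face reference is deterministic and
# can be compared token-for-token with the TensorRT-LLM engine output.
pred = model.generate(
    **inputs,
    do_sample=False,     # disable sampling -> deterministic output
    num_beams=1,         # plain greedy search
    max_new_tokens=128,  # arbitrary cap; match it on the TRT-LLM side
)
reference_text = tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)

# Hypothetical placeholder for the decoded output of the trt-llm-0.12.0
# engine run with the same image, prompt, and greedy decoding settings.
trtllm_text = "..."
print("consistent:", reference_text == trtllm_text)
```

Even under greedy decoding, fp16 kernels can legitimately flip an occasional token relative to fp32, but the wholesale divergence shown above would not be expected.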
actual behavior
TensorRT-LLM does not reproduce the reference behavior: both the fp16 and fp32 engines produce outputs that deviate noticeably from the original Qwen-VL results.
additional notes
None.