Saeedmatt3r opened 2 weeks ago
@Saeedmatt3r Thanks for reporting the issue. The fix will be synced to GitHub next week. For a quick fix, you need to modify here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/run.py#L370.
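The comment only points at a line in run.py without showing the change. A minimal sketch of the kind of edit meant, assuming (based on the assertion text later in this thread) that the runner accepts a `cross_kv_cache_fraction` keyword for encoder-decoder models; the helper name and the 0.5 value are placeholders, verify against your installed tensorrt_llm:

```python
# Hypothetical sketch: build the extra kwargs an encoder-decoder model
# (such as Whisper) needs when constructing the TensorRT-LLM runner.
# The keyword name cross_kv_cache_fraction is inferred from the assertion
# message in this thread; check it against your tensorrt_llm version.
def build_runner_kwargs(engine_dir, is_encoder_decoder):
    kwargs = {"engine_dir": engine_dir}
    if is_encoder_decoder:
        # e.g. reserve half of the KV-cache budget for cross-attention
        kwargs["cross_kv_cache_fraction"] = 0.5
    return kwargs

print(build_runner_kwargs("whisper_engine_dir", True))
```

Decoder-only models would skip the extra keyword, which is why the assertion only fires for encoder-decoder engines.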
@yuekaizhang Thanks. To be honest, I've already done that, but I just wanted to report the issue in 0.14 and 0.15. I also think the official whisper example on trt-llm-backend is not working as expected; I used 0.15 for engine creation and it did not work. I will open another ticket in that repo.
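For context on the 0.14 failure reported below, here is a rough illustration (not TensorRT-LLM source code, the function name is hypothetical) of why the runtime aborts: Whisper is an encoder-decoder model, so the KV-cache budget must be split between the decoder's self-attention cache and its cross-attention cache, and the runtime refuses to guess that split.

```python
# Hypothetical re-creation of the check behind the reported error
# "Assertion failed: Must set crossKvCacheFraction for encoder-decoder model".
def check_kv_cache_config(is_encoder_decoder, cross_kv_cache_fraction=None):
    """Validate the KV-cache split the way the reported assertion implies."""
    if is_encoder_decoder:
        # Mirrors the reported failure mode when no fraction is given.
        assert cross_kv_cache_fraction is not None, \
            "Must set crossKvCacheFraction for encoder-decoder model"
        # A fraction only makes sense strictly between 0 and 1.
        assert 0.0 < cross_kv_cache_fraction < 1.0

# Encoder-decoder model with no fraction set -> the reported assertion.
try:
    check_kv_cache_config(is_encoder_decoder=True)
except AssertionError as err:
    print(err)  # -> Must set crossKvCacheFraction for encoder-decoder model

# Giving cross-attention e.g. half of the cache passes the check.
check_kv_cache_config(is_encoder_decoder=True, cross_kv_cache_fraction=0.5)
```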
System Info

GPU: A10
Base image: FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

TensorRT-LLM versions tested:

- 0.12.0: it's working, but I can't use it because of a version mismatch in TRT and trt-llm-backend
- 0.13.0: it's working, but I can't use it because of the same version mismatch in TRT and trt-llm-backend
- 0.14.0: not working: `Assertion failed: Must set crossKvCacheFraction for encoder-decoder model`
- 0.15.0.dev2024110500: not working

Who can help?
No response
Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Steps to reproduce the problem:

Then run the official whisper example:
Expected behavior
It should run on the dataset without any problem:
actual behavior

Using the latest available pip package (0.15.0.dev2024110500):

additional notes
Also, I checked the trt-llm-backend whisper example, and it was not working either, failing with the following error: