NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Wrong output when input is packed in Whisper with C++ runtime #2272

Open sasikr2 opened 3 weeks ago

sasikr2 commented 3 weeks ago

System Info

CPU Architecture: x86_64
GPU: NVIDIA A100-SXM4-40GB

TensorRT-LLM version: 0.14.0.dev2024091700

Who can help?

No response

Reproduction

Steps to reproduce:

  1. Build the encoder and decoder with the same command mentioned in the repo, using trtllm-build:

     trtllm-build --checkpoint_dir /stream_whisper/latest_build_dir/models/trtllm_checkpoint_v12/encoder \
         --output_dir /stream_whisper/latest_build_dir/models/whisper_large_v3/encoder \
         --input_timing_cache /stream_whisper/latest_build_dir/encoder_whisper.cache \
         --moe_plugin disable \
         --enable_xqa disable \
         --max_batch_size 4 \
         --gemm_plugin disable \
         --bert_attention_plugin float16 \
         --max_input_len 3000 --max_seq_len=3000

  2. Run the test script test_run.txt. Audio sample: a 12-second English file.
  3. Padded input: change line number 509 to mels, mels_input_len = prepare_inputs(files, input_type="padded").
  4. Packed input: change line number 509 to mels, mels_input_len = prepare_inputs(files, input_type="packed") (see the sketch after this list).
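
For reference, here is a rough sketch of what the two layouts in steps 3 and 4 might look like. This is not the code from test_run.txt; the tensor shapes, the target_frames value, and the internals of prepare_inputs are assumptions for illustration only.

```python
import torch

def prepare_inputs(mel_list, input_type="padded", target_frames=3000):
    """mel_list: per-file log-mel tensors of shape (n_mels, n_frames) (assumed)."""
    # Per-file frame counts; for packed input the runtime needs these to know
    # where each audio segment ends.
    mels_input_len = torch.tensor([m.shape[1] for m in mel_list], dtype=torch.int32)
    if input_type == "padded":
        # One batch row per file, zero-padded on the time axis to a common length.
        padded = [torch.nn.functional.pad(m, (0, target_frames - m.shape[1]))
                  for m in mel_list]
        mels = torch.stack(padded)                      # (batch, n_mels, target_frames)
    else:  # "packed"
        # All files concatenated along the time axis with no padding in between.
        mels = torch.cat(mel_list, dim=1).unsqueeze(0)  # (1, n_mels, sum of frames)
    return mels, mels_input_len
```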

Expected behavior

Expected output should be: Output: ['So basically what I observed is that word error rate are very high for Chinese language but character error rate seems to be good. Higher amplitude the WR is degrading and']

Actual behavior

When passing packed audio segments, the output comes out empty, while it should match the output for the padded input.

Additional notes

Could you check the script once, specifically the way the packed input is sent? Or is it an issue in the C++ binding?

yuekaizhang commented 1 week ago

@sasikr2 Would you mind trying again with today's (10/15/2024) commit? There are updates under whisper/readme.md about a different padding strategy. However, for the official Whisper, you can't remove the 30 s padding; otherwise you would lose accuracy.
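
For context, the 30 s constraint comes from how the official Whisper checkpoints were trained: audio is resampled to 16 kHz and every window covers exactly 30 seconds, which is where the 3000 mel frames in the build command above come from. A minimal sketch of that padding, using a hypothetical helper (not code from this repo):

```python
import numpy as np

SAMPLE_RATE = 16000      # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30       # the official checkpoints were trained on 30 s windows

def pad_to_30s(audio: np.ndarray) -> np.ndarray:
    """Zero-pad (or truncate) a mono waveform to exactly 30 s before
    computing the log-mel features."""
    target = SAMPLE_RATE * CHUNK_SECONDS
    if audio.shape[0] >= target:
        return audio[:target]
    return np.pad(audio, (0, target - audio.shape[0]))
```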

yuekaizhang commented 1 week ago

See https://pypi.org/project/tensorrt-llm/0.15.0.dev2024101500/.
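
If it helps, a quick way to confirm which build is actually installed after upgrading (assuming the package exposes __version__, as recent releases do):

```python
import tensorrt_llm

# Should print something like 0.15.0.dev2024101500 after upgrading.
print(tensorrt_llm.__version__)
```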

sasikr2 commented 1 week ago

Okay, I will try today with the updated code.