esnvidia opened this issue 5 months ago
Did you first use https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/whisper/distil_whisper/convert_from_distil_whisper.py ?
See https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper; you may need to convert the huggingface checkpoint first (a rough sketch of that step is shown below).
@esnvidia
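For orientation, the conversion step amounts to pulling the Hugging Face checkpoint and writing its weights out as a single binary file; the real convert_from_distil_whisper.py also remaps weight names into the layout the whisper example expects, so the snippet below is only a rough sketch and the output file name is illustrative:
import torch
from transformers import WhisperForConditionalGeneration
# Download the distil-whisper checkpoint from Hugging Face.
model = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v2")
# Dump the raw weights; the actual script additionally renames keys before saving.
torch.save(model.state_dict(), "distil-large-v2.bin")  # file name chosen for illustration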
Yes, here's the exact steps I ran:
https://github.com/esnvidia/distil_whisper_hf2_triton
The test step:
python run.py --engine_dir $output_dir --name librispeech_dummy_output --tokenizer_name gpt2 --assets_dir ./assets/ --dataset librispeech_asr --results_dir ./results
Needs a little tweak to the cmd but should be simple for you to figure out.
Oh, I see. For distil-large-v2, you should use the default multilingual tokenizer rather than gpt2. @esnvidia
Yes, here's the exact steps I ran: https://github.com/esnvidia/distil_whisper_hf2_triton
Also, you are welcome to contribute this triton model_repo for distil whisper to sherpa/triton/whisper if you have some free time.
@yuekaizhang Are you sure it's multilingual? The step in the example shows gpt2:
here is the cmd
python3 run.py --engine_dir $output_dir --dataset hf-internal-testing/librispeech_asr_dummy --name librispeech_dummy_${output_dir} --tokenizer_name gpt2
as well as this step:
# download the gpt2.tiktoken
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/gpt2.tiktoken
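As a quick illustration of why a tokenizer mismatch produces roughly 100% WER: token ids written by the multilingual Whisper tokenizer decode to unrelated text under GPT-2's vocabulary. This sketch uses the transformers tokenizers rather than the tiktoken assets the example actually loads, so it only demonstrates the effect:
from transformers import GPT2Tokenizer, WhisperTokenizer
whisper_tok = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")  # multilingual vocab
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
ids = whisper_tok("the quick brown fox jumps over the lazy dog", add_special_tokens=False).input_ids
print(whisper_tok.decode(ids))  # round-trips to the original sentence
print(gpt2_tok.decode(ids))     # same ids under the wrong vocabulary: mostly gibberish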
@yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs.
@yuekaizhang confirmed the need for multilingual. This needs to be updated in the docs.
Updated it. Now users don't need to specify tokenizer_name themselves.
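For anyone hitting this before the documentation change lands, the multilingual tokenizer asset can be fetched the same way as gpt2.tiktoken; the URL below is an assumption based on the openai/whisper assets path already linked in this thread:
import urllib.request
# Assumed to sit alongside gpt2.tiktoken in the openai/whisper repository assets folder.
url = "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/multilingual.tiktoken"
urllib.request.urlretrieve(url, "assets/multilingual.tiktoken")  # assets/ must already exist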
Awesome, but I still don't see the change reflected in the main branch. I'm looking here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper#distil-whisper
Is there a PR tied to this?
Also getting 100% WER using the Triton-ASR-Client by the way. Let me know if you want me to file an issue there. I think it simply involves copying the functions from the run.py here since I was able to get the 3% WER with that.
I can contribute to sherpa etc once this works E2E. :)
Is there a PR tied to this?
Yes. I have updated it in GitLab; it will sync to GitHub in several days.
Also getting 100% WER using the Triton-ASR-Client by the way. Let me know if you want me to file an issue there. I think it simply involves copying the functions from the run.py here since I was able to get the 3% WER with that.
https://github.com/k2-fsa/sherpa/tree/master/triton/whisper#benchmark-using-dataset Could you try --whisper-prompt "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"? If that doesn't work, you may file an issue under sherpa and attach more details; I will investigate there.
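For context on that flag, the prompt string is just the sequence of special tokens that seeds Whisper's decoder. A hedged sketch of mapping it to ids with the transformers tokenizer (the Triton client may assemble these ids differently):
from transformers import WhisperTokenizer
tok = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")
prompt = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
tokens = tok.tokenize(prompt)  # splits into the four special tokens
print(list(zip(tokens, tok.convert_tokens_to_ids(tokens))))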
I can contribute to sherpa etc once this works E2E. :)
That sounds great. @esnvidia
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info
DGX V100 and DGX A100
Who can help?
@ncomly-nvidia to add more folks.
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Followed the whisper example. Got example engines working on A100 80GB and V100-16GB. To save the HF model in bin format I did:
I had to download the mel_filters.npz and gpt2.tiktoken separately per the directions.
Example build and run cmds:
Expected behavior
Not get >100% WER on librispeech_asr :)
actual behavior
In errs-librispeech.txt:
%WER = 150.73 Errors: 28722 insertions, 3162 deletions, 50714 substitutions, over 54798 reference words (922 correct). Search below for sections starting with PER-UTT DETAILS:, SUBSTITUTIONS:, DELETIONS:, INSERTIONS:, PER-WORD STATS:
In rtf-librispeech.txt:
RTF: 0.0098, total_duration: 19396.121 seconds (5.39 hours), processing time: 189.115 seconds (0.05 hours), batch size: 4, num_beams: 1
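For reference, the reported figure is consistent with the usual WER formula, which can exceed 100% because insertions are counted as errors:
insertions, deletions, substitutions = 28722, 3162, 50714
reference_words = 54798
wer = (insertions + deletions + substitutions) / reference_words
print(f"{wer:.2%}")  # 150.73%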
additional notes
n/a