spygaurad opened this issue 1 year ago
Given these results, I believe the fine-tuned model does not output timestamp tokens for some reason.
To confirm that, can you provide the output of the same run after adding the -ps command line argument, which makes the tool print the special tokens in the output?
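For example, something like this (the model and audio file names here are placeholders):

```bash
# -ps / --print-special prints the special tokens (timestamp tokens,
# task tokens, etc.) alongside the transcription
./main -m models/ggml-finetuned.bin -f sample.wav -ps
```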
@ggerganov Here is the output with -ps. One thing I noticed is that the number of runs with my model is 5692, whereas with the medium.en model it is around 92.
I also tried inference with the no-timestamps option (-nt); it still takes too long.
I see the transcribe (50359) token is being decoded many times for some reason. This is not supposed to happen.
I just pushed a change to master to suppress the task tokens. Not sure if it will help, but you might want to give it another try.
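To pick up the change, pull and rebuild before rerunning, e.g. (file names are placeholders as before):

```bash
# rebuild from the latest master, which includes the task-token suppression
git pull origin master
make clean && make

# rerun with -ps to check whether token 50359 still shows up
./main -m models/ggml-finetuned.bin -f sample.wav -ps
```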
I pulled master, but haven't noticed any performance change.
We still see the 50359 token; this is unexpected.
I guess the best option is to provide instructions for downloading the model so I can test it locally.
I have the same problem here. After converting my fine-tuned model, it takes a long time in the decode step:
```
whisper_print_timings: load time   =   73.56 ms
whisper_print_timings: fallbacks   =    2 p / 1 h
whisper_print_timings: mel time    =   33.36 ms
whisper_print_timings: sample time =  928.93 ms / 1907 runs (   0.49 ms per run)
whisper_print_timings: encode time =  129.43 ms /    1 runs ( 129.43 ms per run)
whisper_print_timings: decode time = 2592.87 ms / 1899 runs (   1.37 ms per run)
whisper_print_timings: total time  = 3770.88 ms
```
After adding -nf (--no-fallback: do not use temperature fallback while decoding), it now works; the full invocation is shown after the timings below:
```
whisper_print_timings: load time   =   62.46 ms
whisper_print_timings: fallbacks   =    0 p / 0 h
whisper_print_timings: mel time    =   34.00 ms
whisper_print_timings: sample time =    3.18 ms /    8 runs (   0.40 ms per run)
whisper_print_timings: encode time =  173.80 ms /    1 runs ( 173.80 ms per run)
whisper_print_timings: decode time =   11.11 ms /    8 runs (   1.39 ms per run)
whisper_print_timings: total time  =  296.14 ms
```
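For reference, a sketch of the invocation with -nf (the model and audio file names are placeholders):

```bash
# -nf / --no-fallback disables the temperature fallback during decoding,
# which was causing the repeated decode passes (1899 runs before vs. 8 after)
./main -m models/ggml-finetuned.bin -f sample.wav -nf
```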
Question 1: This is a Whisper medium model fine-tuned on Nepali. Inference on a 39-second audio clip takes forever (13 minutes), whereas the same audio takes 70 seconds with the medium.en model. @ggerganov Are there any issues with the ggml conversion?
Question 2: The transcription output comes in 30-second chunks; how can I make it dynamic, like with the ggml-medium model?