I'm aware that this error exists, but I had no luck reproducing it. Can you write down the exact steps to reproduce and upload the audio file?
Yes. Here is sample Python code that reproduces the issue:
import torch
from faster_whisper import WhisperModel
asr_model = WhisperModel("large-v3-turbo", device="cuda", compute_type="int8", download_root="./models")
segments, _ = asr_model.transcribe('test.wav', language='fr', condition_on_previous_text=False, initial_prompt='Free', task='transcribe', word_timestamps=True, suppress_tokens=[-1, 12], beam_size=5)
segments = list(segments) # The transcription will actually run here.
And the audio sample is attached: test.zip
I was not able to reproduce it on my machine or using Colab.
Maybe the Python version, Debian, PyTorch, or something else is slightly different between our setups. Is there anything I can do on my side to get more debug logs to see what the issue is?
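For reference, faster-whisper logs through Python's standard logging module, so enabling DEBUG on its logger before transcribing should produce more verbose output. A minimal sketch:

import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

Running the repro script with this in place may at least show how far the transcription gets before the error.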
Are you using the master branch? median_max_durations is initialized as an empty list, and since you are using sequential transcription, it should end up with a single value. The only way this error can occur is if it is still an empty list, which means the for loop in line 1565 was never executed. That happens when alignments is an empty list, so you need to figure out why that is happening.
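To make that failure mode concrete, here is a deliberately simplified sketch; it is not the actual faster-whisper code, only an approximation of the logic described above:

def add_word_timestamps_sketch(segments, alignments):
    # One (median, max) pair is appended per alignment, so the later
    # per-segment lookup assumes one entry per segment.
    median_max_durations = []
    for alignment in alignments:  # never executes if alignments == []
        durations = [word["end"] - word["start"] for word in alignment] or [0.0]
        durations.sort()
        median_duration = durations[len(durations) // 2]
        median_max_durations.append((median_duration, median_duration * 2))

    for i, segment in enumerate(segments):
        # With an empty alignments list this raises
        # "IndexError: list index out of range".
        median_duration, max_duration = median_max_durations[i]
        # ... word timestamps would be attached to the segment here ...

In other words, the IndexError is only the downstream symptom; the real question is why alignments comes back empty for this particular file, language, and prompt.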
Hi, I found a rare condition: with a specific WAV file, a specific language, and a specific prompt, when I try to transcribe with word_timestamps=True, I get a "list index out of range" error in the add_word_timestamps function.
It seems the median_max_durations list ends up with fewer elements than the segments list.
I'm using the large-v3-turbo model with the transcribe settings from the sample code above.
As far as I can see, median_max_durations is populated from alignments, so maybe something is wrong there? If I change the language or the prompt, or use another sound file, there is no issue.
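As a rough way to narrow this down (reusing the asr_model and settings from the sample code above), the same transcription can be run with word_timestamps=False, which skips add_word_timestamps and at least shows the segment boundaries involved:

segments, _ = asr_model.transcribe(
    "test.wav",
    language="fr",
    condition_on_previous_text=False,
    initial_prompt="Free",
    task="transcribe",
    suppress_tokens=[-1, 12],
    beam_size=5,
    word_timestamps=False,  # avoids the word-alignment pass entirely
)
for seg in segments:
    print(f"{seg.start:.2f} -> {seg.end:.2f} {seg.text}")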
Thank you