winterbulletvoyage opened this issue 1 year ago
Could you share the script you're using that produces the error?
I see the same IndexError from time to time. Usually I'm processing 4 files as one batch; when the error occurs, it is reproducible for that set of files. If I process one file at a time, everything is fine, still using the same batch-whisper version (checked out from GitHub on 2023-01-17). Switching from medium to large-v2 usually also runs without error for the very same 4 files as one batch, so it's a combination of input files and model. Unfortunately, I haven't found a set of files yet that I can easily share.
My script to reproduce the error for certain file sets is trivial; input is either German or English and will be auto-detected.
```python
#!/apps/whisper/envs/20230117/bin/python3.9
import sys

import batchwhisper
import batchwhisper.utils

MODEL = "medium"

all_files = sys.argv[1:]
model = batchwhisper.load_model(MODEL)
results = model.transcribe(all_files)
for r in results:
    print("----------")
    print(r['text'])
```
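Since single-file runs succeed for the very same inputs, I use a per-file retry as a stop-gap until the underlying bug is fixed. A minimal sketch (the wrapper name is mine, and it assumes transcribe() accepts a one-element list and returns a one-element result list):

```python
def transcribe_with_fallback(model, files):
    # Try the whole batch first; on the IndexError, retry each file
    # individually, which succeeds for the same inputs in my tests.
    try:
        return model.transcribe(files)
    except IndexError:
        return [model.transcribe([f])[0] for f in files]
```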
This appears to be the same issue as https://github.com/Blair-Johnson/batch-whisper/issues/9. There's currently a bug in the way that temperature fallback logic is handled for batched cases, which we think is the cause of this issue. It would make sense that switching to a larger model reduces the frequency of the issue, because the larger models produce better / less repetitive predictions for borderline/difficult transcriptions. It also explains why the specific files matter.
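For context, the fallback logic follows upstream Whisper's pattern, which looks roughly like the sketch below (simplified; `decode_fn` and the result dataclass are illustrative stand-ins, though the threshold defaults match upstream). In the batched case each element can trip the fallback at a different temperature, and re-decoding only part of a batch while reusing cached decoder state is where indices can go stale:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DecodingResult:  # stand-in for whisper's DecodingResult
    text: str
    avg_logprob: float
    compression_ratio: float

def decode_with_fallback(
    decode_fn: Callable[[object, float], DecodingResult],  # wraps model.decode
    segment: object,
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
) -> Optional[DecodingResult]:
    """Retry decoding at increasing temperatures until the output stops
    looking repetitive (high compression ratio) or low-confidence
    (low average log-probability)."""
    result: Optional[DecodingResult] = None
    for t in temperatures:
        result = decode_fn(segment, t)
        needs_fallback = (
            result.compression_ratio > compression_ratio_threshold
            or result.avg_logprob < logprob_threshold
        )
        if not needs_fallback:
            break
    return result
```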
I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.
Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. At $0.006 per minute, 50 hours of audio costs about $18.
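For reference, a minimal sketch of the hosted API call using the openai Python package as it existed at the time (pre-1.0 SDK; it reads OPENAI_API_KEY from the environment):

```python
import openai  # pre-1.0 openai-python SDK

def transcribe_via_api(path: str) -> str:
    # Note: the hosted endpoint rejects uploads larger than 25 MB.
    with open(path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)
    return result["text"]
```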
I considered that, but most of the files I'm working with are larger than 25 MB (the API's per-request upload limit), and I'm not aware of a way to split them automatically without cutting sentences across files, which would degrade the transcription quality.
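The closest thing I've found is silence-based chunking: split at detected pauses, then re-group the pieces under a cap. A sketch with pydub (a third-party library that needs ffmpeg installed); the pause thresholds are guesses that would need tuning per recording, and the duration cap is only a rough proxy for the 25 MB size limit:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

def chunk_at_pauses(path, max_ms=10 * 60 * 1000):
    """Split the recording at pauses, then re-group the pieces so each
    chunk stays under max_ms without cutting a sentence in the middle."""
    audio = AudioSegment.from_file(path)
    pieces = split_on_silence(
        audio,
        min_silence_len=700,             # pause length treated as a boundary (ms)
        silence_thresh=audio.dBFS - 16,  # "silence" relative to average loudness
        keep_silence=200,                # keep padding so words aren't clipped
    )
    chunks, current = [], AudioSegment.empty()
    for piece in pieces:
        # Flush the current chunk before it would exceed the cap; a single
        # piece longer than max_ms still goes through as its own chunk.
        if len(current) > 0 and len(current) + len(piece) > max_ms:
            chunks.append(current)
            current = AudioSegment.empty()
        current += piece
    if len(current) > 0:
        chunks.append(current)
    return chunks
```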
Providing a list of 4 English audio files (each about 3 hours 45 minutes) for batch transcription. The error below is thrown consistently across multiple different files.
```
Traceback (most recent call last):
  File "C:\envs\iw-analytics\parse_audio.py", line 129, in <module>
    segments_df, transcript_df = transcribe_audio(model, audio_list)
  File "C:\envs\iw-analytics\parse_audio.py", line 63, in transcribe_audio
    results = model.transcribe(audio_file_list,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 75, in transcribe
    return batch_transcribe(model=model,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 474, in batch_transcribe
    results: List[DecodingResult] = decode_with_fallback(torch.stack(batch_segments))
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 382, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 860, in decode
    result = DecodingTask(model, options).run(mel)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 772, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 692, in _main_loop
    probs_at_sot.append(logits[:, self.sot_index[i]].float().softmax(dim=-1))
IndexError: index 224 is out of bounds for dimension 1 with size 3
```