Blair-Johnson / batch-whisper

Batch Support for OpenAI Whisper
MIT License
88 stars · 23 forks

IndexError thrown when using batch transcribe function. #5

Open · winterbulletvoyage opened this issue 1 year ago

winterbulletvoyage commented 1 year ago

I'm providing a list of 4 English audio files (each about 3 hours 45 minutes long) to the batch transcribe function. The error below is thrown consistently, with multiple different sets of files.

Traceback (most recent call last):
  File "C:\envs\iw-analytics\parse_audio.py", line 129, in <module>
    segments_df, transcript_df = transcribe_audio(model, audio_list)
  File "C:\envs\iw-analytics\parse_audio.py", line 63, in transcribe_audio
    results = model.transcribe(audio_file_list,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 75, in transcribe
    return batch_transcribe(model=model,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 474, in batch_transcribe
    results: List[DecodingResult] = decode_with_fallback(torch.stack(batch_segments))
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 382, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 860, in decode
    result = DecodingTask(model, options).run(mel)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 772, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 692, in _main_loop
    probs_at_sot.append(logits[:, self.sot_index[i]].float().softmax(dim=-1))
IndexError: index 224 is out of bounds for dimension 1 with size 3
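For context on the final frame, here is a toy sketch of the failure mode, using ordinary Python lists in place of the real tensors. The interpretation (a saved `sot_index` that no longer fits the logits it indexes) is my reading of the traceback, not confirmed batch-whisper behavior:

```python
# Toy illustration: the decoder saved sot_index = 224 relative to a long
# prompt, but the logits it later indexes span only 3 token positions
# along that dimension, so the lookup overruns it.
logits = [[0.1, 0.2, 0.7]]  # stand-in for a tensor of shape (batch=1, positions=3)
sot_index = 224             # saved position, valid only for the original prompt

caught = False
try:
    probs_at_sot = logits[0][sot_index]  # mirrors logits[:, sot_index]
except IndexError:
    caught = True  # same class of error as in the report
```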

Blair-Johnson commented 1 year ago

Could you share the script you're using that produces the error?

tz-rrze commented 1 year ago

I see the same IndexError from time to time. I usually process 4 files as one batch. When the error occurs, it is reproducible for that set of files. If I process the files one at a time, everything is fine, still using the same batch-whisper version, checked out from GitHub on 2023-01-17. Switching from medium to large-v2 also usually runs without error for the very same 4 files as one batch, so it's a combination of input files and model. Unfortunately, I have not yet found a set of files that I can easily share.

My script to reproduce the error for certain file sets is trivial; input is either German or English and will be auto-detected.

#!/apps/whisper/envs/20230117/bin/python3.9
import sys

import batchwhisper

MODEL = "medium"
all_files = sys.argv[1:]

model = batchwhisper.load_model(MODEL)
results = model.transcribe(all_files)  # transcribe all input files as one batch

for r in results:
    print("----------")
    print(r['text'])
Blair-Johnson commented 1 year ago

This appears to be the same issue as https://github.com/Blair-Johnson/batch-whisper/issues/9. There is currently a bug in the way the temperature-fallback logic is handled in the batched case, which we think is the cause of this error. It makes sense that switching to a larger model reduces the frequency of the issue, because larger models produce better, less repetitive predictions for borderline or difficult transcriptions. It also explains why the specific files matter.
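Until the fallback logic is fixed, one possible workaround is to retry a failing batch one file at a time, since single-file runs are reported to work even for file sets that fail as a batch. This is a hypothetical helper, not part of batch-whisper; `transcribe_fn` is any callable that maps a list of files to a list of results (e.g. `model.transcribe`):

```python
def transcribe_with_fallback(transcribe_fn, files):
    """Try batched transcription; on the known IndexError, redo each
    file individually, where the fallback path behaves correctly."""
    try:
        return transcribe_fn(files)
    except IndexError:
        # Per-file retries trade speed for robustness.
        return [transcribe_fn([f])[0] for f in files]
```

Usage would look like `results = transcribe_with_fallback(model.transcribe, all_files)`; note this hides the bug rather than fixing it, and loses the batching speed-up for affected sets.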

Martok88 commented 1 year ago

I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.

Blair-Johnson commented 1 year ago

> I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.

Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. At $0.006 per minute, 50 hours of audio costs roughly $18.

Martok88 commented 1 year ago

> I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.

> Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. At $0.006 per minute, 50 hours of audio costs roughly $18.

I considered that, but most of the files I'm working with are larger than 25 MB, and I'm not aware of a way to split them automatically without splitting sentences into multiple files, which would degrade the transcription quality.
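One way to split long files without cutting mid-sentence is to run ffmpeg's `silencedetect` filter and cut in the middle of detected silences. The sketch below is a hypothetical helper, assuming you captured ffmpeg's stderr from something like `ffmpeg -i in.mp3 -af silencedetect=noise=-35dB:d=0.5 -f null -`; the noise threshold and minimum silence duration are guesses to tune per recording, and the maximum chunk length would be derived from the 25 MB cap and the file's bitrate:

```python
import re

def silence_midpoints(ffmpeg_stderr):
    """Return the midpoint (in seconds) of each silence that
    silencedetect reported on stderr."""
    starts = [float(s) for s in re.findall(r"silence_start: ([\d.]+)", ffmpeg_stderr)]
    ends = [float(e) for e in re.findall(r"silence_end: ([\d.]+)", ffmpeg_stderr)]
    return [(s + e) / 2 for s, e in zip(starts, ends)]

def choose_split_points(midpoints, total_duration, max_chunk):
    """Greedily pick cut times, always inside a silence, so that each
    resulting chunk lasts at most max_chunk seconds (assuming silences
    occur at least that often)."""
    cuts, last = [], 0.0
    for i, t in enumerate(midpoints):
        nxt = midpoints[i + 1] if i + 1 < len(midpoints) else total_duration
        if nxt - last > max_chunk:  # skipping this silence would overshoot
            cuts.append(t)
            last = t
    return cuts
```

The chosen cut times could then be fed back to ffmpeg (e.g. with `-ss`/`-to`) to write the chunks; this keeps sentences intact as long as speakers pause at least once per chunk.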