MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

KeyError with diarize_parallel.py #86

Closed Alam00086 closed 9 months ago

Alam00086 commented 10 months ago

Hi,

Thank you so much for this awesome work.

My configuration: Ubuntu 20.04, CUDA 11.4, NVIDIA Quadro RTX 6000 (24 GB VRAM), torch 2.0.1, whisperx 3.1.1, nemo_toolkit 1.20.0, Python 3.10.12.

I am running diarize_parallel.py on some audio files. It works on short chunks (10-20 seconds).

On longer audio files (3-4 min or 10-20 min), word_timestamps contains a few extra entries (2-3 words) that have no start, end, or score. This causes a KeyError at line 117 of helpers.py:

def get_words_speaker_mapping(wrd_ts, spk_ts, word_anchor_option="start"):
    s, e, sp = spk_ts[0]
    wrd_pos, turn_idx = 0, 0
    wrd_spk_mapping = []
    for wrd_dict in wrd_ts:
        ws, we, wrd = (
            int(wrd_dict['start'] * 1000),  # line 117: raises KeyError when 'start' is missing
            int(wrd_dict['end'] * 1000),
            wrd_dict['word'],
        )

  File "/home/Nasim/whisper-diarization-main/diarize_parallel.py", line 119, in <module>
    wsm = get_words_speaker_mapping(word_timestamps, speaker_ts, "start")
  File "/home/Nasim/whisper-diarization-main/helpers.py", line 117, in get_words_speaker_mapping
    int(wrd_dict['start'] * 1000),
KeyError: 'start'

I think the problem lies in the wav2vec2 alignment step that creates word_timestamps.

Please take a look and help me solve this issue.

Thanks

Alam00086 commented 10 months ago

Hi, I wrapped the handling of word_timestamps entries in helpers.py in a try/except block, and I am now able to skip the numerals that have no start and end time in the alignment output.

This might be an issue with the wav2vec2 model when it aligns per-word timestamps.

One more thing: I used the whisperx model for transcription (batch size = 64), and it cut the latency roughly in half compared with faster whisper.

I see the above issue with faster whisper transcription and alignment as well as with whisperx.

Can anyone help me with this?
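For readers who hit the same error, the skip-based workaround described above could look roughly like the sketch below. It is not the code used in the repository: `drop_unaligned_words` is a hypothetical helper name, and the sample entries are invented for illustration, assuming each entry is a dict with a `word` key and optional `start`, `end`, and `score` keys as described in this thread.

```python
def drop_unaligned_words(word_timestamps):
    """Split entries into those with both 'start' and 'end' and those without.

    Hypothetical helper: assumes each entry is a dict shaped like the
    word_timestamps entries described in this issue.
    """
    aligned, skipped = [], []
    for entry in word_timestamps:
        if "start" in entry and "end" in entry:
            aligned.append(entry)
        else:
            skipped.append(entry)  # kept so the dropped words can be inspected
    return aligned, skipped


# Invented sample data: the numeral has no alignment information.
sample = [
    {"word": "call", "start": 0.10, "end": 0.35, "score": 0.91},
    {"word": "911"},
    {"word": "now", "start": 0.80, "end": 1.02, "score": 0.88},
]

aligned, skipped = drop_unaligned_words(sample)
print([w["word"] for w in aligned])
print([w["word"] for w in skipped])
```

Filtering before calling get_words_speaker_mapping avoids the KeyError, but the skipped words no longer appear in the speaker-attributed output.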
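Skipping entries removes those words from the final transcript. An alternative, sketched below, keeps every word by borrowing timestamps from its neighbours. This is only an illustration under the same assumptions about the entry format. `fill_missing_timestamps` and its `default_start` parameter are hypothetical names, and this is not necessarily how the project handles the case.

```python
def fill_missing_timestamps(word_timestamps, default_start=0.0):
    """Return a copy of the list in which every entry has 'start' and 'end'.

    Hypothetical helper: a missing 'start' is set to the previous word's end,
    and a missing 'end' is set to the next known 'start' (or to the word's own
    start if no later word has one).
    """
    filled = []
    prev_end = default_start
    for i, entry in enumerate(word_timestamps):
        entry = dict(entry)  # copy so the caller's data is not mutated
        if "start" not in entry:
            entry["start"] = prev_end
        if "end" not in entry:
            next_start = next(
                (w["start"] for w in word_timestamps[i + 1:] if "start" in w),
                entry["start"],
            )
            # Never let a word end before it starts.
            entry["end"] = max(next_start, entry["start"])
        prev_end = entry["end"]
        filled.append(entry)
    return filled


# Invented sample data: the numeral has no alignment information.
sample_words = [
    {"word": "call", "start": 0.10, "end": 0.35, "score": 0.91},
    {"word": "911"},
    {"word": "now", "start": 0.80, "end": 1.02, "score": 0.88},
]

filled_words = fill_missing_timestamps(sample_words)
for w in filled_words:
    print(w["word"], w["start"], w["end"])
```

The borrowed times are only approximate, so speaker assignment for such words may be less reliable than for properly aligned ones.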

Thanks

MahmoudAshraf97 commented 10 months ago

Make sure that you are using the exact whisperx version specified in the requirements.

MahmoudAshraf97 commented 9 months ago

Problem fixed in the latest commit.