MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

Error in diarization #160

Open rashi-budati opened 8 months ago

rashi-budati commented 8 months ago

I was running NeMo on a 1-hour WAV file with stemming (Demucs) turned on. Whisper and alignment run fine, but when the pipeline enters diarization I get the error below. The same file runs fine end-to-end when Demucs is turned off.

vad:   0%|                                                                                                                                                                           | 0/99 [00:00<?, ?it/s]
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_label.py", line 443, in vad_frame_seq_collate_fn
    return _vad_frame_seq_collate_fn(self, batch)
  File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_label.py", line 184, in _vad_frame_seq_collate_fn
    sig = torch.cat((start, sig, end))
RuntimeError: Tensors must have same number of dimensions: got 1 and 2
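
For readers hitting the same message: it means `torch.cat` was given a 1-D tensor together with a 2-D one. The thread does not establish why `sig` is 2-D here. One possibility, purely an assumption, is that the audio reaching the VAD step has more than one channel. The sketch below reproduces the message with made-up tensors and shows one way to collapse a signal to 1-D mono first. The helper `to_mono_1d` and the tensor shapes are hypothetical and are not part of NeMo or this repository.

```python
import torch


def to_mono_1d(sig: torch.Tensor) -> torch.Tensor:
    """Collapse a 2-D audio tensor to 1-D mono by averaging channels.

    Hypothetical helper: assumes the shorter axis is the channel axis.
    """
    if sig.dim() == 1:
        return sig
    if sig.dim() != 2:
        raise ValueError(f"expected a 1-D or 2-D tensor, got {sig.dim()}-D")
    channel_axis = 0 if sig.shape[0] < sig.shape[1] else 1
    return sig.mean(dim=channel_axis)


# 1-D padding, standing in for `start` and `end` in the traceback.
pad = torch.zeros(4)
# Made-up 2-channel clip of 16 samples, standing in for `sig`.
stereo = torch.randn(2, 16)

try:
    torch.cat((pad, stereo, pad))
except RuntimeError as err:
    # Same class of failure as in the traceback above.
    print(err)

# After collapsing to mono, the concatenation succeeds.
fixed = torch.cat((pad, to_mono_1d(stereo), pad))
print(fixed.shape)
```

If the shape of the audio file on disk is in doubt, checking its channel count before diarization is a cheap first diagnostic.
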

MahmoudAshraf97 commented 5 days ago

Can you upload the file so I can reproduce this?