I was running NeMo on a 1-hour WAV file with stemming (Demucs) turned on. Whisper and alignment run fine, but when it enters diarization I get the error below.
The same file runs fine end-to-end when Demucs is turned off.
vad: 0%| | 0/99 [00:00<?, ?it/s]
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_label.py", line 443, in vad_frame_seq_collate_fn
return _vad_frame_seq_collate_fn(self, batch)
File "/home/rashib/anaconda3/envs/whisperx/lib/python3.8/site-packages/nemo/collections/asr/data/audio_to_label.py", line 184, in _vad_frame_seq_collate_fn
sig = torch.cat((start, sig, end))
RuntimeError: Tensors must have same number of dimensions: got 1 and 2
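My guess, which I have not confirmed against the NeMo source: with Demucs on, the audio that reaches the VAD collate function is multi-channel (2-D), while the `start` and `end` padding tensors are 1-D, so `torch.cat` refuses to join them. The sketch below only reproduces that mismatch with dummy tensors and shows a possible downmix to mono. The `to_mono` helper and the `(samples, channels)` layout are my assumptions, not NeMo API.

```python
import torch


def to_mono(sig: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average channels of a 2-D (samples, channels) signal.

    A 1-D signal is returned unchanged.
    """
    if sig.dim() == 2:
        return sig.mean(dim=1)
    return sig


# Dummy stand-ins for the tensors named in the traceback.
start = torch.zeros(4)        # 1-D padding
end = torch.zeros(4)          # 1-D padding
stereo = torch.randn(16, 2)   # assumed layout: (samples, channels)

try:
    torch.cat((start, stereo, end))
except RuntimeError as err:
    # Same kind of dimension mismatch as in the traceback above.
    print(err)

# After downmixing, all three tensors are 1-D and concatenate cleanly.
padded = torch.cat((start, to_mono(stereo), end))
print(padded.shape)  # torch.Size([24])
```

If this is the cause, converting the Demucs output to mono before the diarization step might avoid the error, but I have not verified that yet.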