Closed Alam00086 closed 9 months ago
Hi, I put a try/except around the entries in word_timestamps in helpers.py and was able to skip those numerals that have no start and end time in the alignment output.
This might be an issue with the wav2vec2 model during per-word timestamp alignment.
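The workaround described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual helpers.py code: `filter_aligned_words` is a hypothetical helper name, and the entry dict shape (`word`/`start`/`end` keys) is an assumption about the aligner's output.

```python
def filter_aligned_words(word_timestamps):
    """Keep only aligner entries that carry both a start and an end time.

    Hypothetical helper, assuming word_timestamps is a list of dicts
    like {"word": ..., "start": ..., "end": ...}.
    """
    cleaned = []
    for entry in word_timestamps:
        try:
            start, end = entry["start"], entry["end"]
        except KeyError:
            # Numerals (and some other tokens) can come back from the
            # wav2vec2 aligner with no timestamps; skip them.
            continue
        cleaned.append({"word": entry.get("word", ""), "start": start, "end": end})
    return cleaned


words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "42"},  # aligner returned no timestamps for this numeral
    {"word": "world", "start": 0.5, "end": 0.9},
]
print(filter_aligned_words(words))  # the "42" entry is dropped
```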
One more thing: I used the whisperx model for transcribing the audio (batch size = 64), and it roughly halved the latency compared to faster-whisper.
I am facing the above issue with both faster-whisper and whisperx transcription and alignment.
Could anyone help me with this?
Thanks
Make sure that you are using the exact whisperx version mentioned in the requirements.
Problem fixed in the latest commit.
Hi,
Thank you so much for this awesome work.
My configuration: Ubuntu 20.04, CUDA 11.4, NVIDIA Quadro RTX 6000 (24 GB VRAM), torch 2.0.1, whisperx 3.1.1, nemo_toolkit 1.20.0, Python 3.10.12.
I am running diarize_parallel.py on some audio files. It works on small chunks (10-20 seconds).
When I run it on longer audio files (3-4 min or 10-20 min), word_timestamps contains a few extra words (2-3) with no start, end, or score. That is why I am getting a KeyError in helpers.py at line 117:
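To make the failure concrete, here is a minimal reproduction of that kind of KeyError (the actual helpers.py internals are assumed, not quoted), along with a defensive `.get()` pattern that avoids it:

```python
# A word_timestamps list as the aligner might return it, with one
# extra entry that has no start/end/score (assumed shape).
word_timestamps = [
    {"word": "ok", "start": 1.2, "end": 1.5, "score": 0.98},
    {"word": "2023"},  # extra word with no start, end, or score
]

# entry["start"] would raise KeyError on the second entry.
# Using dict.get() returns None instead, so the entry can be skipped.
skipped = [e["word"] for e in word_timestamps if e.get("start") is None]
print(skipped)  # → ['2023']
```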
I think this problem is in the wav2vec2 alignment and the creation of word_timestamps.
Please look into this and help me solve the issue.
Thanks