hsnfirooz closed this 6 months ago
The end product is great :) I'm just a bit confused about the Whisper/wav2vec/Pyannote dynamics: so we are still using Pyannote to do the diarization, but additionally using wav2vec to align the Whisper transcription to words? Also, it seems that the docstrings are now out of date; we should update them before merging.
You are correct about the dynamics :) It looks more confusing, but it works great.
Thanks a lot for the cleanup.
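To make the "word-based instead of segment-based" idea concrete, here is a minimal sketch of one common way to combine the three pieces; it is not the code from this PR. It assumes word timestamps (as a wav2vec alignment step would produce) and speaker turns (as Pyannote would produce) are already available, and labels each word with the speaker whose turn overlaps it the most. The `Word`/`Turn` classes and `assign_speakers` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    # Hypothetical container for one aligned word (times in seconds).
    text: str
    start: float
    end: float


@dataclass
class Turn:
    # Hypothetical container for one diarization turn (times in seconds).
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it most.

    Words that overlap no turn at all are labeled None.
    """
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labeled.append((w.text, best_speaker))
    return labeled


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.4)]
    turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.1, 2.0)]
    print(assign_speakers(words, turns))
```

Because the speaker is decided per word rather than per Whisper segment, a segment that spans a speaker change gets split correctly instead of being attributed entirely to one speaker.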
Switched to word-based diarization instead of segment-based. We use fine-tuned wav2vec models to perform alignment and diarization; since some languages don't have a fine-tuned wav2vec model, the following languages are no longer supported: