AaltoRSE / speech2text

Instructions to set up and use the Aalto speech2text app on Triton.

Increase diarization performance #18

Closed hsnfirooz closed 6 months ago

hsnfirooz commented 7 months ago

Switched to word-based diarization instead of segment-based. We use fine-tuned wav2vec models to perform alignment and diarization. Because some languages don't have a fine-tuned wav2vec model, the following languages are no longer supported:

afrikaans
azerbaijani
belarusian
bosnian
croatian
kannada
macedonian
maori
welsh
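
A minimal sketch of how a caller could reject the languages listed above before starting a job. The names `UNSUPPORTED_LANGUAGES` and `check_language` are hypothetical and are not taken from the speech2text code base; only the language list itself comes from this PR.

```python
# Languages this PR lists as no longer supported, because no fine-tuned
# wav2vec alignment model is available for them.
UNSUPPORTED_LANGUAGES = frozenset({
    "afrikaans", "azerbaijani", "belarusian", "bosnian", "croatian",
    "kannada", "macedonian", "maori", "welsh",
})


def check_language(language: str) -> str:
    """Return the normalized language name, or raise if it was dropped.

    Hypothetical helper: it only checks against the list in this PR and
    does not verify that the language is otherwise supported.
    """
    normalized = language.strip().lower()
    if normalized in UNSUPPORTED_LANGUAGES:
        raise ValueError(
            f"'{normalized}' is not supported: no fine-tuned wav2vec "
            "model is available for word-level alignment."
        )
    return normalized
```

Failing early like this avoids running the transcription step on audio whose language cannot be aligned afterwards.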

ruokolt commented 7 months ago

The end product is great :) I'm just a bit confused about the Whisper/wav2vec/Pyannote dynamics: so we are still using Pyannote to do the diarization, but additionally wav2vec to align the Whisper transcription to words? Also, it seems that the docstrings are now out of date; we should update them before merging.

hsnfirooz commented 7 months ago

You are correct about the dynamics :) Looks more confusing, but works great.
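
To make the dynamics concrete, here is a rough sketch of the word-based step: each aligned word gets the speaker whose diarization turn overlaps it the most. This is an illustration under assumptions, not the code in this PR. The `Word` and `Turn` classes and the `assign_speakers` function are hypothetical names; in the real pipeline the word timestamps would come from the wav2vec alignment of the Whisper transcription and the turns from Pyannote.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One aligned word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarization speaker turn in seconds (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(word: Word, turn: Turn) -> float:
    """Length in seconds of the intersection between a word and a turn."""
    return max(0.0, min(word.end, turn.end) - max(word.start, turn.start))


def assign_speakers(
    words: List[Word], turns: List[Turn]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker of the turn it overlaps most.

    Words that overlap no turn at all are labelled None.
    """
    labelled = []
    for word in words:
        best = max(turns, key=lambda t: overlap(word, t), default=None)
        if best is None or overlap(word, best) == 0.0:
            labelled.append((word.text, None))
        else:
            labelled.append((word.text, best.speaker))
    return labelled


words = [
    Word("hello", 0.0, 0.4),
    Word("so", 0.9, 1.2),   # straddles the speaker change at 1.0 s
    Word("hi", 1.3, 1.5),
]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
result = assign_speakers(words, turns)
```

With segment-based assignment, a whole Whisper segment that spans a speaker change would get a single label; assigning per word lets the label switch inside the segment, which is the likely source of the improvement described here.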

hsnfirooz commented 7 months ago

Thanks a lot for the cleanup.