Closed vladgrand2 closed 7 months ago
I'm still trying to make it work better for the Russian language and noticed that changing the VAD parameters to:
config.diarizer.vad.parameters.onset = 0.3
config.diarizer.vad.parameters.offset = 0.1
config.diarizer.vad.parameters.pad_offset = -0.1
improves the transcription, but the result is still not ideal.
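For readers unfamiliar with these three settings, here is a rough sketch of how onset/offset thresholds and end padding typically act on per-frame speech probabilities. It is a simplified illustration, not NeMo's actual post-processing: the function name `hysteresis_segments`, the `frame_sec` argument, and the sample probabilities are all assumptions made for this example.

```python
from typing import List, Tuple


def hysteresis_segments(
    probs: List[float],
    frame_sec: float = 0.01,   # assumed frame hop, hypothetical value
    onset: float = 0.3,        # a segment opens when prob >= onset
    offset: float = 0.1,       # a segment closes when prob < offset
    pad_offset: float = -0.1,  # seconds added to each segment end (negative trims)
) -> List[Tuple[float, float]]:
    """Turn per-frame speech probabilities into (start, end) segments in seconds."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i * frame_sec                   # speech begins
        elif start is not None and p < offset:
            segments.append((start, i * frame_sec))  # speech ends
            start = None
    if start is not None:
        # Speech was still active at the end of the audio.
        segments.append((start, len(probs) * frame_sec))

    padded = []
    for s, e in segments:
        e = e + pad_offset
        if e > s:  # drop segments that collapse after trimming
            padded.append((round(s, 3), round(e, 3)))
    return padded


if __name__ == "__main__":
    probs = [0.05, 0.4, 0.2, 0.2, 0.05, 0.05]
    print(hysteresis_segments(probs, frame_sec=0.1))
```

In this sketch, a lower onset lets quieter speech open a segment, a lower offset keeps a segment open longer, and a negative pad_offset shortens every segment's tail.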
Even though I pass a flag from the script to whisperx to force the language, I still get words from other languages in the text, which the original whisperx doesn't do. How can this be overcome?
Different performance is expected with different diarization methods, and it also depends on the language, so you should use whichever gives you the best results. We use exactly the same transcription and alignment as whisperX; the only step that differs is the diarization.
Do you think NeMo is what causes the English words and words from other languages to appear when transcribing Russian?
Your latest version is simply great. It solved all the problems I encountered. It works just like magic. And the diarization from NeMo simply beats pyannote. Thank you very much.
First of all, I added the --language option from whisperx to your script.
But I wanted to talk about alignment. Whisperx and your script produce different alignments even though they use the same method from whisperx. Perhaps the integration into the script is not entirely complete, but I still can't figure out how to fix this.
Whisperx always gives the same result for the same file. diarize.py always gives different results for the same file, and the differences always come from the alignment.
I also noticed that diarize.py always cuts off the beginning of the file. The speaker transcription starts at 8-11 seconds, and the first seconds are not transcribed.
I am attaching examples of the same file processed with whisperx and with diarize.py: file1_pyannonote.docx file1_nemo.docx
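To pin down where two runs on the same file diverge, something like the following could help. It is only a sketch: it assumes each run has been saved as a list of word entries with `word` and `start` keys (loosely modeled on whisperx's word-level output), and the helper name `timing_diffs` and the tolerance value are made up for illustration.

```python
def timing_diffs(run_a, run_b, tol=0.05):
    """List positions where two runs disagree on a word or on its start time.

    run_a, run_b: lists of dicts with "word" (str) and "start" (seconds).
    tol: start-time difference in seconds that is still treated as equal.
    Only the overlapping prefix of the two lists is compared.
    """
    diffs = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a["word"] != b["word"] or abs(a["start"] - b["start"]) > tol:
            diffs.append((i, a["word"], b["word"], a["start"], b["start"]))
    return diffs


if __name__ == "__main__":
    # Hypothetical data standing in for two runs over the same audio.
    first = [{"word": "привет", "start": 0.5}, {"word": "мир", "start": 1.0}]
    second = [{"word": "привет", "start": 0.5}, {"word": "мир", "start": 1.2}]
    for row in timing_diffs(first, second):
        print(row)
```

An empty result for two runs of the same tool would indicate deterministic alignment; a non-empty one shows which words shift between runs.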