I just wanted to comment that Vosk is absolutely exceptional. The transcriptions from Whisper were highly inaccurate, but Vosk is very accurate.
Two issues. First, Whisper also has a function to translate speech. Could Vosk implement this? That would make its offline capabilities all the more astounding.
Second, I am trying to transcribe a female voice speaking Russian, and the poor woman seems to confound all speech recognition AIs. Even Vosk struggles with her. She has a lower-range voice (approximately 165 Hz), and it seems that most AIs aren't used to deeper female voices. Is training the AI to recognize less common vocal types something the Vosk team can look into?