alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Language identification #420

Open traderboy opened 3 years ago

traderboy commented 3 years ago

I have lots of audio files in different languages and I'd like to run them through Vosk to find out which ones contain Russian speakers. I think I can get close by using the Russian model and word-level confidences. But running an English audio file through the same Russian model also returns a lot of results. The confidences are lower than when the Russian model is run on Russian audio, but not by enough to be certain.

How can I find the number of words in an audio file that are NOT detected? For example, I have an English audio file that returns 60 words when using an English model, but returns 30 words when I run the same file through the Russian model. It might be useful to know how many words aren't found or have a word confidence of zero. Is that possible? I haven't found anything in the code or examples that does that.

More generally, what's the best way to programmatically determine, with reasonable certainty, that the language is Russian? I'd like to do the same for other languages such as Chinese.
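For illustration, the confidence-based heuristic described above could be sketched roughly as follows. This assumes the Vosk Python bindings with `SetWords(True)` enabled, so that each result JSON carries a `result` list with a `conf` value per word. The helper names `confidence_summary` and `recognize`, the `low_conf` threshold of 0.5, and the read size of 4000 frames are illustrative choices, not part of Vosk. The recognizer only reports words it decoded, so the word count and mean confidence are at best a rough proxy and not real language identification.

```python
import json
import wave


def confidence_summary(result_jsons, low_conf=0.5):
    """Summarise per-word confidences from a list of Vosk JSON result strings."""
    confs = []
    for raw in result_jsons:
        # Segments where nothing was recognised carry no "result" list.
        for word in json.loads(raw).get("result", []):
            confs.append(word["conf"])
    if not confs:
        return {"words": 0, "mean_conf": 0.0, "low_conf_words": 0}
    return {
        "words": len(confs),
        "mean_conf": sum(confs) / len(confs),
        # low_conf is an arbitrary cut-off chosen for this sketch.
        "low_conf_words": sum(1 for c in confs if c < low_conf),
    }


def recognize(wav_path, model_dir):
    """Run one Vosk model over a 16-bit mono PCM WAV file; return its JSON results."""
    # Imported lazily so the summary helper works without Vosk installed.
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # request per-word entries including "conf"
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        results.append(rec.FinalResult())
    return results
```

Running `confidence_summary(recognize("audio.wav", "model-ru"))` and the same call with an English model directory (both paths are hypothetical) gives two summaries to compare. As noted in the question, the gap between them may be too small to decide reliably.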

nshmyrev commented 3 years ago

We do not support language identification yet.

nshmyrev commented 3 years ago

You can use something external like

https://github.com/py-lidbox/lidbox

or

http://bark.phon.ioc.ee/voxlingua107/

traderboy commented 3 years ago

> You can use something external like
>
> https://github.com/py-lidbox/lidbox
>
> or
>
> http://bark.phon.ioc.ee/voxlingua107/

Thanks, the Voxlingua demo is exactly what I need. Unfortunately, they don't provide source code or instructions. I'm trying out lidbox, but it's not clear how to build an application that does what I need.

You wrote "We do not support language identification yet." It's encouraging to know that it may be added to Vosk someday. I've been able to use both the C and Python code with good results, so it'd be great to continue using Vosk.

doublex commented 3 years ago

@traderboy https://github.com/snakers4/silero-vad

nshmyrev commented 3 years ago

> Thanks, the Voxlingua demo is exactly what I need, unfortunately they don't provide source code and instructions. I'm trying out lidbox, but it's not clear how to create an application to do what I need.

Voxlingua code is here:

https://github.com/alumae/torch-xvectors-wav

also

https://github.com/alumae/voxlingua107_sb

nshmyrev commented 3 years ago

Related issue #233

nshmyrev commented 2 years ago

Also

https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa

and a wav2vec-based one:

https://huggingface.co/anton-l/wav2vec2-base-lang-id
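As a rough sketch of how the SpeechBrain model linked above might be used to pick out files in one language: the example follows the `EncoderClassifier.from_hparams` / `classify_file` usage shown on that model card. The helper names, the `savedir` path, and the label string `"Russian"` are assumptions made for illustration, so check the labels the model actually returns before relying on them. Newer SpeechBrain releases may expose the class under `speechbrain.inference` rather than `speechbrain.pretrained`.

```python
def load_classifier(source="speechbrain/lang-id-commonlanguage_ecapa"):
    """Load the pretrained language-ID classifier (downloads on first use)."""
    # Imported lazily; requires the speechbrain package to be installed.
    from speechbrain.pretrained import EncoderClassifier

    # savedir is an arbitrary local cache directory chosen for this sketch.
    return EncoderClassifier.from_hparams(
        source=source, savedir="pretrained_models/lang-id"
    )


def identify_language(classifier, wav_path):
    """Return (label, score) for the top-ranked language of one audio file."""
    # classify_file returns (out_prob, score, index, text_lab).
    _out_prob, score, _index, text_lab = classifier.classify_file(wav_path)
    # The scale of the score depends on the model; treat it as a ranking value.
    return text_lab[0], float(score[0])


def files_in_language(classifier, wav_paths, wanted):
    """Keep only the paths whose top-ranked label equals `wanted`."""
    return [
        path
        for path in wav_paths
        if identify_language(classifier, path)[0] == wanted
    ]
```

A call such as `files_in_language(load_classifier(), paths, "Russian")` would then return the subset of `paths` classified as Russian, assuming that is the label string the model uses.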