Is it possible to detect the spoken language?

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0


Is it possible to detect the spoken language? #1541

Closed: silvioprog closed this issue 5 days ago.

silvioprog commented 3 months ago

Hi.

I have been developing this free transcription website using the model vosk-model-en-us-0.42-gigaspeech, so it should accept only English videos. However, I've noticed some people sending videos in Portuguese, Spanish, Japanese, and so on, and I would like to block them.

So, is it possible to detect whether the audio (extracted from the video) is really in English? (Something like whisper.detect_language().)

TIA for any help!
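For reference, here is a minimal sketch of the Whisper-based check the question refers to. It assumes the `openai-whisper` package (plus ffmpeg, which `load_audio` uses) and follows the language-detection example from Whisper's README. The `is_english` gate, the `"base"` model choice and the 0.5 threshold are hypothetical choices for illustration, not recommendations. Whisper only inspects the first 30 seconds here, so a video that starts in English and then switches language would still pass.

```python
# Sketch: decide whether an audio file is English using openai-whisper's
# language detection. Assumes `pip install openai-whisper` and ffmpeg on PATH.
# MIN_ENGLISH_PROB and the helper names are hypothetical, not part of Whisper.

MIN_ENGLISH_PROB = 0.5


def is_english(language_probs: dict, min_prob: float = MIN_ENGLISH_PROB) -> bool:
    """Accept only if English is the top language and clears the threshold."""
    if not language_probs:
        return False
    top_lang = max(language_probs, key=language_probs.get)
    return top_lang == "en" and language_probs["en"] >= min_prob


def detect_language_probs(path: str, model_name: str = "base") -> dict:
    """Return Whisper's {language_code: probability} for the first 30 s of audio."""
    import whisper  # lazy import so the gate above works without Whisper installed

    # Loads the model on every call for brevity; a server would load it once.
    model = whisper.load_model(model_name)
    audio = whisper.pad_or_trim(whisper.load_audio(path))  # pad/trim to a 30 s window
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return probs
```

Typical use would be `is_english(detect_language_probs("upload.wav"))`, refusing the upload when it returns `False`.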

nshmyrev commented 3 months ago

There is no problem using Whisper for an initial language identification step. You can also use other models, such as:

https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa
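A minimal sketch of the SpeechBrain route, assuming SpeechBrain 1.x (older releases expose `EncoderClassifier` under `speechbrain.pretrained` instead) and the usage shown on the model card. `classify_batch` returns the class scores, the best score, its index and the text label; the model card describes the scores as log-likelihoods, so `exp()` of the best one gives a probability. The helper names, the `savedir` path and the 0.5 threshold are hypothetical.

```python
# Sketch: English gate using SpeechBrain's VoxLingua107 ECAPA language-ID model.
# Assumes `pip install speechbrain` (1.x). Helper names, savedir and the
# threshold are hypothetical choices, not part of SpeechBrain.


def label_to_code(label: str) -> str:
    """Labels may look like 'en: English'; keep only the ISO code before the colon."""
    return label.split(":", 1)[0].strip()


def accept_upload(label: str, prob: float, min_prob: float = 0.5) -> bool:
    """Accept only uploads classified as English with enough confidence."""
    return label_to_code(label) == "en" and prob >= min_prob


def classify_language(path: str) -> tuple:
    """Return (predicted label, its probability) for one audio file."""
    from speechbrain.inference.classifiers import EncoderClassifier  # lazy import

    # Loads the model on every call for brevity; a server would load it once.
    lang_id = EncoderClassifier.from_hparams(
        source="speechbrain/lang-id-voxlingua107-ecapa",
        savedir="pretrained_models/lang-id-voxlingua107-ecapa",
    )
    signal = lang_id.load_audio(path)
    out_prob, score, index, text_lab = lang_id.classify_batch(signal)
    # `score` holds the best log-likelihood per item; exp() converts it to a probability.
    return text_lab[0], float(score.exp()[0])
```

Usage would be `accept_upload(*classify_language("upload.wav"))`. Splitting the ISO code off the label keeps the check working whether the label is `en` or `en: English`.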

silvioprog commented 3 months ago

@nshmyrev After a couple of tests, I decided to go with speechbrain/lang-id-voxlingua107-ecapa. Thanks a lot for this excellent suggestion!
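Once a detector is chosen, the language check can sit in front of the Vosk transcription so that non-English uploads are refused before the expensive step. The sketch below assumes the `vosk` Python package and a mono 16-bit PCM WAV input; `transcribe_if_english`, `vosk_transcribe` and the injected `detect_language` callable are hypothetical names, and only `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result` and `FinalResult` come from Vosk itself.

```python
# Sketch: run Vosk transcription only after a language-ID step says "en".
# `detect_language` is any callable returning an ISO code (e.g. built on
# SpeechBrain or Whisper). Function names here are hypothetical.
import json
import wave
from typing import Callable, Optional


def vosk_transcribe(path: str, model_name: str = "vosk-model-en-us-0.42-gigaspeech") -> str:
    """Transcribe a mono 16-bit PCM WAV file with Vosk."""
    from vosk import KaldiRecognizer, Model  # lazy import: heavy dependency

    model = Model(model_name=model_name)  # fetched by name if not already cached
    with wave.open(path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        parts = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):  # True when an utterance is finalized
                parts.append(json.loads(rec.Result()).get("text", ""))
        parts.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)


def transcribe_if_english(
    path: str,
    detect_language: Callable[[str], str],
    transcribe: Callable[[str], str] = vosk_transcribe,
) -> Optional[str]:
    """Return the transcript, or None if the upload is not detected as English."""
    if detect_language(path) != "en":
        return None  # reject before spending time on transcription
    return transcribe(path)
```

Passing the detector in as a callable keeps the pipeline independent of whether Whisper or SpeechBrain does the language identification, and makes the gate easy to test with stubs.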

nshmyrev commented 5 days ago

Tracked in https://github.com/alphacep/vosk-api/issues/420