alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Does speaker recognition work without acoustic model? #1320

Open fquirin opened 1 year ago

fquirin commented 1 year ago

Hi,

I've been testing the Vosk small English model together with speaker recognition, and the results look pretty solid: it always picked the correct speaker out of a set of 7. True/false tests (is this speaker A?) are a bit trickier, but either way I think it's an interesting feature 👍.
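
For context, the two kinds of test mentioned above can be sketched roughly as follows. This is a minimal illustration, not Vosk's own code: it assumes each utterance has already been reduced to a speaker vector (a plain list of floats, such as the one Vosk reports in its result when a speaker model is attached), and it compares vectors by cosine distance. The helper names, the toy 3-dimensional vectors, and the `0.5` threshold are all hypothetical; a real threshold would have to be tuned on your own recordings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length speaker vector")
    return 1.0 - dot / (norm_a * norm_b)


def identify(probe, enrolled):
    """Closed-set test: return (name, distance) of the nearest enrolled speaker."""
    scored = ((name, cosine_distance(probe, vec)) for name, vec in enrolled.items())
    return min(scored, key=lambda pair: pair[1])


def verify(probe, reference, threshold=0.5):
    """True/false test: accept if the distance is below a (hypothetical) threshold."""
    return cosine_distance(probe, reference) < threshold


# Toy vectors for illustration only; real speaker vectors are much longer.
enrolled = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]

best_name, best_distance = identify(probe, enrolled)
is_speaker_a = verify(probe, enrolled["A"])
is_speaker_b = verify(probe, enrolled["B"])
```

Picking the nearest of N enrolled speakers only needs the right speaker to be closer than the others, whereas the true/false test depends on an absolute threshold, which is probably why it is harder to get right.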

I'd like to use this for a variety of languages, but I noticed that it requires a proper acoustic model to work. I could probably just keep using the small English model (or any other) and simply discard the resulting text, but would that cause accuracy issues in other languages? And is there perhaps a more efficient, generic acoustic model we could use for speaker recognition? I'm assuming speaker recognition relies on some VAD or feature extraction done by the acoustic model. At least it didn't work when I built a tiny grammar model with just tokens 😅.