alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Vietnamese language models not working? #1236

Open tdrp opened 1 year ago

tdrp commented 1 year ago

Hi - has anyone gotten the Vietnamese language models to work accurately?

I have tried the models in English, Chinese and French and they all work fine with relatively low error rates.

But for some reason Vietnamese seems completely broken. I tried the same sentences with Google speech recognition and I can get fairly high accuracy.

I also tried including a grammar to restrict the vocabulary, but it returns only [unk] for nearly everything, or completely unrelated (but Vietnamese) words. Could something be broken with the model itself?
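For anyone trying to reproduce this, here is a minimal sketch of how a grammar can be passed to the recognizer from Python. It assumes the `vosk` package is installed, that the input is already a mono 16-bit PCM WAV file, and that the small model supports runtime grammars. The helper names `build_grammar` and `recognize_with_grammar` are made up for illustration, and the NFC normalisation is a guess about how the model's vocabulary is encoded, not something confirmed in this thread.

```python
import json
import unicodedata
import wave


def build_grammar(phrases):
    """Return a JSON grammar string: the allowed phrases plus "[unk]"."""
    # Assumption: the model vocabulary is lowercase and NFC-encoded, so
    # normalise the phrases the same way before handing them to Vosk.
    normalised = [unicodedata.normalize("NFC", p.strip().lower()) for p in phrases]
    return json.dumps(normalised + ["[unk]"], ensure_ascii=False)


def recognize_with_grammar(wav_path, phrases, lang="vn"):
    """Transcribe a mono 16-bit PCM WAV file, restricted to `phrases`."""
    # Imported lazily so build_grammar() can be used without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(lang=lang)  # looks up (and may download) a model for `lang`
    parts = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(phrases))
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns True when an utterance has ended.
            if rec.AcceptWaveform(data):
                parts.append(json.loads(rec.Result())["text"])
        parts.append(json.loads(rec.FinalResult())["text"])
    return " ".join(p for p in parts if p)
```

Printing `build_grammar([...])` before running the recognizer makes it easy to confirm that the phrases are spelled exactly as intended.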

nshmyrev commented 1 year ago

Please share an audio file, I'll check. Also verify the file format.

Overall, the Vietnamese models are not very accurate because they were trained on less data.
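On verifying the file format: the Python examples in the repo expect mono 16-bit PCM WAV input. A rough way to check a file with only the standard library is sketched below. The function name `wav_format_problems` is hypothetical, and the check does not cover the sample rate, which should match what is passed to the recognizer.

```python
import wave


def wav_format_problems(path):
    """Return a list of reasons why `path` is not mono 16-bit PCM WAV."""
    try:
        wf = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        # Raised for non-WAV containers such as .m4a, or for truncated files.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    with wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wf.getnchannels()}")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wf.getsampwidth()}-bit")
    return problems
```

An empty list means the file passes both checks. Other containers (m4a, mp3) need converting first, for example with ffmpeg.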

tdrp commented 1 year ago

vietnamese-five-words.m4a.zip

Hi - attached. This is for the small VN model. I get 0% accuracy, even with a grammar enabled. The words are: dễ thương. kiên nhẫn. cẩn thận. lý do. hài hước.

nshmyrev commented 1 year ago

vosk-transcriber -l vn -i vietnamese-five-words.m4a

The result is:

dễ lăn kềnh cẩn thận giữ cho hay hứa

It got the word thận, so accuracy is not zero. But it needs improvement, yes.
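To put a number on it, a word-level comparison can be sketched as follows. This is a plain edit-distance count (substitutions, deletions, insertions), not an official Vosk scoring tool. The names `tokenize` and `word_errors` are made up for this example.

```python
import unicodedata


def tokenize(text):
    """Lowercase, NFC-normalise, drop full stops, and split into words."""
    cleaned = unicodedata.normalize("NFC", text.lower()).replace(".", " ")
    return cleaned.split()


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    # prev[j] holds the distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


errors, total = word_errors(
    "dễ thương. kiên nhẫn. cẩn thận. lý do. hài hước.",
    "dễ lăn kềnh cẩn thận giữ cho hay hứa",
)
print(f"{errors} errors on {total} reference words")
```

For the five phrases and the transcript above, this counts 7 errors against 10 reference words: dễ, cẩn and thận match, and the rest are substituted or dropped.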

tdrp commented 1 year ago

Thanks - the issue is that, for example, "lý do" and the transcribed "giữ cho" are quite different phonetically. The same goes for "thương" and "lăn". I tried to restrict it with a grammar of 100 words and it still picked the wrong ones, so I wonder whether there is some other issue with the model.

For reference, I get very high accuracy on the Western European languages, and on Chinese.

The standard SpeechRecognizer library in Android (the Google one) gives me the correct words (or words that are very close).