alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

How to increase the accuracy of the vosk module in Python? #1369

Open amina1403 opened 1 year ago

amina1403 commented 1 year ago

How can I maximize the accuracy of the Vosk module? I'm using the «large» model. Thanks.

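Code:

```python
import wave

from vosk import Model, KaldiRecognizer

model = Model("vosk-model-fa-0.5")
wf = wave.open("audio.wav", "rb")

rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(8000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        a = rec.Result()         # JSON for a completed utterance
    else:
        a = rec.PartialResult()  # JSON for the utterance still in progress

a = rec.FinalResult()            # flush the decoder and get the last utterance
```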
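In the loop above, `a` is overwritten on every iteration, so only the last result survives and earlier utterances are lost from the transcript. A minimal sketch of collecting them instead, assuming the usual Vosk JSON output where finalized results (from `Result()` / `FinalResult()`) carry a `"text"` field; the helper name `collect_transcript` is hypothetical:

```python
import json


def collect_transcript(result_jsons):
    """Join the "text" fields of finalized Vosk result strings.

    Assumes each item is a JSON string as returned by Result() or
    FinalResult(). Partial results use a "partial" key and are not
    expected here.
    """
    pieces = []
    for raw in result_jsons:
        text = json.loads(raw).get("text", "")
        if text:  # skip empty utterances, e.g. stretches of silence
            pieces.append(text)
    return " ".join(pieces)


# Sketch of use with the loop above:
#   finals = []
#   ...
#   if rec.AcceptWaveform(data):
#       finals.append(rec.Result())
#   ...
#   finals.append(rec.FinalResult())
#   print(collect_transcript(finals))
```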

nshmyrev commented 1 year ago

You can provide us with 1000 hours of audio data so we can build a more accurate model.

amina1403 commented 1 year ago

1000 hours! That would be difficult for me, but not impossible, since there are so many free audiobooks and podcasts. I read somewhere that accuracy increases as the LSTM increases; is that true, and how does it work? What changes can I make to the code to increase accuracy?

What changes would you recommend in general that would increase accuracy by at least 10%?

nshmyrev commented 1 year ago

We do not need audiobooks; we need real-life data. What application do you want to build? What kind of audio are you going to recognize?

amina1403 commented 1 year ago

Software that transcribes an audio file and delivers the text. I have already built it, but competitors in this field are 10 to 20 percent more accurate.
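A gap like "10 to 20 percent more accurate" is only meaningful when both systems are scored on the same recordings against human reference transcripts, usually with word error rate (WER). A minimal pure-Python sketch; the function names are illustrative, and it assumes both transcripts are already normalized the same way (punctuation, digits, spelling variants), since WER counts any string difference as an error. It also separates absolute WER from relative improvement, because "10% better" can mean either.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.20 -> 0.18 gives 0.10 (10% relative)."""
    if old_wer == 0:
        raise ValueError("old_wer must be non-zero")
    return (old_wer - new_wer) / old_wer
```

For example, a drop from 20% to 18% WER is a 2-point absolute gain but a 10% relative improvement.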

nshmyrev commented 1 year ago

You can share the data with us to catch up.

amina1403 commented 1 year ago

Does changing the following values affect the accuracy of the Vosk module?

model.conf:

```
--min-active=200
--max-active=3000
--beam=10.0
--lattice-beam=2.0
--acoustic-scale=1.0
--frame-subsampling-factor=3
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=1.0
--endpoint.rule4.min-trailing-silence=2.0
```

mfcc.conf:

```
--use-energy=false
--num-mel-bins=20
--num-ceps=20
--low-freq=20
--high-freq=7600
```
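The mfcc.conf values describe how acoustic features were computed when the model was trained, so they are generally best left as shipped: decoding with feature settings that differ from training creates a mismatch rather than an improvement. What is worth checking on the input side is the audio itself. A minimal sketch using only the standard `wave` module; it mirrors the mono, 16-bit PCM check in the Vosk Python examples, and `expected_rate=16000` is an assumption (inferred from `--high-freq=7600` sitting just under the 8 kHz Nyquist limit of 16 kHz audio), so confirm it against the model's documentation. The function name is hypothetical.

```python
import wave


def check_wav_for_vosk(source, expected_rate=16000):
    """Return a list of problems found in a WAV file (empty if none).

    source: a path or a binary file object.
    expected_rate: assumed sample rate the model was trained on.
    """
    problems = []
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
        if wf.getframerate() < expected_rate:
            problems.append(
                f"sample rate {wf.getframerate()} Hz is below the assumed "
                f"{expected_rate} Hz model rate"
            )
    return problems
```

Upsampling 8 kHz telephone audio does not restore the missing high-frequency band, so such recordings may still decode worse than wideband audio even when they pass a format check.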

nshmyrev commented 1 year ago

Yes

amina1403 commented 1 year ago

Which of these values should I change to increase the accuracy of the module?
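Which model.conf values help is largely an empirical question, so one approach is to generate variants of the file and decode the same held-out recordings with each. In Kaldi-style decoders, a larger `--beam` and `--max-active` widen the search, which is slower but can reduce pruning errors, and `--acoustic-scale` changes the weight of the acoustic model relative to the language model. A minimal sketch, assuming the model reads `conf/model.conf` in this one-option-per-line format when it is loaded; the helper names are hypothetical and any values you pass are illustrations, not recommended settings.

```python
def parse_conf(text):
    """Parse Kaldi-style options text with one --name=value per line."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and comments; keep only --name=value entries.
        if line.startswith("--") and "=" in line:
            name, value = line[2:].split("=", 1)
            options[name] = value
    return options


def format_conf(options):
    """Serialize options back to --name=value lines (comments are not kept)."""
    return "".join(f"--{name}={value}\n" for name, value in options.items())


def with_overrides(options, overrides):
    """Return a copy of options with the given values replaced or added."""
    tuned = dict(options)
    for name, value in overrides.items():
        tuned[name] = str(value)
    return tuned
```

For example, read `conf/model.conf` from a copy of the model directory with `Path.read_text()`, apply `with_overrides(base, {"beam": 13.0, "max-active": 7000})`, write it back with `Path.write_text(format_conf(...))`, and score each variant's transcripts against the same references. Working on a copy keeps the shipped configuration intact.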