alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

loooong processing time #893

Open squintarelli opened 2 years ago

squintarelli commented 2 years ago

Hello,

I am using Vosk models (tried English and Italian) with nerd-dictation (https://github.com/ideasman42/nerd-dictation).

After starting recognition, any short sentence takes ages to produce output (by ages I mean 30 to 90 seconds, and the computer is frozen in the meantime). After the first output, the terminal starts responding again and I'm able to stop it.

I am running Ubuntu 20.04.1 on an i7 .3GHz 8 core CPU with 15.4 GB RAM.

Any suggestion is highly appreciated.

nshmyrev commented 2 years ago

Which model did you try, exactly? You should be using a smaller model like https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip

squintarelli commented 2 years ago

Thanks, I tried both.

Using vosk-model-en-us-0.22-lgraph.zip, the delay is reduced to about 15 seconds. Still too long to be useful for dictation :-(

nshmyrev commented 2 years ago

OK, and what is the CPU load during recognition? You can run top in a parallel terminal.

squintarelli commented 2 years ago

about 98% CPU and 2.7% MEM

nshmyrev commented 2 years ago

Please share the screenshot

squintarelli commented 2 years ago

The computer freezes during speech recognition, so top doesn't update. This is a screenshot from just before I start talking.

nshmyrev commented 2 years ago

Feels like a driver issue, not Vosk. Does portaudio work for you at all? Can you record audio with the parec command on the command line? Maybe xdotool creates locks.

Does the demo Python code to recognize a file work quickly for you?
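A minimal sketch of that check, for anyone who wants a number rather than an impression: it decodes a WAV file with the Vosk Python bindings and reports how long that took relative to the audio length. The helper names (`transcribe`, `wav_duration_seconds`, `real_time_factor`) are made up for this example, and it assumes a 16-bit mono PCM WAV file and an already unpacked model directory.

```python
import json
import time
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds, based on its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


def transcribe(model_dir, wav_path):
    """Decode a WAV file with Vosk and return (text, seconds taken).

    Assumes 16-bit mono PCM audio. Model loading is included in the timing.
    """
    # Imported here so the helpers above work without Vosk installed.
    from vosk import Model, KaldiRecognizer

    start = time.perf_counter()
    model = Model(model_dir)
    parts = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has ended.
            if rec.AcceptWaveform(data):
                parts.append(json.loads(rec.Result()).get("text", ""))
        parts.append(json.loads(rec.FinalResult()).get("text", ""))
    text = " ".join(p for p in parts if p)
    return text, time.perf_counter() - start


# Example use (paths are placeholders):
#   text, elapsed = transcribe("path/to/model", "test.wav")
#   rtf = real_time_factor(elapsed, wav_duration_seconds("test.wav"))
```

If file recognition runs well under real time while live dictation still stalls, the model itself is probably not the bottleneck, which would point toward audio capture or keystroke simulation instead.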

ls-milkyway commented 2 years ago

Install Python again, create an environment, then install Vosk in the environment. Do not forget to remove the previous model files before trying a different model.

squintarelli commented 2 years ago

Feels like a driver issue, not Vosk. Does portaudio work for you at all? Can you record audio with the parec command on the command line? Maybe xdotool creates locks.

Does the demo Python code to recognize a file work quickly for you?

I don't have portaudio installed. Should I?

nshmyrev commented 2 years ago

I don't have portaudio installed. Should I?

Yes, I believe nerd-dictation uses parec:

https://github.com/ideasman42/nerd-dictation/blob/master/nerd-dictation#L729

squintarelli commented 2 years ago

I installed it, but nothing has changed. I'm not at all familiar with these tools. How can I check whether portaudio / parec are working?

nshmyrev commented 2 years ago

How can I check whether portaudio / parec are working?

Run it from the command line and see if it records the audio from the microphone.

squintarelli commented 2 years ago

I tried the following: pacmd list-sources | grep ".monitor" and got this output:

    name: <alsa_output.usb-FongLun_USB_Microphone_201605-00.iec958-stereo.monitor>
    monitor_of: 1
        device.class = "monitor"
    name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_5__sink.monitor>
    monitor_of: 3
        device.class = "monitor"
    name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_4__sink.monitor>
    monitor_of: 4
        device.class = "monitor"
    name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_3__sink.monitor>
    monitor_of: 5
        device.class = "monitor"
    name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp__sink.monitor>
    monitor_of: 6
        device.class = "monitor"
    name: <alsa_output.usb-DisplayLink_Dell_USB3.0_Dock_1509160772-02.analog-stereo.monitor>
    monitor_of: 8
        device.class = "monitor"

Then, for each device name, I tried a command like parec -d alsa_output.usb-DisplayLink_Dell_USB3.0_Dock_1509160772-02.analog-stereo.monitor --file-format=wav test.wav. A file test.wav gets created, but when I open it with the 'video' application (the default), it produces no sound...
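Playing the file back is one way to check a recording; a quick script can also tell whether the file contains anything other than silence. The sketch below is an illustration only: it assumes the recording is 16-bit PCM WAV, the function names are invented for this example, and the threshold of 200 is an arbitrary guess rather than a calibrated value.

```python
import array
import sys
import wave


def peak_amplitude(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return max((abs(s) for s in samples), default=0)


def looks_silent(path, threshold=200):
    """Guess whether a recording is silent. The threshold is an assumption."""
    return peak_amplitude(path) < threshold


# Example use:
#   print(peak_amplitude("test.wav"), looks_silent("test.wav"))
```

A peak near zero suggests the chosen source delivered no signal at all, as opposed to a playback problem in the media player.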

nshmyrev commented 2 years ago

Those are output devices, I doubt you can record from them. You need to find an input device ;)

squintarelli commented 2 years ago

LOL, how do I list the input devices? (I told you I'm not familiar at all with these things...) :-)

nshmyrev commented 2 years ago

You can probably use the GUI to configure the input; that should be more straightforward for you. After that, just parec test.wav should work.
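For a command-line route, here is a rough sketch that lists PulseAudio sources and filters out the ".monitor" ones seen in the output earlier in this thread. It assumes `pactl` is installed and that `pactl list short sources` prints one tab-separated line per source with the name in the second column; the function names are invented for this example.

```python
import subprocess


def parse_sources(pactl_output):
    """Parse `pactl list short sources` output into (name, is_monitor) pairs."""
    sources = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        name = fields[1]
        sources.append((name, name.endswith(".monitor")))
    return sources


def input_sources(pactl_output):
    """Keep only sources that are not monitors of an output device."""
    return [name for name, is_monitor in parse_sources(pactl_output) if not is_monitor]


def list_input_sources():
    """Run pactl and return the names of candidate microphone sources."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return input_sources(result.stdout)


# Example use:
#   for name in list_input_sources():
#       print(name)
```

Any name it prints can then be passed to parec with -d, in the same way as the monitor names tried above.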

squintarelli commented 2 years ago

OK, I did. It works. I can record the audio with parec. But still nerd-dictation does not :-(

nshmyrev commented 2 years ago

Ok congratulations! Could you please clarify what happens when you run nerd-dictation?

squintarelli commented 2 years ago

Before installing pulseaudio, recognition was really slow. After installing pulseaudio, recognition hangs the computer; I have to force a power-off. :-(

nshmyrev commented 2 years ago

Ok, that seems to be a problem with xdotool. You can comment out xdotool inside nerd-dictation and see if it works.

You can also check

https://stackoverflow.com/questions/48038038/xdotool-type-takes-ages-and-causes-entire-desktop-to-freeze

nshmyrev commented 2 years ago

You can also replace xdotool with ydotool; it should be more stable.

Philetjosie commented 2 years ago

Hello, I have the same issue as squintarelli, and I'm also not very skilled in programming. I can probably find out how to install ydotool in place of xdotool, but how do I tell nerd-dictation to use ydotool instead of xdotool? That said, Vosk is huge: I have been waiting 15 years to be able to dictate in OpenOffice, so I can wait a few more months for this little problem to be fixed (so far, in my tests, recognition was really precise). Thanks a lot to you all for this fantastic software.
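One way to test the xdotool theory without editing nerd-dictation is to time the typing step on its own. The sketch below builds a "type this text" command for either tool and measures how long it takes to run. The function names are invented for this example, and the exact arguments are assumptions to check against each tool's man page; note that ydotool normally needs its ydotoold daemon running and permission to use /dev/uinput. Recent nerd-dictation versions also appear to document a --simulate-input-tool option for choosing the tool, so `nerd-dictation begin --help` is worth checking on your version.

```python
import subprocess
import time


def build_type_command(tool, text):
    """Return the argument list that asks `tool` to type `text`."""
    if tool == "xdotool":
        # "--" ends option parsing so text starting with "-" is typed as-is.
        return ["xdotool", "type", "--clearmodifiers", "--", text]
    if tool == "ydotool":
        # Kept minimal on purpose; text starting with "-" may need extra care.
        return ["ydotool", "type", text]
    raise ValueError(f"unknown tool: {tool}")


def time_command(cmd):
    """Run a command and return how many seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Example use (types into whichever window has focus):
#   print(time_command(build_type_command("xdotool", "hello world")))
```

If typing a short phrase this way takes many seconds or freezes the desktop, the delay is coming from keystroke simulation rather than from recognition.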