Open squintarelli opened 2 years ago
Which model did you try, exactly? You should be using a smaller model, like https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
Thanks, I tried both.
Using vosk-model-en-us-0.22-lgraph.zip, the delay is reduced to about 15 seconds. Still too long to be useful for dictation :-(
OK, and what is the CPU load during recognition? You can run top in a parallel terminal.
about 98% CPU and 2.7% MEM
Please share the screenshot
The computer freezes during speech recognition, so top doesn't update. This is a screenshot from just before I start talking.
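When top stops updating because the machine is frozen, one workaround is to log CPU usage to a file and read it afterwards. The following is only a minimal sketch, assuming a Linux system with /proc/stat; the function names, the log path `cpu.log`, and the sampling defaults are hypothetical and not part of Vosk or nerd-dictation.

```python
import time


def parse_cpu_line(line):
    """Return (busy_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    # Fields after the label: user nice system idle iowait irq softirq steal
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields)
    return total - idle, total


def cpu_busy_percent(before, after):
    """Percentage of CPU time spent busy between two 'cpu' lines."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    delta = total1 - total0
    if delta <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / delta


def log_cpu(path="cpu.log", interval=1.0, samples=120):
    """Append one timestamped CPU reading per interval to a log file (hypothetical helper)."""
    def read_cpu_line():
        with open("/proc/stat") as f:
            return f.readline()

    previous = read_cpu_line()
    with open(path, "a") as log:
        for _ in range(samples):
            time.sleep(interval)
            current = read_cpu_line()
            stamp = time.strftime("%H:%M:%S")
            log.write("%s %.1f%%\n" % (stamp, cpu_busy_percent(previous, current)))
            log.flush()  # keep the file current even if the session has to be killed
            previous = current
```

Calling `log_cpu()` in a second terminal before starting dictation should leave a record of the load in `cpu.log`, even if the screen does not refresh during the freeze.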
Feels like a driver issue, not Vosk. Does portaudio work for you at all? Can you record audio with the parec command from the command line? Maybe xdotool creates locks.
Does the demo Python code for recognizing a file run quickly for you?
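To check whether recognition itself is slow, independent of the microphone and of keystroke simulation, the file-recognition test can be timed. This is a rough sketch along the lines of the Vosk Python demo, not the demo itself; it assumes `pip install vosk`, a mono 16-bit PCM WAV file, and an unpacked model directory. The function names are hypothetical.

```python
import json
import time
import wave


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio length; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_file_recognition(wav_path, model_dir):
    """Recognize a WAV file with Vosk and return (text, real_time_factor)."""
    # Imported here so the helper above can be used without Vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)  # model loading is deliberately not timed
    with wave.open(wav_path, "rb") as wf:
        audio_seconds = wf.getnframes() / wf.getframerate()
        recognizer = KaldiRecognizer(model, wf.getframerate())
        start = time.perf_counter()
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            recognizer.AcceptWaveform(data)
        text = json.loads(recognizer.FinalResult()).get("text", "")
        elapsed = time.perf_counter() - start
    return text, real_time_factor(elapsed, audio_seconds)
```

If a short recording gives a factor well below 1.0 here, the model is probably not the bottleneck, and the audio capture or keystroke simulation is the more likely suspect.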
Install Python again, create an environment, then install Vosk in that environment. Do not forget to remove the previous model files before trying a different model.
I don't have portaudio installed. Should I?
Yes, I believe nerd-dictation uses parec:
https://github.com/ideasman42/nerd-dictation/blob/master/nerd-dictation#L729
I installed it, but nothing has changed. I'm not at all familiar with these tools. How can I check whether portaudio / parec are working?
Run it from the command line and see if it records the audio from the microphone.
I tried the following:
pacmd list-sources | grep ".monitor"
got this output
name: <alsa_output.usb-FongLun_USB_Microphone_201605-00.iec958-stereo.monitor>
monitor_of: 1
device.class = "monitor"
name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_5__sink.monitor>
monitor_of: 3
device.class = "monitor"
name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_4__sink.monitor>
monitor_of: 4
device.class = "monitor"
name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp_3__sink.monitor>
monitor_of: 5
device.class = "monitor"
name: <alsa_output.pci-0000_00_1f.3-platform-skl_hda_dsp_generic.HiFi__hw_sofhdadsp__sink.monitor>
monitor_of: 6
device.class = "monitor"
name: <alsa_output.usb-DisplayLink_Dell_USB3.0_Dock_1509160772-02.analog-stereo.monitor>
monitor_of: 8
device.class = "monitor"
then, for each device name, I tried
parec -d alsa_output.usb-DisplayLink_Dell_USB3.0_Dock_1509160772-02.analog-stereo.monitor --file-format=wav test.wav
A file test.wav gets created, but when I open it with the 'video' application (the default), it produces no sound...
Those are output devices, I doubt you can record from them. You need to find an input device ;)
LOL, how do I list the input devices? (I told you I'm not familiar at all with these things...) :-)
You can probably use the GUI to configure the input; it should be more straightforward for you. After that, just parec test.wav should work.
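For anyone who prefers the command line over the GUI, input devices can be told apart from the `.monitor` sources listed earlier by filtering the source list. The sketch below assumes PulseAudio's `pactl` is installed and that `pactl list short sources` prints one source per line with the name in the second column; the helper names are hypothetical.

```python
import subprocess


def input_sources(pactl_output):
    """Return source names that are not monitors of an output device."""
    names = []
    for line in pactl_output.splitlines():
        parts = line.split()
        # Column 0 is the index, column 1 is the source name.
        if len(parts) >= 2 and not parts[1].endswith(".monitor"):
            names.append(parts[1])
    return names


def list_input_sources():
    """Ask PulseAudio for its sources and keep only the real inputs."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True, text=True, check=True,
    )
    return input_sources(result.stdout)


def record_command(source_name, wav_path="test.wav"):
    """Build the parec call used earlier in this thread, pointed at a given source."""
    return ["parec", "-d", source_name, "--file-format=wav", wav_path]
```

Passing one of the names returned by `list_input_sources()` to `record_command()` and running the result with `subprocess.run(...)` records from that device until interrupted with Ctrl+C.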
OK, I did. It works. I can record the audio with parec. But still nerd-dictation does not :-(
Ok congratulations! Could you please clarify what happens when you run nerd-dictation?
Before installing pulseaudio, recognition was really slow. After installing pulseaudio, recognition hangs the computer; I have to force a power-off. :-(
Ok, that seems to be a problem with xdotool. You can comment out xdotool inside nerd-dictation and see if it works.
You can also check
You can also replace xdotool with ydotool; it should be more stable.
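As an illustration of what swapping the tool amounts to, here is a small sketch of a function that builds the typing command for either program. It is not nerd-dictation's actual code: the names `type_command` and `type_text` are hypothetical, and it assumes both tools accept `type <text>` on the command line. ydotool typically also needs its `ydotoold` daemon running and access to /dev/uinput. Newer nerd-dictation versions may offer a command-line option for choosing the tool, so it is worth checking `nerd-dictation begin --help` before editing the script.

```python
import subprocess


def type_command(text, tool="xdotool"):
    """Return the argument list that types `text` with the chosen tool."""
    if tool == "xdotool":
        # "--" keeps text that starts with a dash from being parsed as an option.
        return ["xdotool", "type", "--clearmodifiers", "--", text]
    if tool == "ydotool":
        # Assumption: plain `ydotool type <text>`; text starting with "-" is not handled here.
        return ["ydotool", "type", text]
    raise ValueError("unsupported tool: %r" % tool)


def type_text(text, tool="xdotool"):
    """Simulate typing `text`; raises CalledProcessError if the tool fails."""
    subprocess.run(type_command(text, tool), check=True)
```

Calling `type_text("hello", tool="ydotool")` from a terminal is a quick way to confirm ydotool works on its own before pointing nerd-dictation at it.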
Hello, I have the same issue as squintarelli, and I'm also not very skilled in programming. Can I find somewhere how to replace xdotool with ydotool? I guess I should be able to do that, but how will I tell nerd-dictation to use ydotool instead of xdotool? However, Vosk is huge. I have been waiting 15 years to be able to dictate in OpenOffice, so I can wait a few more months for this little problem to be fixed (so far, in my tests, recognition was really precise). Thanks a lot to you all for this fantastic software.
Hello,
I am using Vosk models (tried English and Italian) with nerd-dictation (https://github.com/ideasman42/nerd-dictation).
After starting recognition, it takes ages for any short sentence to be recognized and produce any output (by ages, I mean from 30 to 90 seconds; in the meantime, the computer is frozen). After the first output appears, the terminal starts responding again and I'm able to stop it.
I am running Ubuntu 20.04.1 on an i7 .3GHz 8 core CPU with 15.4 GB RAM.
Any suggestion is highly appreciated.