alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

how to use test_microphone.py in linux for users #1334

Closed helpinghandindia1 closed 1 year ago

helpinghandindia1 commented 1 year ago

Hi, we are running vosk-server in Docker with two models, kaldi-hi and kaldi-en-in. kaldi-hi works well, but kaldi-en-in does not seem accurate enough at recognizing words and sentences.

  1. Can you suggest a better model for Indian English?
  2. How and where can we add words to improve recognition of specific words?
  3. Is there a step-by-step way to use microphone.py with an Asterisk dialplan, so that users can benefit directly? For now we are using WAV files with test_ffmpeg.py. We tried two or three times to recompile vosk-asterisk, but res_speech_vosk.so was not produced, so we cannot use it.

thanks

nshmyrev commented 1 year ago

> Can you suggest a better model for Indian English?

Unfortunately, we do not have a better model.

> How and where can we add words to improve recognition of specific words?

See https://alphacephei.com/vosk/lm

> Is there a step-by-step way to use microphone.py with an Asterisk dialplan, so that users can benefit directly? For now we are using WAV files with test_ffmpeg.py. We tried two or three times to recompile vosk-asterisk, but res_speech_vosk.so was not produced, so we cannot use it.

You need to describe the issues you hit when compiling vosk-asterisk; there is no workaround. There is also a vosk-unimrcp server, though, which you can use with asterisk-unimrcp.

helpinghandindia1 commented 1 year ago

Hello @nshmyrev

Please suggest a better way to capture speech into an audio file. For now we are using a WAV file. What is the best way to get clean audio for better recognition output?

exten => s,n,Record(/root/tempfiles/test2/${callerdatafile}:wav,0,2)
exten => s,n,System(/usr/bin/ffmpeg -i /root/tempfiles/test2/${callerdatafile}.wav -vn -ar 44100 -ac 2 -b:a 192k /root/tempfiles/test2/${callerdatafile2}.mp3)
exten => s,n,Set(tempvalue11=${SHELL(/usr/local/bin/python3 /opt/vosk-server/websocket/test_ffmpeg3.py /root/tempfiles/test2/${callerdatafile2}.mp3 | grep '"text" :' | cut -d'"' -f4)})
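The `grep '"text" :' | cut` step in the last dialplan line depends on the exact spacing of the printed JSON. As a rough sketch only, the same extraction could be done by parsing the output as JSON. This assumes the script prints a sequence of JSON objects in which final results carry a "text" field; `extract_text` is a hypothetical helper name, not part of vosk-server.

```python
import json


def extract_text(output: str) -> str:
    """Join the non-empty "text" fields of every JSON object in the output.

    Assumption: the recognizer script prints JSON objects one after another,
    possibly pretty-printed over several lines, possibly mixed with other text.
    """
    decoder = json.JSONDecoder()
    texts = []
    pos = 0
    while pos < len(output):
        start = output.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value and reports where it ended.
            obj, end = decoder.raw_decode(output, start)
        except json.JSONDecodeError:
            # Not a JSON object at this brace; skip past it and keep looking.
            pos = start + 1
            continue
        if isinstance(obj, dict) and obj.get("text"):
            texts.append(obj["text"])
        pos = end
    return " ".join(texts)
```

Objects that only carry a "partial" field, or an empty "text", are ignored, so the dialplan variable would receive just the recognized sentence.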

nshmyrev commented 1 year ago

> exten => s,n,Set(tempvalue11=${SHELL(/usr/local/bin/python3 /opt/vosk-server/websocket/test_ffmpeg3.py /root/tempfiles/test2/${callerdatafile2}.mp3 | grep '"text" :' | cut -d'"' -f4)})

It is better to send the WAV file for recognition, not the mp3. The lossy mp3 codec degrades accuracy.
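A minimal sketch of feeding the recorded WAV straight to the server, skipping the mp3 step. It assumes the server accepts the message sequence used by the vosk-server sample clients as I understand them: a JSON config message with the sample rate, then binary audio chunks, then an eof message. `vosk_messages` is a hypothetical helper that only builds the messages; the actual websocket transport is left to the caller.

```python
import json
import wave


def vosk_messages(wav_path, chunk_seconds=0.2):
    """Yield the messages to send for one WAV file, in order.

    str items are meant to go out as text frames, bytes items as binary
    frames. The chunk length of 0.2 s is an arbitrary illustrative choice.
    """
    with wave.open(wav_path, "rb") as wf:
        # The recognizer expects uncompressed mono 16-bit PCM audio.
        if (wf.getnchannels() != 1 or wf.getsampwidth() != 2
                or wf.getcomptype() != "NONE"):
            raise ValueError("expected a mono 16-bit PCM WAV file")
        rate = wf.getframerate()
        # Tell the server the real sample rate of the recording.
        yield json.dumps({"config": {"sample_rate": rate}})
        frames_per_chunk = int(rate * chunk_seconds)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
        # Signal end of audio so the server returns the final result.
        yield json.dumps({"eof": 1})
```

A caller would send each yielded item over the websocket and read the server's reply after each one; the reply to the eof message carries the final result.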

helpinghandindia1 commented 1 year ago

Thanks @nshmyrev for your reply.

Yes, we are using WAV files for conversion; we are testing mp3 only for the English (en) case.

It would be great if you could guide us on implementing microphone.py for direct speech from users, without any intermediate audio file.

nshmyrev commented 1 year ago

> It would be great if you could guide us on implementing microphone.py for direct speech from users, without any intermediate audio file.

Asterisk works with SIP clients, not a microphone. You probably need to configure a SIP gateway.

nshmyrev commented 1 year ago

Feel free to reopen if you have other questions

Hemanthchat360 commented 2 months ago

Hi Team,

I am currently working on passing audio signals from the unimrcp demo_recog to a Python socket-based program. While I am able to pass the signal, I encounter an issue when saving the audio to a file and subsequently playing it back. Instead of hearing the intended human voice, the audio playback consists solely of a buzzing noise.

I have tried several different ffmpeg conversion commands, but none of them have resolved the issue. Below are the commands I have attempted:

ffmpeg -i input.wav -ar 8000 -f s16le -y output.ulaw
ffmpeg -i input.wav -ar 8000 -f mulaw -y -map_channel 0.0.0 output.ulaw
ffmpeg -i input.wav -ar 8000 -f mulaw -y -map_channel 0.0.0 output.ulaw
ffmpeg -i input.wav -ar 4000 -f mulaw -y -map_channel 0.0.0 output.ulaw
ffmpeg -i input.wav -ar 16000 -f s16le -y -ac 1 output.ulaw
ffmpeg -i input.wav -ar 16000 -f s16le -y -ac 1 output.ulaw
ffmpeg -i input.wav -ar 8000 -f mulaw -y output.ulaw
ffmpeg -i input.wav -ar 8000 -f mulaw -y output.ulaw

The codecs specified in my mrcp.conf file are as follows:

codecs = PCMU PCMA L16/96/8000 telephone-event/101/8000

I would greatly appreciate any guidance or suggestions on how to resolve this issue and achieve clear audio playback.
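One possibility, not confirmed anywhere in this thread: if the bytes arriving on the socket are still PCMU (G.711 mu-law) payload, saving them as if they were 16-bit PCM would play back as noise. The sketch below decodes mu-law bytes to 16-bit linear PCM and writes a proper WAV file using only the standard library. The function names and the 8000 Hz mono default are assumptions for illustration; if the stream is actually A-law or already linear PCM, this decoder does not apply.

```python
import struct
import wave


def ulaw_byte_to_pcm16(u):
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Precompute all 256 values once; decoding then becomes a table lookup.
_ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16(payload):
    """Convert raw mu-law bytes to little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(payload),
                       *(_ULAW_TABLE[b] for b in payload))


def save_ulaw_as_wav(payload, wav_path, rate=8000):
    """Write mu-law payload as a mono 16-bit PCM WAV (rate is an assumption)."""
    with wave.open(wav_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(ulaw_to_pcm16(payload))
```

If the decoded file still sounds wrong, the payload is probably not mu-law, and it would be worth checking which codec and sample format the socket side actually receives.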