how to configure online dictation with microphone

alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

BSD 2-Clause "Simplified" License

1.07k stars, 341 forks

how to configure online dictation with microphone #148

Closed: AlexPeng19 closed this issue 5 years ago.

AlexPeng19 commented 6 years ago

I noticed that the demo at http://bark.phon.ioc.ee/dictate/ can take audio from the microphone, but the current README only shows recognition by sending an audio file. Which API can I use to do that?

By the way, I am using https://kaljurand.github.io/dictate.js and want to replace wss://bark.phon.ioc.ee with my own service.
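Replacing the public demo server with your own comes down to pointing the client at your own master server's speech and status WebSocket endpoints. A minimal sketch, assuming the master server was started with master_server.py --port=8888 and exposes the /client/ws/speech and /client/ws/status endpoints used in the project README, with the audio format passed as a content-type URL query parameter in GStreamer caps syntax. The host, port and the helper name server_urls are hypothetical; wss:// only makes sense if a TLS proxy sits in front of the server.

```python
from urllib.parse import urlencode

# GStreamer caps describing raw 16 kHz, 16-bit little-endian, mono PCM.
RAW_16K_CAPS = ("audio/x-raw, layout=(string)interleaved, rate=(int)16000, "
                "format=(string)S16LE, channels=(int)1")


def server_urls(host, port, secure=False, content_type=RAW_16K_CAPS):
    """Return (speech_url, status_url) for a kaldi-gstreamer-server master.

    Hypothetical helper: host/port are whatever master_server.py listens on.
    Pass content_type="" to let the server detect the format itself.
    """
    scheme = "wss" if secure else "ws"
    base = f"{scheme}://{host}:{port}/client/ws"
    speech = f"{base}/speech"
    if content_type:
        # The audio format travels as a URL query parameter.
        speech += "?" + urlencode({"content-type": content_type})
    return speech, f"{base}/status"


if __name__ == "__main__":
    speech_url, status_url = server_urls("localhost", 8888)
    print(speech_url)
    print(status_url)
```

The two printed URLs are what a browser client would be configured with in place of the bark.phon.ioc.ee addresses.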

Looking forward to your response.

janengelmohr commented 6 years ago

Have you seen this?

2017-06-28: The sample client program can now accept audio from stdin. This can be used to test the server with a live microphone, e.g.: arecord -f S16_LE -r 16000 | python kaldigstserver/client.py -r 32000 -. Thanks to @wkuna!
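The -r 32000 in that pipeline is a byte rate, not a sample rate: 16000 samples per second × 2 bytes per 16-bit sample × 1 channel = 32000 bytes per second. The sketch below shows that arithmetic and one way to read stdin in fixed-size blocks paced to roughly real time (four blocks per second here). It only illustrates the idea; it is not client.py's actual code, and byte_rate and paced_chunks are hypothetical names.

```python
import time


def byte_rate(sample_rate_hz, bytes_per_sample=2, channels=1):
    """Bytes per second of raw PCM audio."""
    return sample_rate_hz * bytes_per_sample * channels


def paced_chunks(stream, rate_bytes_per_s, chunks_per_s=4, sleep=time.sleep):
    """Yield blocks of rate/chunks_per_s bytes, sleeping between blocks
    so that data is produced at roughly real-time speed."""
    chunk_size = rate_bytes_per_s // chunks_per_s
    while True:
        block = stream.read(chunk_size)
        if not block:  # EOF, e.g. arecord was stopped
            break
        yield block
        sleep(1.0 / chunks_per_s)
```

Used on a pipe, this would look like for block in paced_chunks(sys.stdin.buffer, byte_rate(16000)): ..., with each block sent over the WebSocket. Reads from a pipe may return fewer bytes than requested, which is harmless here.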

alumae commented 5 years ago

This should be resolved now. Reopen if there is still a problem.

boleamol commented 5 years ago

I am trying to use it with a live microphone, using the following command:

arecord -f S16_LE -r 16000 | python kaldigstserver/client.py -r 32000

but I am getting the following error:

usage: client.py [-h] [-u URI] [-r RATE] [--save-adaptation-state SAVE_ADAPTATION_STATE] [--send-adaptation-state SEND_ADAPTATION_STATE] [--content-type CONTENT_TYPE] audiofile
client.py: error: too few arguments
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono

alumae commented 5 years ago

You are missing "-" at the end of the command.
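The usage line explains the error: audiofile is a required positional argument, and putting - in its place is what tells the client to read from standard input instead of a file. A minimal argparse sketch of that pattern, assuming only the -u, -r and audiofile arguments shown in the usage message; it is not client.py's source, and the default URI and the open_audio helper are hypothetical. Python 3's argparse reports the same mistake as "the following arguments are required: audiofile" rather than Python 2's "too few arguments".

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="client.py")
    parser.add_argument("-u", "--uri",
                        default="ws://localhost:8888/client/ws/speech")
    parser.add_argument("-r", "--rate", type=int, default=32000,
                        help="bytes/sec at which audio is sent")
    # Required positional: omitting it makes argparse exit with status 2.
    parser.add_argument("audiofile", help='audio file, or "-" for stdin')
    return parser


def open_audio(path):
    """Return a binary stream: stdin for "-", otherwise the named file."""
    if path == "-":
        return sys.stdin.buffer
    return open(path, "rb")
```

A lone "-" is treated by argparse as a positional value, not an option, so parse_args(["-r", "32000", "-"]) succeeds while parse_args(["-r", "32000"]) fails.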

boleamol commented 5 years ago

Thanks, it's working perfectly. If I want to use my own acoustic model, which kind of model do I have to train: tri2b, tri2b_mmi, ...? If it should be a neural network, which type: nnet1, nnet2, nnet3? If possible, please point me to an example in kaldi/egs. Thank you in advance.

alumae commented 5 years ago

The tool supports triphone GMM models (without SAT), nnet2 models and nnet3 models. I recommend using chain TDNN models.
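As a rough illustration of what a chain TDNN model means on the worker side, the sketch below builds a worker configuration and renders it as YAML. It assumes the nnet2/nnet3 online decoder is selected with use-nnet2: true plus nnet-mode: 3 for nnet3 models, and uses the usual chain-model settings of frame-subsampling-factor 3 and acoustic-scale 1.0. The key names should be checked against the sample YAML files in the repository, the beam values are only illustrative, and every path is a hypothetical placeholder for the output of a Kaldi recipe's online-decoding preparation and graph-building steps.

```python
def chain_worker_config(model_dir, graph_dir):
    """Hypothetical worker settings for an nnet3 chain (TDNN) model.

    model_dir: online-decoding directory with final.mdl and conf/
    graph_dir: directory with HCLG.fst and words.txt
    """
    return {
        "use-nnet2": True,  # use the nnet2/nnet3 online decoder plugin
        "decoder": {
            "nnet-mode": 3,  # acoustic model is nnet3
            "use-threaded-decoder": True,
            "model": f"{model_dir}/final.mdl",
            "mfcc-config": f"{model_dir}/conf/mfcc.conf",
            "ivector-extraction-config": f"{model_dir}/conf/ivector_extractor.conf",
            "fst": f"{graph_dir}/HCLG.fst",
            "word-syms": f"{graph_dir}/words.txt",
            "frame-subsampling-factor": 3,  # chain models emit every 3rd frame
            "acoustic-scale": 1.0,  # chain models are decoded at scale 1.0
            "beam": 15.0,
            "lattice-beam": 6.0,
            "max-active": 7000,
        },
        "out-dir": "tmp",
        "use-vad": False,
        "silence-timeout": 10,
    }


def to_yaml(mapping, indent=0):
    """Render a nested dict of scalars as block-style YAML lines."""
    lines = []
    for key, value in mapping.items():
        pad = " " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 4))
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    cfg = chain_worker_config("models/chain_online", "models/graph")
    print("\n".join(to_yaml(cfg)))
```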

maitrungduc1410 commented 4 years ago

Here is complete, working code for decoding from the microphone; both Python 2 and 3 are supported. Hope this helps someone.
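The linked code is not reproduced here. For comparison, a minimal Python sketch that launches the same arecord | client.py pipeline quoted earlier in this thread, including the trailing - so the client reads from stdin. It assumes arecord (ALSA) is installed and that the script runs from the repository root; mic_commands, run_mic_client and the optional server URI are hypothetical.

```python
import subprocess
import sys


def mic_commands(sample_rate=16000, server_uri=None,
                 client="kaldigstserver/client.py"):
    """Build the arecord and client.py command lines."""
    rec = ["arecord", "-f", "S16_LE", "-r", str(sample_rate)]
    # client.py's -r is a byte rate: 2 bytes per 16-bit mono sample.
    cli = [sys.executable, client, "-r", str(2 * sample_rate)]
    if server_uri:
        cli += ["-u", server_uri]
    cli.append("-")  # "-" = read audio from stdin
    return rec, cli


def run_mic_client(**kwargs):
    """Pipe arecord's output into client.py until the client exits."""
    rec_cmd, cli_cmd = mic_commands(**kwargs)
    rec = subprocess.Popen(rec_cmd, stdout=subprocess.PIPE)
    cli = subprocess.Popen(cli_cmd, stdin=rec.stdout)
    rec.stdout.close()  # only the client holds the read end now
    try:
        return cli.wait()
    finally:
        # Stop recording when the client exits or on Ctrl+C.
        rec.terminate()
        rec.wait()
```

Calling run_mic_client() behaves like the shell pipeline; the point of the Python version is that recording and cleanup can be controlled from a larger program.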