dialogflow / asr-server

FastCGI support for Kaldi ASR
Apache License 2.0

8 kHz Acoustic Model with Ivector #27

Open alerenato opened 6 years ago

alerenato commented 6 years ago

I have built an 8 kHz NNET3 acoustic model with ivectors (a configuration similar to Switchboard), and I'm trying it with asr-server. I changed every place in the code where 16 kHz appears (for example, #def AUDIO_FREQUENCY = 8000). The system runs without errors, but for 8 kHz, 16-bit raw audio the result is "", whereas the Kaldi decoder gives the correct result for the same sentences. Do I need to make further modifications to run my model? I have seen that the api.ai model does not use ivectors. Thanks in advance.

realill commented 6 years ago

As far as I know, Kaldi is not 8 kHz friendly, and the recommendation is always to convert 8 kHz audio to 16 kHz before decoding.

If you want to work with 8 kHz, you will have to ask the Kaldi maintainers, or go through the Switchboard decoding scripts and figure out what they do.
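
For reference, the 8 kHz to 16 kHz conversion mentioned above could be sketched on the client side roughly as follows. This is only an illustration: the function name `upsample_2x` is hypothetical, the input is assumed to be mono, 16-bit, little-endian raw PCM, and plain linear interpolation stands in for a proper resampling filter. It also only helps when the model itself expects 16 kHz input; a model trained on 8 kHz features still needs 8 kHz audio.

```python
import struct


def upsample_2x(pcm: bytes) -> bytes:
    """Double the sample rate of mono 16-bit little-endian PCM.

    Assumption: linear interpolation is good enough for a quick test;
    it is not a band-limited resampler.
    """
    if len(pcm) % 2:
        raise ValueError("expected a whole number of 16-bit samples")
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm)
    out = []
    for i, s in enumerate(samples):
        # Use the next sample if there is one, otherwise hold the last value.
        nxt = samples[i + 1] if i + 1 < n else s
        out.append(s)
        out.append((s + nxt) // 2)  # midpoint between the two neighbours
    return struct.pack("<%dh" % len(out), *out)
```

The output has exactly twice as many samples as the input, so an 8 kHz recording keeps its duration when played back or decoded at 16 kHz.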

mikenewman1 commented 6 years ago

You just have to set the parameters correctly in mfcc.conf. There are plenty of examples in the swbd recipe. But as you point out, there are several places in the asr-server code with hard-coded sample rates that you need to fix up.
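
As a rough illustration of what that means, an 8 kHz mfcc.conf along the lines of the one in the swbd recipe would contain something like the fragment below. Treat it as a sketch: any further options (number of mel bins, cepstra, frequency cutoffs) have to match whatever the model was trained with, and a model that uses high-resolution features and ivectors needs the same sample frequency in those configs as well.

```
--use-energy=false
--sample-frequency=8000
```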

mikenewman1 commented 6 years ago

I believe the two locations are in OnlineDecoder.cc and RequestRawReader.h.

alerenato commented 6 years ago

Thank you very much, Michel. I will try these modifications and report the results here.