dialogflow / asr-server

FastCGI support for Kaldi ASR
Apache License 2.0

I am not getting any text for decoding #32

Open viju2008 opened 6 years ago

viju2008 commented 6 years ago

I have followed the steps given.

However, I always get the following output from the ASR server:

{"status":"ok","data":[{"confidence":0.862751,"text":""}],"interrupted":"endofspeech","time":1080}

Please advise on how to check the ASR logs.

viju2008 commented 6 years ago

Sometimes the only text I get is "NO".

mikenewman1 commented 6 years ago

I think you might be seeing the same problem that I posted about in #31. If I switch in a model that I built in January, the recognition is great. With the latest Kaldi I get nothing but [NOISE] tokens. I posted a question to kaldi-help (https://groups.google.com/forum/#!topic/kaldi-help/1N4aVb75IdU), but DP did not have any ideas.

mikenewman1 commented 6 years ago

I found the problem. In order to run with the latest (batchnorm) models, you need to add a line after loading the model:

    {
      bool binary;
      kaldi::Input ki(nnet3_rxfilename_, &binary);
      trans_model_->Read(ki.Stream(), binary);
      nnet_->Read(ki.Stream(), binary);

      // This is the crucial line
      SetBatchnormTestMode(true, &(nnet_->GetNnet()));
    }

Note that this only affects newer models (built using Kaldi source from after about March 2017). For full compatibility with the latest Kaldi, these two are probably a good idea as well:

      SetDropoutTestMode(true, &(nnet_->GetNnet()));
      kaldi::nnet3::CollapseModel(kaldi::nnet3::CollapseModelConfig(), &(nnet_->GetNnet()));

This is shamelessly lifted from (e.g.) kaldi/src/online2bin/online2-wav-nnet3-latgen-faster.cc
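To illustrate why the test-mode switch may matter, here is a minimal standalone sketch in plain C++. It is not Kaldi code: the function names BatchnormTrainMode and BatchnormTestMode and all the numbers are made up for illustration. It assumes the usual batchnorm behaviour, where training mode normalizes with statistics of the current batch and test mode normalizes with statistics stored in the model.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Kaldi function): normalize using the mean and
// variance of the batch itself, as a batchnorm layer does in training mode.
std::vector<double> BatchnormTrainMode(const std::vector<double>& x,
                                       double eps = 1e-3) {
  if (x.empty()) return {};
  double mean = 0.0;
  for (double v : x) mean += v;
  mean /= static_cast<double>(x.size());
  double var = 0.0;
  for (double v : x) var += (v - mean) * (v - mean);
  var /= static_cast<double>(x.size());
  std::vector<double> y;
  y.reserve(x.size());
  for (double v : x) y.push_back((v - mean) / std::sqrt(var + eps));
  return y;
}

// Hypothetical helper (not a Kaldi function): normalize using statistics
// that were stored with the model, as a batchnorm layer does in test mode.
std::vector<double> BatchnormTestMode(const std::vector<double>& x,
                                      double stored_mean, double stored_var,
                                      double eps = 1e-3) {
  std::vector<double> y;
  y.reserve(x.size());
  for (double v : x) {
    y.push_back((v - stored_mean) / std::sqrt(stored_var + eps));
  }
  return y;
}
```

In this sketch, a "batch" holding a single value always normalizes to 0 in training mode, whatever the input was, while test mode still produces a value that depends on the input. This is an extreme case, but it shows the kind of mismatch that can appear when a model trained with batchnorm is decoded without switching to test mode.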

formigone commented 6 years ago

I posted some details about this same issue in https://github.com/dialogflow/asr-server/issues/37, including what helped me get past this "issue."

dpny518 commented 5 years ago

In which file do we add this line? SetBatchnormTestMode(true, &(nnet_->GetNnet()));

mikenewman1 commented 5 years ago

In Nnet3LatgenFasterDecoder.cc

(in the function Nnet3LatgenFasterDecoder::Initialize)

hc038 commented 3 years ago

@viju2008 I am in the same situation now. Did you solve the problem?

mikenewman1 commented 3 years ago

See the posts above. The code needed updating to support batchnorm. After this fix everything worked fine. Note however that I haven't used this code in years so it may be broken again.

mikenewman1 commented 3 years ago

Sorry. You could try asking in the usual Kaldi help channel

On Wednesday, November 11, 2020, hc038 wrote:

@mikenewman1 thanks for the quick reply. I have added that line to Nnet3LatgenFasterDecoder.cc but I am getting this error:

[image: https://user-images.githubusercontent.com/73985177/98808303-e1850f00-2441-11eb-8876-38d61e92a838.png]

hc038 commented 3 years ago

I am trying to do recognition with the system mic (recognition using a web browser). Does it automatically convert the audio to 16000 Hz?

realill commented 3 years ago

The JavaScript code downsamples browser input to 16000 Hz: https://github.com/dialogflow/asr-server/blob/master/asr-html/res/recorderWorker.js#L70

hc038 commented 3 years ago

thanks Ilya.

hc038 commented 3 years ago

The server works fine with the "curl" command, but with the system mic (recognition using a web browser) I only get this: [image] Any suggestions?

realill commented 3 years ago

At the end of the day, if curl works you can write your own code to emulate what it does. But without multi-part you won't be able to productionize it very well. Multi-part allows "online" decoding, where the stream is decoded as you speak. So you had better figure it out. ;)