espressif / esp-sr

After first wakeup, can't recognize speech commands immediately (AIS-998) #47

Closed upupycl closed 7 months ago

upupycl commented 1 year ago

Hello, I have a problem and don't know how to solve it. Could you please give me some ideas? Thanks for your help.

When the board starts running and I wake it up for the first time, it wakes normally. I then speak a speech command, but there is no reaction until it times out, and I have to wake it up again. When I wake it a second time and speak the command, it recognizes commands properly. I don't know why it can't recognize speech commands after the first wakeup.

feizi commented 1 year ago

I can't reproduce your problem.
When I use esp-skainet/examples/en_speech_commands_recognition, speech commands are recognized the first time. Maybe your environment is noisy.
The first recognition is indeed different from later recognitions. As you observed, the channel of the first wakeup is 2, which means that the raw microphone data is used for the first command recognition. Channel 0 and channel 1 are the BSS (blind source separation) outputs.
We use raw data the first time because the BSS algorithm needs some time to converge.

upupycl commented 1 year ago

OK, first of all, thanks for your reply. I have thought of a method to solve this problem. Is it possible to achieve?

The method is as follows: I prepare an audio file (.wav) containing the wake word. At startup, I input the content of the file through I2S to wake the board up. It may still fail to recognize speech commands at that point, but I don't care. After a while, I wake it up through the microphone, and then it may recognize speech commands normally.

Can I wake it up through a file instead of the microphone? If so, how should I do it? Is there an interface for this?

feizi commented 1 year ago

We don't have an existing interface for this. You can do the following:

  1. Read the file from flash or an SD card.
  2. Feed the file data with the afe->feed() function.
  3. Once the whole file has been fed, resume feeding I2S data.

But I'm not sure this method is effective. Your file data and the I2S data are not contiguous, so BSS needs some time to re-converge.
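
The three steps above could be sketched roughly as follows. This is a host-side illustration, not esp-sr code: `feed_fn_t`, `feed_pcm_stream` and `count_feed` are hypothetical names introduced here, and the callback only stands in for the `afe->feed()` call mentioned in step 2. It assumes the file is headerless 16-bit PCM already in the channel layout the AFE expects; a real .wav file would need its header skipped first. On the device, the chunk size would have to match what the AFE expects per feed call.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type: stands in for the AFE feed call
 * (afe->feed() in the thread). Receives one full chunk of samples. */
typedef int (*feed_fn_t)(void *ctx, const int16_t *chunk);

/* Read 16-bit PCM samples from fp and pass them to `feed` in chunks of
 * exactly chunk_samples samples. A short final chunk is zero-padded so
 * the sink always receives a full chunk.
 * Returns the number of chunks fed, or -1 on bad arguments or allocation
 * failure. */
static int feed_pcm_stream(FILE *fp, size_t chunk_samples,
                           feed_fn_t feed, void *ctx)
{
    if (fp == NULL || feed == NULL || chunk_samples == 0) {
        return -1;
    }

    int16_t *buf = malloc(chunk_samples * sizeof(int16_t));
    if (buf == NULL) {
        return -1;
    }

    int chunks = 0;
    size_t n;
    while ((n = fread(buf, sizeof(int16_t), chunk_samples, fp)) > 0) {
        /* Pad the tail of a partial chunk with silence. */
        for (size_t i = n; i < chunk_samples; i++) {
            buf[i] = 0;
        }
        feed(ctx, buf);
        chunks++;
    }

    free(buf);
    return chunks;
}

/* Example stand-in sink: only counts how many chunks it was given.
 * ctx must point to an int counter. */
static int count_feed(void *ctx, const int16_t *chunk)
{
    (void)chunk;
    (*(int *)ctx)++;
    return 0;
}
```

After `feed_pcm_stream` returns, the application would go back to its normal loop that reads from I2S and feeds the AFE (step 3). As noted above, the jump from file data to live microphone data is a discontinuity, so this does not guarantee that BSS stays converged.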