Romkabouter / ESP32-Rhasspy-Satellite

This repo implements a standalone ESP32 MQTT audio streamer. It is designed to work as a satellite for Rhasspy (https://rhasspy.readthedocs.io/en/latest/). It supports multiple devices.
GNU General Public License v3.0
358 stars, 64 forks

Attempt to get it to work with Voco #42

flatsiedatsie closed this issue 2 years ago

flatsiedatsie commented 3 years ago

This is a continuation of the discussion here.

I've managed to get Snips to recognise the wake-word, but only right after booting the Atom Echo.

Snips does recognise that there is audio input.

While in idle mode, Snips Watch indicates that audio is being heard.

[14:55:18] [VoiceActivity] Down on site atomecho
[14:55:21] [VoiceActivity] Up on site atomecho
[14:55:22] [VoiceActivity] Down on site atomecho
[14:55:35] [VoiceActivity] Up on site atomecho
[14:55:53] [VoiceActivity] Down on site atomecho
[14:55:55] [VoiceActivity] Up on site atomecho
[14:55:56] [VoiceActivity] Down on site atomecho
[14:56:00] [VoiceActivity] Up on site atomecho

If I press the button to start a session, a session is created, and the dialogue manager listens to the stream from the Atom Echo. But the voice input is not recognised as a voice command:

[14:56:10] [Dialogue] was asked to start a session on site atomecho
[14:56:10] [Asr] was asked to stop listening on site atomecho
[14:56:10] [Hotword] was asked to toggle itself 'off' on site atomecho
[14:56:10] [Dialogue] session with id 'fe685174-5053-4651-8756-8cb3b066003e' was started on site atomecho
[14:56:10] [Asr] was asked to listen on site atomecho
[14:56:11] [VoiceActivity] Down on site atomecho
[14:56:12] [VoiceActivity] Up on site atomecho
[14:56:15] [VoiceActivity] Down on site atomecho
[14:56:16] [VoiceActivity] Up on site atomecho
[14:56:18] [VoiceActivity] Down on site atomecho
[14:56:19] [VoiceActivity] Up on site atomecho
[14:56:26] [Dialogue] session with id 'fe685174-5053-4651-8756-8cb3b066003e' was ended on site atomecho. The session was ended because one of the component didn't respond in a timely manner
[14:56:26] [Asr] was asked to stop listening on site atomecho
[14:56:26] [Hotword] was asked to toggle itself 'on' on site atomecho

If I don't speak into the Raspberry Pi version, then things look a bit different.

[15:07:45] [VoiceActivity] Up on site azrxidia
[15:07:46] [Hotword] detected on site azrxidia, for model hey_snips
[15:07:46] [Asr] was asked to stop listening on site azrxidia
[15:07:46] [Hotword] was asked to toggle itself 'off' on site azrxidia
[15:07:46] [Dialogue] session with id '16427483-191a-4c39-9f0b-199dd4cb0e7e' was started on site azrxidia
[15:07:46] [Asr] was asked to listen on site azrxidia
[15:07:46] [VoiceActivity] Up on site atomecho
[15:07:47] [VoiceActivity] Down on site azrxidia
[15:07:50] [Asr] captured text "" in 4.0s
[15:07:50] [Asr] was asked to stop listening on site azrxidia
[15:07:50] [Dialogue] session with id '16427483-191a-4c39-9f0b-199dd4cb0e7e' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[15:07:50] [Asr] was asked to stop listening on site azrxidia

So that would support the idea that some MQTT message is missing.

As an aside, I also noticed that the wave header is slightly different:

ESP32

RIFF,WAVEfmt ?>}data [followed by binary sample data]

USB microphone on Raspberry Pi:

RIFF4WAVEfmt ?>}tim??wdata [followed by binary sample data]
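Both dumps show the same bytes "?>}" right after "fmt ". Assuming a standard PCM "fmt " chunk, those are the little-endian sample-rate and byte-rate fields, and they are consistent with 16 kHz, 16-bit mono audio: 16000 is 0x3E80 (bytes 0x80, shown as "?", then 0x3E, ">") and 32000 is 0x7D00 (bytes 0x00, 0x7D, shown as "}"). The visible difference is the extra chunk whose ID begins with "tim" in the Raspberry Pi capture. The minimal sketch below builds a canonical 44-byte header with no extra chunks; makeWavHeader and putLE are hypothetical helpers, not functions from the ESP32 firmware.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Store `value` as an n-byte little-endian integer at p (WAV fields are little-endian).
static void putLE(uint8_t* p, uint32_t value, int n) {
    for (int i = 0; i < n; ++i) {
        p[i] = static_cast<uint8_t>(value >> (8 * i));
    }
}

// Canonical 44-byte PCM WAV header: "RIFF" + "WAVE" + "fmt " chunk + "data" chunk
// header, with no extra chunks in between.
std::array<uint8_t, 44> makeWavHeader(uint32_t sampleRate, uint16_t channels,
                                      uint16_t bitsPerSample, uint32_t dataBytes) {
    std::array<uint8_t, 44> h{};
    const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;

    std::memcpy(&h[0], "RIFF", 4);
    putLE(&h[4], 36 + dataBytes, 4);  // RIFF size: everything after this field
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLE(&h[16], 16, 4);             // fmt chunk size for plain PCM
    putLE(&h[20], 1, 2);              // audio format 1 = PCM
    putLE(&h[22], channels, 2);
    putLE(&h[24], sampleRate, 4);     // e.g. 16000 -> 80 3E 00 00, dumped as "?>"
    putLE(&h[28], byteRate, 4);       // e.g. 32000 -> 00 7D 00 00, dumped as "}"
    putLE(&h[32], blockAlign, 2);
    putLE(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLE(&h[40], dataBytes, 4);      // number of sample bytes that follow
    return h;
}
```

For example, makeWavHeader(16000, 1, 16, 512) puts 80 3E at offset 24 and 00 7D at offset 28, which is exactly the "?>}" pattern in both dumps.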

I also checked the output of the various topics in Mosquitto to find out exactly which message was missing.

---SNIPS----

hermes/hotword/azrxidia/detected {"siteId":"azrxidia","modelId":"hey_snips","modelVersion":"workflow-hey_snips_subww_feedback_10seeds-2018_12_04T12_13_05_evaluated_model_0002","modelType":"universal","currentSensitivity":0.5,"detectionSignalMs":1614003432719,"endSignalMs":1614003432719}

hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/hotword/toggleOff {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}

hermes/dialogueManager/sessionStarted {"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","customData":null,"siteId":"azrxidia","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","startSignalMs":1614003432719}

hermes/audioServer/azrxidia/replayRequest {"requestId":"azrxidia-1614003432719","startAtMs":1614003432719,"siteId":"azrxidia"}
hermes/audioServer/azrxidia/replayResponse RIFF^WAVEfmt ?>}tim??wrpidazrxidia-1614003432719rprf
hermes/audioServer/azrxidia/replayResponse
hermes/audioServer/azrxidia/replayResponse
hermes/audioServer/azrxidia/replayResponse

hermes/asr/textCaptured {"text":"what time is it","likelihood":1.0,"tokens":[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}],"seconds":2.0,"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/nlu/query {"input":"what time is it","asrTokens":[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}],"intentFilter":["createcandle:get_time","createcandle:set_value","createcandle:stop_timer","createcandle:set_timer","createcandle:get_value","createcandle:set_state","createcandle:get_boolean","createcandle:list_timers","createcandle:get_timer_count"],"id":"1eee4474-c274-4a09-a7a5-7a65229839fa","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/nlu/intentParsed {"id":"1eee4474-c274-4a09-a7a5-7a65229839fa","input":"what time is it","intent":{"intentName":"createcandle:get_time","confidenceScore":1.0},"slots":[],"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","alternatives":[{"intentName":"createcandle:get_value","confidenceScore":0.06613055,"slots":[]},{"intentName":"createcandle:list_timers","confidenceScore":0.048560027,"slots":[]}]}
hermes/intent/createcandle:get_time {"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","customData":null,"siteId":"azrxidia","input":"what time is it","asrTokens":[[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}]],"asrConfidence":1.0,"intent":{"intentName":"createcandle:get_time","confidenceScore":1.0},"slots":[],"alternatives":[{"intentName":"createcandle:get_value","confidenceScore":0.06613055,"slots":[]},{"intentName":"createcandle:list_timers","confidenceScore":0.048560027,"slots":[]}]}

also:
hermes/voiceActivity/azrxidia/vadDown {"siteId":"azrxidia","signalMs":1614003434199}
hermes/voiceActivity/azrxidia/vadUp {"siteId":"azrxidia","signalMs":1614003432077}

----Atom Echo button----

hermes/dialogueManager/startSession {"init":{"type":"action","canBeEnqueued": false},"siteId":"atomecho"}

hermes/asr/stopListening {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d"}
hermes/hotword/toggleOff {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d"}

hermes/dialogueManager/sessionStarted {"sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d","customData":null,"siteId":"atomecho","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d","startSignalMs":null}

hermes/voco/atomecho/mute {"mute": true}
...

hermes/voiceActivity/atomecho/vadDown {"siteId":"atomecho","signalMs":-386}

----Atom Echo hotword detected----

hermes/hotword/azrxidia/detected {"siteId":"atomecho","modelId":"hey_snips","modelVersion":"workflow-hey_snips_subww_feedback_10seeds-2018_12_04T12_13_05_evaluated_model_0002","modelType":"universal","currentSensitivity":0.5,"detectionSignalMs":-70,"endSignalMs":-70}

hermes/voco/atomecho/play {"sound_file": "start_of_input"}

hermes/audioServer/atomecho/audioFrame RIFF,WAVEfmt ?>}data [followed by binary sample data]
hermes/asr/stopListening {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868"}
hermes/audioServer/atomecho/audioFrame RIFF,WAVEfmt ?>}data [followed by binary sample data]
hermes/hotword/toggleOff {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868"}

hermes/dialogueManager/sessionStarted {"sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868","customData":null,"siteId":"atomecho","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868","startSignalMs":-70}

hermes/voco/atomecho/mute {"mute": true}

hermes/voiceActivity/atomecho/vadDown {"siteId":"atomecho","signalMs":-386}
flatsiedatsie commented 3 years ago

Yes, I already had an MQTT_MAX_PACKET_SIZE define in the code, but it didn't seem to take effect. The PubSubClient developers seem to recommend using a function to change this value, so I added that:

audioServer.setBufferSize(MQTT_MAX_PACKET_SIZE);
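When sizing the buffer, what has to fit is the whole PUBLISH packet, not just the audio. The sketch below assumes PubSubClient's check (as far as I know, in version 2.8), which rejects a publish when 5 header bytes + 2 topic-length bytes + the topic + the payload exceed the buffer size. It also assumes a hypothetical frame of 256 16-bit mono samples plus a 44-byte WAV header; adjust that to whatever the firmware actually sends. A plausible reason the #define alone did not help is that the library's own .cpp file is compiled separately and still sees its default value of 256.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: mirrors PubSubClient's MQTT_MAX_HEADER_SIZE
// (1 fixed-header byte + up to 4 "remaining length" bytes).
constexpr std::size_t kMqttMaxHeaderSize = 5;

// Buffer bytes one QoS 0 PUBLISH needs: header, 2-byte topic length, topic, payload.
std::size_t requiredBufferSize(const std::string& topic, std::size_t payloadBytes) {
    return kMqttMaxHeaderSize + 2 + topic.size() + payloadBytes;
}

// Hypothetical audioFrame payload: a 44-byte WAV header plus 16-bit mono samples.
std::size_t audioFramePayload(std::size_t samplesPerFrame) {
    return 44 + samplesPerFrame * 2;
}
```

With the topic "hermes/audioServer/ATOMECHO/audioFrame" (38 characters) and 256 samples, this comes to 5 + 2 + 38 + 556 = 601 bytes, well above the 256-byte default, so the value passed to audioServer.setBufferSize() would need to be at least that.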

We're getting closer :-)

flatsiedatsie commented 3 years ago

I tried recording the audio using the tool you created. It worked! There is a slightly metallic sound to it, but it's definitely understandable. The volume is very low, though. I'll try playing with the gain option.

What is the gain range? What would you recommend for getting more volume?

Ah, it's 0 to 8 (according to the web UI).

Romkabouter commented 3 years ago

Gain is actually only used in the Matrix Voice, I think. Expect unexpected results!

flatsiedatsie commented 3 years ago

Good news: I managed to get it to detect a hotword by shouting very loudly.

I'm looking closer at how the back-and-forth with Snips is going. After it detects the hotword, the ASR doesn't receive audio (timeout).

{"sessionId":"42b02e1c-331e-4aaf-abb1-5a548abedeec","customData":null,"siteId":"ATOMECHO","reactivatedFromSessionId":null}
{"sessionId":"42b02e1c-331e-4aaf-abb1-5a548abedeec","customData":null,"termination":{"reason":"timeout","component":"asr"},"siteId":"ATOMECHO"}
flatsiedatsie commented 3 years ago

It seems there is some doubling going on again.

14:10:31.055 -> end of idle. Stream was set to true.
14:10:31.055 -> Total heap: 293320
14:10:31.055 -> Free heap: 147664
14:10:31.055 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:11:10.843 -> Incoming MQTT message. Topic: hermes/hotword/azrxidia/detected
14:11:11.058 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:11:11.058 -> Incoming MQTT message. Topic: hermes/hotword/toggleOff
14:11:11.093 -> toggleOff message was for us
14:11:11.093 -> SessionId in toggleOff:59a6374e-89c4-49b7-a654-d77f77a7384c
14:11:11.093 -> Hotword detected event
14:11:11.093 -> Enter HotwordDetected
14:11:11.093 -> -Semaphone something
14:11:11.093 -> -Re-stream
14:11:25.930 -> Incoming MQTT message. Topic: hermes/hotword/toggleOn
14:11:25.930 -> toggleOn message was for us. Going to idle mode.
14:11:25.930 -> hw-detected-go-back-to-idle
14:11:25.930 -> Enter Idle
14:11:25.968 -> still in idle
14:11:25.968 -> end of idle. Stream was set to true.
14:11:25.968 -> Total heap: 293412
14:11:25.968 -> Free heap: 150684
14:11:26.472 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:12:32.626 -> One of them failed: Enter MQTTDisconnected
14:12:32.626 -> Audio connected: 0, Async connected: 0
14:12:32.626 -> Enter MQTTDisconnected
14:12:32.626 -> Connecting MQTT: 192.168.2.165, 1883
14:12:32.626 -> Connecting MQTT: 192.168.2.165, 1883
14:12:32.626 -> asyncclient connect was called
14:12:32.626 -> asyncclient connect was called
14:12:32.626 -> also reconnecting to audio
14:12:32.626 -> also reconnecting to audio
14:12:47.011 -> 
14:12:47.011 -> ELF file SHA256: 0000000000000000
14:12:47.046 -> 
14:12:47.046 -> Backtrace: 0x40088938:0x3ffbf9d0 0x40088bb5:0x3ffbf9f0 0x40140d30:0x3ffbfa10 0x400870c9:0x3ffbfa30 0x4000cff5:0x3ffde0d0 0x400db815:0x3ffde0f0 0x400db88e:0x3ffde130 0x400d156d:0x3ffde160 0x400d15a7:0x3ffde1f0 0x400d16c7:0x3ffde210 0x400d1fa2:0x3ffde230 0x40089c06:0x3ffde6b0
14:12:47.046 -> 
14:12:47.046 -> Rebooting...
flatsiedatsie commented 3 years ago

These are some messages going to the ASR:

hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"569ac21f-cde0-4004-be21-f6112640cfdf"}
hermes/asr/startListening {"siteId":"azrxidia","sessionId":"569ac21f-cde0-4004-be21-f6112640cfdf","startSignalMs":1623327749156}
hermes/asr/stopListening {"siteId":"azrxidia","sessionId":null}

hermes/asr/stopListening {"siteId":"ATOMECHO","sessionId":"bf788e48-fe11-4206-9469-5ac4ec3fd8bd"}
hermes/asr/startListening {"siteId":"ATOMECHO","sessionId":"bf788e48-fe11-4206-9469-5ac4ec3fd8bd","startSignalMs":-20}
hermes/asr/stopListening {"siteId":"ATOMECHO","sessionId":null}

The startSignalMs value seems strange: -20. Maybe that's because the time data isn't in the audio stream?

Romkabouter commented 3 years ago

As I do not know what your code looks like, I do not know where the doubling occurs.

Is your ASR listening? That depends on the snips.toml file, I believe.

flatsiedatsie commented 3 years ago

The ASR does work for other satellites in the house, which are based on Voco/Snips. Perhaps they are sending an extra message.

The latest Arduino code can be found here: https://github.com/flatsiedatsie/voco_mini_sat

Romkabouter commented 3 years ago

Ok, checking your code.

1) Why do you publish to asr/stopListening on line 44? That actually stops the ASR from listening in the HotwordDetected state.
2) Is that the exact code? Is Serial.println("Creating I2Stask"); also printed twice? If so, maybe the check if (i2sHandle == NULL) does not work as expected.

flatsiedatsie commented 3 years ago

1. Sharp eyes :-) I was trying to stop and then restart the ASR, hoping that would fix the issue. But then I tried skipping the HotwordDetected state altogether, so currently that code is never called. All the HotwordDetected state did was stop the stream and restart it, which I suspected wasn't needed if no on-board hotword detection is being done.

2. I've only removed the wifi password :-)

Just in case you'd like to try uploading via the Arduino IDE yourself:

1. You'll need to add ESP32 support. In the menu, go to settings and add these two lines under "Additional Boards Manager URLs":
   https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
   https://dl.espressif.com/dl/package_esp32_index.json
   (You may need to restart the IDE.)

2. Then, under Tools -> Board -> Boards Manager, select M5Atom-stack as the device.

3. Make sure the USB port is selected as the serial port (also under Tools).

4. Make sure the serial monitor is closed if it's already open. Then click "ESP32 Sketch Data Upload" under the Tools menu. This uploads the settings file to SPIFFS storage.

5. Upload the code (arrow button in the top-left).

6. Open the serial monitor (under Tools) to see the serial output.

I believe I've managed to remove the double call of the MQTTDisconnected state. The run method was apparently calling it before it had switched to the new state, causing the state to be initiated twice. It no longer crashes after the first recognition of "hey snips".
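The fix described above (committing the new state before running its entry code) can be sketched in plain C++. StateMachine, State and transitionTo are hypothetical names, not the firmware's actual classes; the point is only the ordering. Because the current state is updated first, a second request for the same state, for example from run() firing again or from inside the entry action itself, becomes a no-op instead of entering MQTTDisconnected twice.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

enum class State { Idle, HotwordDetected, MqttDisconnected };

// Hypothetical minimal state machine: the new state is committed *before*
// its entry action runs, and re-requesting the current state does nothing.
class StateMachine {
public:
    explicit StateMachine(std::function<void(State)> onEnter)
        : onEnter_(std::move(onEnter)) {}

    void transitionTo(State next) {
        if (entered_ && next == current_) return;  // already there: skip re-entry
        current_ = next;                           // commit first...
        entered_ = true;
        onEnter_(next);                            // ...then run the entry action
    }

    State current() const { return current_; }

private:
    std::function<void(State)> onEnter_;
    State current_ = State::Idle;
    bool entered_ = false;  // false until the first transition has run
};
```

If the order were reversed (entry action first, state update afterwards), any call that re-requests the same state during the entry action would see the old state and run the entry code a second time, which matches the doubled "Enter MQTTDisconnected" lines in the serial log.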

The strange thing is that the ASR stops responding for the entire system if I use the AtomEcho. The ASR also stops responding to the main microphone, although it still does hotword detection fine. A session is also created just fine.

Sometimes the ASR stops working altogether, and sometimes it works 50% of the time, intermittently: after a successful run it will not respond the next time until it times out, and then it starts responding again after that, and so forth. This only seems to happen if the AtomEcho is on the network.

The AtomEcho also seems to go into reboot loops. I'm not sure how that's even possible. It's as if it remembers that it failed the previous time it booted up, and keeps failing until I unplug it and plug it back in.

flatsiedatsie commented 3 years ago

Just saw another strange situation: I disconnected the AtomEcho, and then the ASR started listening for only 1 second on the main microphone.

[11:15:07] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:07] [Asr] was asked to stop listening on site azrxidia
[11:15:07] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:07] [Dialogue] session with id '1819212f-7fe3-40c9-83f7-26021d46f671' was started on site azrxidia
[11:15:07] [Asr] was asked to listen on site azrxidia
[11:15:09] [Asr] captured text "unknownword" in 1.0s
[11:15:09] [Asr] was asked to stop listening on site azrxidia
[11:15:09] [Nlu] was asked to parse input "unknownword"
[11:15:09] [Nlu] intent not recognized for "*"
[11:15:09] [Dialogue] session with id '1819212f-7fe3-40c9-83f7-26021d46f671' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[11:15:09] [Asr] was asked to stop listening on site azrxidia
[11:15:09] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:15:13] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:13] [Asr] was asked to stop listening on site azrxidia
[11:15:13] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:13] [Dialogue] session with id '5efe57ea-f984-4ad1-8342-ba4a9d6a1e47' was started on site azrxidia
[11:15:13] [Asr] was asked to listen on site azrxidia
[11:15:28] [Dialogue] session with id '5efe57ea-f984-4ad1-8342-ba4a9d6a1e47' was ended on site azrxidia. The session was ended because one of the component didn't respond in a timely manner
[11:15:28] [Asr] was asked to stop listening on site azrxidia
[11:15:28] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:15:38] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:38] [Asr] was asked to stop listening on site azrxidia
[11:15:38] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:38] [Dialogue] session with id '13bfee2f-4b4a-4ae0-8896-916fb8e6d27b' was started on site azrxidia
[11:15:38] [Asr] was asked to listen on site azrxidia
[11:15:40] [Asr] captured text "unknownword" in 1.0s
[11:15:40] [Asr] was asked to stop listening on site azrxidia
[11:15:40] [Nlu] was asked to parse input "unknownword"
[11:15:40] [Nlu] intent not recognized for "*"
[11:15:40] [Dialogue] session with id '13bfee2f-4b4a-4ae0-8896-916fb8e6d27b' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[11:15:40] [Asr] was asked to stop listening on site azrxidia
[11:15:40] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:16:37] [Hotword] detected on site azrxidia, for model hey_snips
[11:16:37] [Asr] was asked to stop listening on site azrxidia

After that it reverted to the intermittent "ASR listens, ASR is deaf" situation.

flatsiedatsie commented 3 years ago

I've tried to manually run the ASR and check its output. Here's what happens with a "normal" call from Voco:

pi@thuis:~/.webthings/addons/voco/snips $ LD_LIBRARY_PATH=. /home/pi/.webthings/addons/voco/snips/snips-asr -u /home/pi/.webthings/data/work -a /home/pi/.webthings/addons/voco/snips/assistant -c /home/pi/.webthings/addons/voco/snips/snips.toml
[11:27:30.198765] INFO :snips_asr_hermes::handler: Using model from "/home/pi/.webthings/data/work/injections/20210209T163004178929730/inj_20210616T092026773150365/asr"
[11:27:30.332529] INFO :snips_kaldi::decode::model: Loading model v2
[11:27:31.958167] INFO :snips_asr_hermes::handler : Preparing decoder
[11:27:31.958415] INFO :snips_asr_hermes::handler : Preparing decoder
[11:28:11.557659] INFO :snips_asr_hermes::handler : Listening at site id azrxidia
[11:28:11.557826] INFO :snips_asr_hermes::handler : Listening
[11:28:11.704154] INFO :snips_asr_lib::asr        : T0       entered AsrRunner::run
[11:28:11.704224] INFO :snips_asr_lib::asr        : T0+0.000 capture started
[11:28:13.883099] INFO :snips_asr_lib::asr        : T0+2.179 endpoint detected (rule:4) frame:155 samples:39680 signal_time:2.48 rtf:0.327
[11:28:13.883973] INFO :snips_asr_lib::asr        : Source thread stop on push: "SendError(..)"
[11:28:13.884145] INFO :snips_asr_lib::asr        : T0+2.180 capture ended
[11:28:13.885827] INFO :snips_asr_lib::asr        : T0+2.182 decoder finalized
[11:28:13.894667] INFO :snips_asr_lib::asr        : T0+2.191 lookup and post-processing done
[11:28:13.894747] INFO :snips_asr_lib::asr        : decoded: [Recognition { decoded_string: "what time is it", likelihood: 1.0, tokens: Some([Token { value: "what", confidence: 1.0, time: (0.0, 1.38), range: 0..4 }, Token { value: "time", confidence: 1.0, time: (1.38, 1.4399999), range: 5..9 }, Token { value: "is", confidence: 1.0, time: (1.4399999, 1.62), range: 10..12 }, Token { value: "it", confidence: 1.0, time: (1.62, 2.31), range: 13..15 }]) }]
[11:28:13.895411] INFO :snips_asr_hermes::handler : Publishing the recognition

And this is all that happens with the AtomEcho:

[11:28:25.235052] INFO :snips_asr_hermes::handler : Preparing decoder
[11:29:24.793911] INFO :snips_asr_hermes::handler : Listening at site id ATOMECHO
[11:29:24.793989] INFO :snips_asr_hermes::handler : Listening
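The numbers in the working run are internally consistent with 16 kHz audio cut into 256-sample frames: 155 frames of 256 samples is 39680 samples, and 39680 / 16000 = 2.48 s, which matches signal_time. The ATOMECHO run never gets as far as "entered AsrRunner::run" or "capture started", which fits the idea that the ASR never started consuming audio for that site. The 16 kHz rate below is an assumption taken from the WAV headers earlier in the thread; the ASR log itself does not state it.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the snips-asr "endpoint detected" log line of the working run.
constexpr uint32_t kFrames = 155;
constexpr uint32_t kSamples = 39680;
constexpr double kSignalTimeSec = 2.48;

// Assumption: 16 kHz input, as suggested by the WAV headers earlier in the thread.
constexpr double kSampleRateHz = 16000.0;

// 39680 / 155 = 256 samples per decoder frame.
constexpr uint32_t samplesPerFrame() { return kSamples / kFrames; }

// 39680 / 16000 = 2.48 s of audio, matching signal_time in the log.
constexpr double audioSeconds() { return kSamples / kSampleRateHz; }
```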
Romkabouter commented 3 years ago

All the HotwordDetected state did, was to stop the stream and restart it, which I suspected wasn't needed if there wasn't on-board hotword detection being done.

It also initializes the wave header and updates the LED status. I'd recommend not fiddling with the status too much.

Just in case you'd like to try uploading via the arduino IDE yourself:

I do not use Arduino IDE ;)

What is this azrxidia I see in all your messages? Can you try to stop that stream? And can you post the contents of your snips.toml?

flatsiedatsie commented 3 years ago

I'd be happy to. Here's the snips.toml: https://github.com/createcandle/voco/blob/master/snips/snips.toml

I've also stripped out the LED parts (there was an error I couldn't fix, so I just stripped it out completely). I've also removed the OTA updates, since that won't be needed either and I figured it might leave more memory.

I've re-enabled the HotwordDetected state, but the result is the same. I'll update the code on github.

Romkabouter commented 3 years ago

I've also stripped out the LED parts (there was an error I couldn't fix, so I just stripped it out completely).

If you remove the methods updateColors(int colors) and updateBrightness(int brightness) in your device code, then nothing will be done :)

I think you need to set this for the AudioServer:

[snips-audio-server]
bind = "+@mqtt"

That way the audio server actually listens to all audio streams. The setting would then match the one in the [snips-hotword] section, which might explain why the hotword component is listening and the rest is not. I am not 100% sure, though, but setting it to + is not a bad idea in general.
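The "+" in bind = "+@mqtt" reads like the MQTT single-level wildcard. Assuming (not confirmed from the Snips documentation) that Snips applies it to the site-id level of topics such as hermes/audioServer/<siteId>/audioFrame, the difference between binding to a single site and binding to "+" looks like the standard MQTT filter matching sketched below. topicMatches is a hypothetical helper; it implements the usual + and # rules but ignores the special case of topics that begin with $.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split an MQTT topic or filter into its '/'-separated levels.
static std::vector<std::string> levels(const std::string& s) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : s) {
        if (c == '/') {
            out.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    out.push_back(cur);
    return out;
}

// MQTT filter matching: '+' matches exactly one level, '#' matches all remaining levels.
bool topicMatches(const std::string& filter, const std::string& topic) {
    const auto f = levels(filter);
    const auto t = levels(topic);
    for (std::size_t i = 0; i < f.size(); ++i) {
        if (f[i] == "#") return true;             // multi-level wildcard ends the match
        if (i >= t.size()) return false;          // filter is longer than the topic
        if (f[i] != "+" && f[i] != t[i]) return false;
    }
    return f.size() == t.size();                  // no leftover topic levels allowed
}
```

Under that assumption, a filter bound to one site (for example hermes/audioServer/azrxidia/audioFrame) would never match frames published by ATOMECHO, while hermes/audioServer/+/audioFrame matches both.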

flatsiedatsie commented 3 years ago

I'll give it a go.

I could also add it to common? Perhaps that would help the ASR detect the stream?

I've also added a feature to Voco so that it can provide the current time through an MQTT request. I wanted to experiment with sending the timestamp in the wav header.

flatsiedatsie commented 3 years ago

Something else I'm curious about: would it be possible to have the AtomEcho connect based on hostname instead of IP address? I seem to see some hints in the settings that this might be possible. If so, the main controller could inject that hostname into the AtomEcho when uploading the code.

Romkabouter commented 3 years ago

I could also add it to common? Perhaps that will help ASR to detect the stream?

Might be a good idea; then you should have it set for all sections.

Romkabouter commented 3 years ago

Something else I'm curious about: would it be possible to have the AtomEcho connect based on hostname instead of IP address? I seem to see some hints in the settings this might be possible? if so, then the main controller could infuse that hostname into the AtomEcho at the moment of uploading the code.

It already does if you put in a hostname instead of an IP address.

Romkabouter commented 2 years ago

Hi @flatsiedatsie,

We have come a long way since the last activity here. Did you make any progress on the subject? Maybe you can check out my new master branch; I have just released version 7.8.

If you require some help from me, please give me a shout. Otherwise I will close this issue at some point in the future. I tried to get Voco running, but ran into some issues (which I cannot remember) and stopped.

Romkabouter commented 2 years ago

@flatsiedatsie it seems Voco is no longer available as an add-on, is that correct? I see Voice Control, but that is different. It is still in the list found here: https://github.com/WebThingsIO/addon-list/tree/master/addons

I just cannot find it among the add-ons in WebThings. Note: I am using the Docker image.

flatsiedatsie commented 2 years ago

Voco is only available on the Raspberry Pi.

I spent considerable time on it last time, but unfortunately couldn't get the audio coherent enough. In the end, I couldn't keep spending that much time on a 'nice to have' :-(

Romkabouter commented 2 years ago

Ah ok, that is probably the issue then. I have a Raspberry Pi available now, do you still want me to put some effort in it? I still have the branch.

Romkabouter commented 2 years ago

I still find this interesting, so I have installed WebThings and could indeed install Voco. Let's see if I can run it with a USB mic and a speaker and go from there :)

flatsiedatsie commented 2 years ago

Sure, that would be wonderful! If you live in Amsterdam I can supply you with a good USB mic if you want :-)

flatsiedatsie commented 2 years ago

I've uploaded the latest version of the code I was working on here: https://github.com/createcandle/voco-mini-satellite

It would be great if you could try this Arduino workflow (Arduino IDE), because if that works, it will be possible to flash the code to user devices via the Candle Manager add-on for the WebThings Gateway.

Romkabouter commented 2 years ago

Sure, that would be wonderful! If you live in Amsterdam I can supply you with a good USB mic if you want :-)

Hehe, nope. A good 200 km drive north. But I've got one :)

Romkabouter commented 2 years ago

I have installed WebThings and Voco on a Pi. When I type "tell me the time", I expected to get audio output. The correct text appears. Is my expectation incorrect? I have set the output to headphones, and speaker-test works.

Romkabouter commented 2 years ago

OK, apparently my expectation was incorrect. I have Voco running on a Pi now and it is working :) Now to see if I can get this running.

Romkabouter commented 2 years ago

I thought the issue might be caused by the low energy from the M5, so I tried my Matrix Voice.

I get this error:

2021-12-04 09:27:00.626 INFO   : voco: INFO:snips_hotword_lib::audio    : Audio thread for matrixvoice started
2021-12-04 09:27:00.627 INFO   : voco: INFO:snips_hotword_lib::audio    : Net and VAD thread for site matrixvoice started (vad inhibitor: true, vad messages: false
2021-12-04 09:27:00.632 INFO   : voco: ERROR:snips_hotword_lib::audio    : Error in network and VAD thread for site matrixvoice: no more audio in source

So I think it boils down to the audio again. Snips uses some extra headers, and that might be what is causing this. I'll see if I can fix it.
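To see which extra headers Snips adds, one option is to walk the RIFF chunk list of each audioFrame instead of assuming a fixed 44-byte header. The sketch below lists the chunk IDs found before "data" and returns where the samples start, skipping unknown chunks (each padded to an even length, as RIFF requires). scanWav and WavLayout are hypothetical names, and this does not establish which component actually rejects the ESP32 frames; it is only a way to compare the two streams side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Read a little-endian 32-bit value (RIFF sizes are little-endian).
static uint32_t readLE32(const uint8_t* p) {
    return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
}

struct WavLayout {
    std::vector<std::string> chunkIds;  // every chunk seen before "data", in order
    long dataOffset = -1;               // offset of the first sample byte, or -1
};

// Walk the chunks after "RIFF....WAVE" so extra chunks between "fmt " and
// "data" are listed and skipped rather than read as audio.
WavLayout scanWav(const uint8_t* buf, std::size_t len) {
    WavLayout out;
    if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 || std::memcmp(buf + 8, "WAVE", 4) != 0)
        return out;
    std::size_t pos = 12;
    while (pos + 8 <= len) {
        std::string id(reinterpret_cast<const char*>(buf + pos), 4);
        uint32_t size = readLE32(buf + pos + 4);
        if (id == "data") {
            out.dataOffset = static_cast<long>(pos + 8);
            return out;
        }
        out.chunkIds.push_back(id);
        pos += 8 + size + (size & 1);  // skip header, body and optional pad byte
    }
    return out;
}
```

Run on an ESP32 frame, this should report only "fmt " before "data"; run on a frame from the Snips audio server, it would list the additional chunk(s) by ID.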

flatsiedatsie commented 2 years ago

Yeah those headers, those indeed seem to be the issue.

Glad Voco is working :-) Text commands only give text output (designed for quiet operation when kids are sleeping). Voice commands give voice output.