Closed by jlpouffier 3 months ago
Binaries are here for installing with https://web.esphome.io
@jesserockz Let's go !
When testing Arabic, it shows up like this: [] [] [] [] [] []
Same here when I test with Cyrillic characters
I have opened the following issue #185
The idea is coming from the Voice Assistant Contest. Credit to user Lajos and his entry.
Features added
Spoken text is displayed on the box during the thinking phase.
![IMG_2577](https://github.com/esphome/firmware/assets/5878296/d77ee940-4ea2-4ccb-b188-aedf62efeae0)
Response text is displayed on the box during the replying phase.
![IMG_2578](https://github.com/esphome/firmware/assets/5878296/3a6b6e15-fcd8-4912-8d6c-aaa94be70c6a)
This behavior is user-configurable via a switch called `Display conversation` on Home Assistant. The value of the switch is restored, but it is `ON` by default if no value is found (it will be `ON` when updating for the first time).
![CleanShot 2024-03-14 at 17 33 47](https://github.com/esphome/firmware/assets/5878296/a81c149c-e224-4c4d-a2cf-a9dfd387133e)
Specific changes of the firmware
Allowed characters
In ESPHome, we need to declare up front which characters we plan to display. Because the firmware is supposed to work with all our supported languages, I looked for a proxy that would be a good approximation of every character we could display. I ended up extracting all unique characters used in our test files in the intents repository of Home Assistant.
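As an illustration of the extraction step described above, here is a minimal Python sketch. It is not the script used for this PR and it is not part of the firmware: the function names, the `*.yaml` pattern, and the choice to drop non-printable characters are all assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable


def unique_characters(texts: Iterable[str]) -> str:
    """Return every distinct printable character in `texts`, sorted by code point.

    Hypothetical helper: control characters such as newlines are skipped
    because they are never drawn on the display.
    """
    seen = set()
    for text in texts:
        seen.update(ch for ch in text if ch.isprintable())
    return "".join(sorted(seen))


def unique_characters_in_files(root: Path, pattern: str = "*.yaml") -> str:
    """Collect the distinct characters of every file under `root` matching `pattern`.

    The default pattern is an assumption about how the test sentences are stored.
    """
    return unique_characters(
        path.read_text(encoding="utf-8") for path in sorted(root.rglob(pattern))
    )
```

Calling `unique_characters_in_files(Path("path/to/tests"))` on a local checkout (the path is a placeholder) would give a single string with one occurrence of each character, which is the kind of value a font's glyph list expects.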
This is the relevant part of the firmware:
Because this solution is not perfect, this list is loaded as a `substitution` so that a user can still add a few missed characters to the list.
2-stage thinking phase
Interestingly enough, we are starting our thinking phase at the end of the VAD stage, in the middle of the STT phase. This is because we want to take into account the time it takes for the STT engine to fully decode the spoken command.
This means that when we start our thinking phase, the spoken text is not known, the silence has just been detected, and the processing of the last chunk of audio is still ongoing.
At first, I thought that this would be an issue, but I like it even better now.
Until the spoken text is known, `...` is displayed instead. (Basically meaning: "I am still trying to figure out what you told me".) It is visible when the STT engine is slow.
![CleanShot 2024-03-14 at 17 31 22](https://github.com/esphome/firmware/assets/5878296/3349eb91-7bdf-4c8b-936d-813b82bee450)
We do not have this problem for the response, as the thinking phase extends until the streaming of the audio.
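The display logic described in this section can be summarized with a small Python sketch. It only models the behavior and is not the firmware's actual code: `Phase`, `text_to_display`, and the parameter names are hypothetical, and the firmware itself is written as ESPHome configuration.

```python
from enum import Enum, auto
from typing import Optional


class Phase(Enum):
    """Hypothetical names for the assistant phases mentioned in this PR."""
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    REPLYING = auto()


PLACEHOLDER = "..."  # shown while the STT engine is still decoding


def text_to_display(
    phase: Phase,
    spoken_text: Optional[str],
    response_text: Optional[str],
    display_conversation: bool = True,
) -> str:
    """Return the text to draw on the box for the current phase."""
    if not display_conversation:
        # The "Display conversation" switch is off: never show any text.
        return ""
    if phase is Phase.THINKING:
        # Stage 1: silence detected, STT still running, so the text is unknown.
        # Stage 2: STT finished, so the spoken text is available.
        return spoken_text if spoken_text else PLACEHOLDER
    if phase is Phase.REPLYING:
        # The response is already known when the replying phase starts.
        return response_text or ""
    return ""
```

For example, `text_to_display(Phase.THINKING, None, None)` yields the placeholder, and the same call with the decoded sentence as `spoken_text` yields that sentence, which matches the two stages of the thinking phase.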