rpatel3001 opened 1 year ago
Thanks for this overview!
I have created a component for the touchscreen already in esphome/esphome#4793 which is working fine on my Box and is ready for review.
I am currently working on the I2C control component for the ES8311 (no PR yet, trying to figure out the best solution for MCLK).
Also, the ILI9342C driver requires some additions to allow enabling x-mirroring for the ESP32-S3-BOX. I have started implementing that, a PR will also follow. For now, you can check out my sample config linked in https://github.com/esphome/esphome/pull/4793#issuecomment-1539239430 which I will update periodically.
Awesome, nice progress. There is a very rough implementation for the ES8388 here, not sure how helpful it is as the register map is quite different and MCLK is currently hard-coded.
Also, it's worth looking into how Willow and the default firmware handle MCLK. I think the codec has a mode where it can derive its LRCLK and BCLK from its MCLK/SCLK and distribute them to the ADC on the board. That may be required if MCLK has to be synchronous to LRCLK and BCLK? I'm not too familiar with I2S or the ESP32/esphome implementation of it.
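For anyone else puzzling over the clock relationships: the usual I2S arithmetic looks like this (a sketch of the conventional ratios only, nothing specific to the esphome implementation; the helper name is made up, and real codecs support several MCLK multiples, 256x being the most common):

```python
def i2s_clocks(sample_rate_hz, bits_per_sample=16, channels=2, mclk_multiple=256):
    """Return (mclk, bclk, lrclk) in Hz for the given stream format.

    MCLK is typically a fixed multiple of the sample rate (commonly 256x).
    BCLK carries one bit per channel slot per sample period.
    LRCLK toggles once per sample frame, so it equals the sample rate.
    """
    mclk = sample_rate_hz * mclk_multiple
    bclk = sample_rate_hz * bits_per_sample * channels
    lrclk = sample_rate_hz
    return mclk, bclk, lrclk

# 16 kHz, 16-bit stereo: MCLK 4.096 MHz, BCLK 512 kHz, LRCLK 16 kHz
print(i2s_clocks(16000))
```

This is also why deriving MCLK internally from SCLK is plausible for the ES8311: all three clocks are integer multiples of the same sample rate.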
Also, is it worth trying to get I2S support for the esp-idf framework as well, to make hacking in wake word stuff with esp-adf/esp-sr easier later? I haven't looked into this much, but maybe not, since I think I saw an arduino framework wrapper for esp-adf somewhere.
Yeah the ES8311 can theoretically work without a dedicated MCLK by generating it internally from the SCLK, but as the ESP32-S3-BOX has an MCLK wired to GPIO2 anyway, we should figure out the best way to implement that in esphome, I guess. Maybe @jesserockz already has plans for that?
My ES8311 branch is at https://github.com/kroimon/esphome/tree/es8311 if you're interested, but it's still a WIP and a few days away from a proper PR.
Also, getting the whole I2S stuff working on esp-idf would be great, because we could probably integrate libraries such as WakeNet much easier. However, I could not even get a very simple esphome config to run on my S3, because it kept resetting due to some watchdog. I did not debug this any further because using esp-idf wasn't of much use without the I2S components anyway.
Adding MCLK to i2s_audio seems to me like the most straightforward path for that. Is there any case where two devices might share an LRCLK and BCLK but have different MCLKs? I simply added MCLK as an optional param for i2s_audio: https://github.com/esphome/esphome/compare/dev...rpatel3001:esphome:add_i2s_mclk
Also, I forked your box.yaml gist to add the RGB LED that comes with the kit, invert the sense of the settings button, and add my MCLK change and your ES8311 components.
I've successfully gotten Home Assistant to stream TTS and radio audio to the ESP-BOX using the config in my gist. The volume is quite low, though I expect your work on the codec interface will help with that.
I probably won't have time to look into it before Friday afternoon, but that already sounds awesome!
started some ADC code at https://github.com/rpatel3001/esphome/tree/es7210
This I2S stuff makes very little sense to me right now; the frequencies I measure on the pins are not at all what the i2s components appear to configure. It's difficult to debug the ADC without access to the raw audio. Trying to send it to the Home Assistant pipeline actually causes an error in whisper, so it's clearly doing something wrong.
Also, the ADC datasheet is terrible. The only register map I could find by googling was on some sketchy Chinese site, and it's version 2.0, compared to the most recent version 23 (which omits the registers).
Dumping some thoughts here, stream-of-consciousness style: I think ideally i2s_audio would have options for MCLK frequency and sample rate, and those would be pulled into i2s_audio_media_player and i2s_audio_microphone to set up the I2S peripheral, the same way the pin numbers are currently pulled in. The DAC and ADC I2C components would need options as well, to set up the chips correctly based on the clock settings.
It's also unclear to me why i2s_audio_microphone and i2s_audio_speaker are calling esp-idf I2S functions but i2s_audio_media_player is not. Is the media player library handling it internally? How do these two components work together?
The media player library is a little difficult that way. It's kind of a black box right now. We tell it what to play and it just does it. And yes, if devices need an MCLK signal, then that should be added to the i2s audio component as an optional parameter. Someone asked about that a while back, but the easier solution was to change the device setting to not require it. But he was doing the wiring, so that was easy to do.
> it's unclear to me why i2s_audio_microphone and i2s_audio_speaker are calling esp-idf i2s functions but i2s_audio_media_player is not
This is because the Audio library handles the streaming, decoding and playing to I2S. It's not the best solution, but it was the easiest at the time given the timeframe I had. The weird thing is the library actually supports calling a function to hand the I2S data to instead of sending it out itself, but it still requires you to set up the I2S peripheral yourself :facepalm:
I mean, we could probably make changes to the Audio library, as it is already a modified fork. The question is how close to upstream you want to stay. I think the main benefit of using the library in the first place is its audio format decoders. The I2S stuff could be implemented in native esphome code to better integrate different external codec chips.
I spent some more time learning the inner workings of I2S and how the different components use it right now. The following is a list of findings and 'challenges' I ran into:
The main issue we have is that there is currently no central instance that controls the parameters of the I2S bus.
The `i2s_audio` platform merely acts as a container for the pin configuration, but the actual calls to `i2s_driver_install()` and `i2s_set_pin()` are done in `i2s_audio_microphone.cpp`, `i2s_audio_speaker.cpp` and `i2s_audio_media_player.cpp`. In addition to that, the external `ESP32-audioI2S` component calls `i2s_set_sample_rates` depending on the media being played.
This makes it very hard to implement external ADCs and DACs whose configuration depends on the current clock speeds and sampling rates. Those audio codec components need a central instance they can register with for configuration change events, so that new settings can be forwarded to the external controllers.
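To make that concrete, here's a rough sketch of what such a central instance could look like (purely illustrative Python; the real thing would be C++ inside the `i2s_audio` component, and every name here is made up):

```python
class I2SBus:
    """Central owner of the bus parameters; codec drivers register for changes."""

    def __init__(self, sample_rate=16000, bits_per_sample=16):
        self.sample_rate = sample_rate
        self.bits_per_sample = bits_per_sample
        self._listeners = []

    def on_config_change(self, callback):
        """Register a callback fired whenever the bus parameters change."""
        self._listeners.append(callback)

    def set_sample_rate(self, rate):
        """Called by e.g. a media player when the stream format changes."""
        self.sample_rate = rate
        for cb in self._listeners:
            cb(self.sample_rate, self.bits_per_sample)

# A codec component (say, an ES8311 driver) would subscribe and reprogram
# its registers whenever the bus reconfigures:
seen = []
bus = I2SBus()
bus.on_config_change(lambda rate, bits: seen.append((rate, bits)))
bus.set_sample_rate(44100)  # media player switches rate; codec gets notified
print(seen)  # [(44100, 16)]
```

The point is just that the ADC/DAC components never touch the I2S driver directly; they only react to the bus's events.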
With the current architecture, there is also no way to run the same I2S port in full duplex. The `Mutex` in the `i2s_audio` component only allows exclusive access to an I2S port. However, the ESP32-S3-BOX and ESP32-S3-Korvo-2 boards share the same I2S port (MCLK, SCLK, LRCK pins) for both audio input and output.
In general, full-duplex operation can only work if both input and output use the same clock parameters. The microphone and speaker components currently use fixed 16000 Hz sampling rates at 16 bits per sample. The media player switches the sampling rates based on the currently played files/streams. So I don't really see a way to use a media player together with a microphone and/or speaker component for a voice assistant right now, at least not at the same time. It might be possible to implement a priority-based switching logic that allows them to coexist.
ESP-IDF 5.0 introduced the concept of 'channels' in the new I2S driver, which would make full-duplex operation a somewhat easier task. (For reference, the latest currently available arduino-esp32 release, 2.0.9, is based on ESP-IDF 4.4.4.)
In summary, I think we need a major refactoring of the `i2s_audio` platform and its `microphone`, `speaker` and `media_player` components. Should `media_player` use `speaker` to output audio? (That would make sense from an architecture point of view.)

See how many ideas are out there for media player in ESPHome: https://github.com/esphome/feature-requests/labels/integration%3A%20media_player There's no other topic as hot, imho...
@rpatel3001 I found the full datasheet for the ES7210 here (Backup). Unfortunately I was still unable to locate the corresponding user guide, but this should be enough information to get it working, together with the existing implementations in esp-bsp and esp-adf. I feel like the esp-adf implementation is even more helpful as it shows all the bits and pieces required for mic selection.
I continued a bit on your work over in my branch, mostly formatting and cleanup for now.
I made the "mistake" of trying to save some bucks and bought the ESP32-S3-Box-Lite instead of the full one. That one does not have a touchscreen but has three additional buttons, and it (apparently) has an ST7789V display instead of an ILI9342C one.
For some (to me yet unexplained) reason, I can show things on the display by using the ILI9342C configuration from @kroimon (https://gist.github.com/kroimon/f6692879f9c00702990801ae9dfa433b); it just doesn't need the mirroring, but the colors are somehow offset (e.g. red is (255, 255, 0), green is (255, 0, 255) and blue is (0, 255, 255), while white and black come out as expected). I haven't managed to show anything useful using the standard st7789 component. Does anyone have an idea why this would be?
Is it worth tracking the S3-Box-Lite support here as well, or would it be better to create a separate feature request? (Most of the components would be the same anyway.)
Seems like the peripherals of the ESP32-S3-Korvo-1 are really similar to ESP32-S3-BOX as well.
One main difference is that the ES7210 is on a different I2S bus from the ES8311.
I have an ESP32-S3-Korvo-1 running this config; the LED ring and buttons are working, but audio isn't working at all yet, so I'm not sure I have the two I2S buses configured correctly, or maybe two `i2s_audio` buses aren't supported yet.
Waiting on an ESP32-S3-BOX to be able to do more testing, but the Korvo is currently in stock on Amazon for 50USD if anyone else is curious about it.
@guillempages I can add the Lite's display to the top post, but can't promise anyone will work on it as I don't have a Lite to play with. You'll probably get more visibility/help by creating a bug report for the st7789 component.
@mattkasa I think two I2S buses ought to work, but I'm not totally sure. Does the codec work by itself if you comment out the ADC config? The current tip of the ES8311 PR sets the volume to 0; try an earlier commit or my es8311 branch for now.
@rpatel3001 I'm testing like this, but I have no idea how `speaker.play` data is supposed to look:
```yaml
on_press:
  - output.turn_on: pa_ctrl
  - speaker.play:
      id: external_speaker
      data: [64, 64, 0, 0, 128, 128, 0, 0, 64, 64, 0, 0, 128, 128, 0, 0, 64, 64, 0, 0, 128, 128, 0, 0, 64, 64, 0, 0, 128, 128, 0, 0, 64, 64, 0, 0, 128, 128, 0, 0]
  - output.turn_off: pa_ctrl
```
Not getting any audible sound, but logs look like:
```
[02:23:19][C][es8311:167]: ES8311 Audio Codec:
[02:23:19][C][es8311:168]:   Use MCLK: YES
[02:23:49][D][sensor:094]: 'button_adc': Sending state 1.63600 V with 2 decimals of accuracy
[02:23:49][D][binary_sensor:036]: 'Korvo 1 Play': Sending state ON
[02:23:49][D][esp-idf:000]: I (38072) I2S: DMA Malloc info, datalen=blocksize=4092, dma_buf_count=8
[02:23:49][D][esp-idf:000]: I (38074) I2S: I2S0, MCLK output by GPIO42
[02:23:50][D][esp-idf:000]: I (38239) I2S: DMA queue destroyed
```
So I wonder if it's just my `speaker.play` data :thinking:
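For what it's worth, the 40-byte vector above is only 20 samples, about 1.25 ms at 16 kHz mono, which would be inaudible even if everything else were right. Assuming the speaker component expects signed 16-bit little-endian mono samples at 16 kHz (an assumption based on the discussion above, not verified against the component), a longer test tone could be generated like this (helper name is made up):

```python
import math

def tone_bytes(freq_hz=440, duration_s=0.5, rate=16000, amplitude=0.5):
    """Build a sine tone as a flat list of bytes: little-endian signed
    16-bit mono samples, suitable for pasting into a data: list."""
    out = []
    for n in range(int(duration_s * rate)):
        sample = int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / rate))
        out.append(sample & 0xFF)          # low byte first (little-endian)
        out.append((sample >> 8) & 0xFF)   # high byte
    return out

data = tone_bytes()
print(len(data))  # 16000 bytes = 8000 samples = 0.5 s at 16 kHz
```

The resulting list could then be pasted into the `data:` field of `speaker.play`.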
Hm, I can't say about `speaker.play`; I've been using Home Assistant to send audio to the media_player component. Do you at least get clicks when the PA is muted/unmuted? Maybe also try the media player component; the I2S code is different.
Ah yeah, I'm using the `esp-idf` framework, so `media_player` isn't supported. My thinking has been to use esp-idf to make it easier to build a component that uses esp-sr for wake word, since it seems like that's probably where all of this is headed :)
edit: I tried building with Arduino to test with `media_player` and it panics and boot loops; there is something the bootloader doesn't like. I'll keep looking at it to see if I can get it running with Arduino.
I did some testing with i2s_audio_speaker and it seems to be partially working (on Arduino). With a much longer data vector (8k samples = half a second; a full second crashed the board when played) I mostly just hear clicks, but occasionally the tone plays for a fraction of the duration. Interestingly, the tone is twice the frequency it should be, which is maybe a clue about what's wrong.
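One plausible reading of the doubled pitch (just a hypothesis): mono data being clocked out over a two-slot stereo frame, so the peripheral consumes two samples per LRCLK period and the waveform plays in half the time. The arithmetic, as a sanity check:

```python
# If 8000 mono samples are written into a stream the peripheral treats as
# stereo at a 16 kHz frame rate, it eats two samples per frame, so playback
# takes half as long and every frequency in the signal is doubled.
samples = 8000
frame_rate = 16000                              # LRCLK frequency
mono_duration = samples / frame_rate            # data interpreted as mono
stereo_duration = samples / (2 * frame_rate)    # two samples per frame
print(mono_duration, stereo_duration)  # 0.5 0.25
assert stereo_duration == mono_duration / 2

# Duplicating each sample into both channel slots would restore the pitch:
mono = [1, 2, 3]
stereo = [s for s in mono for _ in (0, 1)]
print(stereo)  # [1, 1, 2, 2, 3, 3]
```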
I also tried compiling a barebones config for esp-idf but it bootloops. Fixed the bootloop with
```yaml
platformio_options:
  board_build.flash_mode: dio
```
but then it just hangs after booting. Haven't found a fix for that, it does this even with the most recent esp-idf version/platform_version.
@rpatel3001 for esp-idf try:

```yaml
esp32:
  board: esp32s3box
  framework:
    type: esp-idf
  variant: ESP32S3
```
I was able to get arduino working on the Korvo with this:

```yaml
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: arduino
```

And `media_player` tries to work, but no sound, not even clicks, so I don't have something right with the I2S bus.
```
[05:36:40][D][media_player:059]: 'Korvo 1 Media Player' - Setting
[05:36:40][D][media_player:066]:   Media URL: https://homeassistant.local/api/tts_proxy/726c76553e1a3fdea29134f36e6af2ea05ec5cce_en-us_a877e2b3bf_tts.piper.wav
```
Adding the variant and/or changing the board didn't change anything unfortunately.
The LilyGo T-Embed has the ES7210 as well, so this will be great for making tiny assistants :)
@rpatel3001 I managed to get the colors working (more or less) on the ESPBox Lite by hacking the code in the ILI9xxx display to force BGR byte order and invert the display. If I get some time I'll try making a PR on ESPHome so that this can be configured in the YAML file, and then the ESPBox Lite display could be set to done :-)
@rpatel3001 I've created two PRs to be able to use the displays out of the box: https://github.com/esphome/esphome/pull/4941 (for the Box-Lite) and https://github.com/esphome/esphome/pull/4942 (for the Box).
Since I do not have a "full" Box, could you try using the ili9xxx display from the 4942 PR and see if the mirroring and colors work without the workaround?
sweet, I tried it out and it works. checklist updated.
I've been away from this for a while and will be for another week or so, but I modified the I2S microphone to stream samples to MATLAB. The data clearly has some relationship to the actual audio in the environment (tested by FFTing/playing/plotting the streamed samples with silence and with test tones playing), but the sample rate doesn't match, it's quite noisy, and in many captures every other sample is +/-32767 or has a DC offset, so there are several issues. I can drop the patch and MATLAB script here if anyone else is interested, but I won't be working on it for a little bit.
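A quick check that could be run on a captured buffer to test for interleaving problems (hypothetical helpers, not part of any component): if every other sample is pegged while the rest looks like audio, the capture is stereo-interleaved, or shifted by one sample or byte.

```python
def split_channels(samples):
    """Deinterleave a flat sample list into (even-index, odd-index) streams."""
    return samples[0::2], samples[1::2]

def dc_offset(samples):
    """Mean value of the stream; should be near zero for clean audio."""
    return sum(samples) / len(samples)

# Synthetic example of the symptom described above: odd samples pegged high
# while even samples look like audio -> the two 'channels' separate cleanly.
buf = [100, 32767, -120, 32767, 80, 32767]
left, right = split_channels(buf)
print(left, right)  # [100, -120, 80] [32767, 32767, 32767]
```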
Any updates on the speaker? Is this config correct? I don't hear any sound, not even clicks. Framework is arduino.
```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO47
  i2s_bclk_pin: GPIO17
  i2s_mclk_pin: GPIO2

media_player:
  - platform: i2s_audio
    name: Speaker
    dac_type: external
    i2s_dout_pin: GPIO15
```
You're missing
```yaml
mute_pin:
  number: GPIO46
  inverted: true
mode: mono
```
Now I get static when I play something. I don't hear any clicks, and when I'm not playing something there's no sound.
Do you also have:
```yaml
external_components:
  - source: github://pr#4861
    components: [ es8311 ]

es8311:
  address: 0x18
```
Added that, and now it can successfully buzz and click.
After reviewing a number of threads and PRs, I have the following things working on my box:
I'm running into issues with the microphone and speakers, though. At this point, I'm not really sure how to test the mic. I tried to set it up with the voice assistant, but I don't have a good way to activate that right now.
I used some of the information above to try to get the speaker working. Based on the tasks still open, I have a feeling I'm probably trying to do too much at once by having the mic, speaker, and media_player all going, given what's supported at this point.
When I open up the speaker and use a TTS like pico/piper to try to get it to produce sound, I do hear a popping noise.
Here's my current config:
```yaml
esphome:
  name: box
  friendly_name: Box

esp32:
  board: esp32s3box
  framework:
    type: arduino

external_components:
  - source: github://pr#4793
    components: [ tt21100 ]
  - source: github://pr#4861
    components: [ es8311 ]

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Box Fallback Hotspot"
    password: "<removed>"

# Enable Home Assistant API
api:
  encryption:
    key: "<removed>"

ota:
  password: "<removed>"

# Enable logging
logger:

time:
  - platform: sntp
    id: time_sntp
#time:
#  - platform: homeassistant
#    id: time_ha

output:
  - platform: ledc
    pin: GPIO45
    id: lcd_backlight
  - platform: gpio
    pin: GPIO46
    id: ns4150_ctrl

light:
  - platform: monochromatic
    output: lcd_backlight
    name: "LCD Backlight"
    restore_mode: ALWAYS_ON

spi:
  clk_pin: GPIO7
  mosi_pin: GPIO6

display:
  - platform: ili9xxx
    model: S3BOX
    cs_pin: GPIO5
    dc_pin: GPIO4
    reset_pin: GPIO48
    id: lcd
    # Width = 320, Height = 240
    lambda: |-
      it.fill(Color::WHITE);
      auto red = Color(255, 0, 0);
      auto green = Color(0, 255, 0);
      auto blue = Color(0, 0, 255);
      it.filled_rectangle(10, 170, 60, 60, red);
      it.filled_rectangle(130, 170, 60, 60, green);
      it.filled_rectangle(250, 170, 60, 60, blue);
      it.strftime(160, 85, id(font_time), Color::BLACK, TextAlign::CENTER, "%H:%M", id(time_sntp).now());
      if (id(muted).state) {
        it.print(310, 10, id(font_small), red, TextAlign::TOP_RIGHT, "Muted");
      }

font:
  - file: "gfonts://Roboto@900"
    id: font_time
    size: 80
    glyphs: "0123456789:"
  - file: "gfonts://Roboto"
    id: font_small
    size: 20

i2c:
  scl: GPIO18
  sda: GPIO8
  scan: true

touchscreen:
  - platform: tt21100
    address: 0x24
    interrupt_pin: GPIO3
    # Don't use as the reset pin is shared with the display, so the ili9xxx will perform the reset
    #reset_pin: GPIO48

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      mode: INPUT_PULLUP
    id: settings
    name: "Settings"
  - platform: gpio
    pin:
      number: GPIO1
      inverted: true
    id: muted
    name: "Muted"
  - platform: tt21100
    name: "Home"
    index: 0
  - platform: touchscreen
    name: "Red"
    x_min: 10
    x_max: 70
    y_min: 170
    y_max: 230
  - platform: touchscreen
    name: "Green"
    x_min: 130
    x_max: 190
    y_min: 170
    y_max: 230
  - platform: touchscreen
    name: "Blue"
    x_min: 250
    x_max: 310
    y_min: 170
    y_max: 230

i2s_audio:
  i2s_lrclk_pin: GPIO47
  i2s_bclk_pin: GPIO17
  i2s_mclk_pin: GPIO2

es8311:
  address: 0x18

bluetooth_proxy:

voice_assistant:
  microphone: mic
  speaker: audio

button:
  - platform: restart
    name: "Restart Device"

text_sensor:
  - platform: wifi_info
    ip_address:
      name: IP Address
    ssid:
      name: Connected SSID
    bssid:
      name: Connected BSSID
    mac_address:
      name: Mac Wifi Address
    scan_results:
      name: Latest Scan Results

sensor:
  - platform: wifi_signal
    name: "WiFi Signal Sensor"
    update_interval: 60s
  - platform: wifi_signal
    name: "WiFi Signal dB"
    id: wifi_signal_db
    update_interval: 60s
    entity_category: "diagnostic"

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    pdm: false
    i2s_din_pin: GPIO16

speaker:
  - platform: i2s_audio
    id: audio
    dac_type: external
    i2s_dout_pin: GPIO15
    mode: mono

media_player:
  - platform: i2s_audio
    name: Speaker
    dac_type: external
    i2s_dout_pin: GPIO15
    mute_pin:
      number: GPIO46
      inverted: true
    mode: mono

# i2c device at address 0x18 - ES8311 Audio Codec
# i2c device at address 0x24 - TT21100 Touchscreen
# i2c device at address 0x40 - ES7210 Mic ADC
# i2c device at address 0x68 - ICM-42607-P IMU

captive_portal:
```
Does anyone have any ideas?
Speaker and media player might be mutually exclusive. I have on_press actions on the settings button to start and stop the mic; you can also start and stop the voice assistant that way. Try `media_player` without `speaker` and see if you can play Piper TTS or web radio from Home Assistant.
I think maybe the ADC has been working OK this whole time? Using this config to stream samples to MATLAB, the audio comes through fine. Whisper is still failing, though, with the same sort of error:
```
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-50' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:26> exception=ValueError("can't extend empty axis 0 using modes other than 'constant' or 'empty'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 32, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 61, in handle_event
    segments, _info = self.model.transcribe(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 124, in transcribe
    features = self.feature_extractor(audio)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 152, in __call__
    frames = self.fram_wave(waveform)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 98, in fram_wave
    frame = np.pad(frame, pad_width=padd_width, mode="reflect")
  File "<__array_function__ internals>", line 200, in pad
  File "/usr/local/lib/python3.9/dist-packages/numpy/lib/arraypad.py", line 815, in pad
    raise ValueError(
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
```
Seems like the audio format is not quite what's expected by whisper.
MATLAB script:

```matlab
clear;
t = tcpserver(6666);
z = zeros(32000, 1);
while 1
  if t.NumBytesAvailable > 5*1600
    x = char(read(t, 5*1600));
    y = textscan(x, '%xs16', 'Delimiter', ',');
    a = double(y{1}) / 2^15;
    z = [z(1601:end); a];
    plot(z);
    drawnow;
    sound(a, 16000);
  end
end
```
Thanks for the suggestions. This has been my first attempt at customizing anything with ESPHome.
I didn't realize that `voice_assistant` could use `speaker` or `media_player` earlier, which is why I was concerned about removing the `speaker` earlier today. Reviewing the docs helped with that: https://esphome.io/components/voice_assistant.html

So I removed `speaker` and updated `voice_assistant` to this:

```yaml
voice_assistant:
  microphone: mic
  media_player: audio
```

Then I added `id: audio` to my `media_player` config.
Next, I updated my settings button like you recommended:
```yaml
- platform: gpio
  pin:
    number: GPIO0
    mode: INPUT_PULLUP
  id: settings
  name: "Settings"
  on_press:
    - if:
        condition: voice_assistant.is_running
        then:
          - voice_assistant.stop:
        else:
          - voice_assistant.start_continuous:
```
Everything compiled without issues, and the settings button did activate the voice assistant pipeline, but it just timed out each time for a few tries: `[E][voice_assistant:231]: Error: pipeline-timeout - Pipeline timeout`
I also tried Piper with the `media_player`, and that gave me the same popping noise.
For the mic, I remembered what you said about turning the microphone on/off with your settings button, so I tried this out based on https://esphome.io/components/microphone/index.html:
```yaml
- platform: gpio
  pin:
    number: GPIO0
    mode: INPUT_PULLUP
  id: settings
  name: "Settings"
  on_press:
    - if:
        condition: voice_assistant.is_running
        then:
          - microphone.stop_capture:
          - voice_assistant.stop:
        else:
          - microphone.capture:
          - voice_assistant.start_continuous:
```
That didn't help, though, and I started getting things in the logs that I hadn't seen before, like `ERROR Serial port closed!` and this:
```
[18:19:43][03mD[iaysno06:'eig: SndigsaeOF[m
[18:19:43]\0330;6[]bnr_esr06:'etns:Sedn tt 0
[18:19:43][D[ie_assat14] Sinln tp.[m
[18:19:46][03mD[iar_eso:3] 'etn' enigsateOF[m
[18:19:46][0;6[]baysno:3] Stig' edn tt N[m
[18:19:46]D[oc_sitn:3] eusigsat.\0330
[18:19:46]\0330;6D[oc_sitn:1] trtin..[m
[18:20:02]\03313mE[oc_sitn:3] ro:ppelietmot-Ppln ieu\0330[D][ocassat14:Sgan tp.\0330[03mD[iaysno:3] Stig' dn tt F\0330\0330;6[]bnr_esr06:'etns:SnigsaeO\033m
[18:20:02]]vieassat12:Rqetn tr..0\0330;3mD[oc_sitn:1] tri..[m
[18:20:02]]vieassat14AsitPpln ung[m
```
I'm going to remove the microphone capture/stop capture as a next step, since that didn't seem to help.
Any other ideas on the mic and speaker?
@rpatel3001, I just saw your post. I'm taking a look at that and your config.
For me, I was thinking about adding this to `microphone` to at least see if anything is coming through via the log:

```yaml
on_data:
  - logger.log:
      format: "Received %d bytes"
      args: ['x.size()']
```
For the speaker, I hadn't heard of web radio, but I found a radio browser integration that I just added.
So the speaker component can't play anything except what the voice_assistant sends back; web radio and TTS will only work with media_player.
I don't know if voice_assistant will work with media_player, my understanding was that it needs the speaker component.
Also, you'll see in my config that I have the button activating the voice assistant only while pressed, and on_release ends the capture. That probably avoids timing out.
The current state of this is that everything individually seems to be working, but something about the microphone and es7210 is not configured to pass data in the way whisper expects for STT, so voice_assistant doesn't actually do anything yet.
Thanks again!
> So the speaker component can't play anything except what the voice_assistant sends back; web radio and TTS will only work with media_player.
Got it. I'm just not sure why I can't get the TTS to work now that I'm using `media_player`. I'll try that radio integration I found.
> I don't know if voice_assistant will work with media_player, my understanding was that it needs the speaker component.
It looks like it should, based on the docs: "media_player (Optional, ID): The media_player to use to output the response. Cannot be used with speaker above."
> Also, you'll see in my config that I have the button activating the voice assistant only while pressed, and on_release ends the capture. That probably avoids timing out.
Yeah, that's probably a good idea.
I also noticed that you're doing a few other things in your config that are different:
```yaml
- source: github://rpatel3001/esphome@es7210
  components: [ es7210 ]
- source: github://rpatel3001/esphome@mictest
  components: [ i2s_audio ]
...
es7210:
  address: 0x40
```
I haven't taken a look at those on your GitHub yet, though.
The es7210 component is needed to set up the ADC chip over I2C, the same as the es8311 component does for the DAC.
The media_player option for voice_assistant must be new; that's very cool. My mictest branch of i2s_audio just adds streaming samples over TCP; you don't need that unless you want to analyze raw ADC samples with the above MATLAB script or another tool.
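For anyone without MATLAB, here's a rough Python equivalent of the receiver (assuming, based on the MATLAB script above, that the mictest branch sends comma-separated hex int16 text on port 6666; the exact wire format is an assumption, and the framing here is naive and may split a token at a block boundary):

```python
import socket

def parse_samples(text):
    """Parse comma-separated hex values into signed 16-bit samples
    (mirrors the '%xs16' format the MATLAB script reads)."""
    out = []
    for tok in text.split(","):
        tok = tok.strip()
        if not tok:
            continue
        v = int(tok, 16)
        out.append(v - 65536 if v >= 32768 else v)  # two's complement
    return out

def stream(port=6666, chunk=8000):
    """Accept one connection and yield blocks of parsed samples."""
    srv = socket.create_server(("", port))
    conn, _ = srv.accept()
    buf = b""
    while True:
        data = conn.recv(4096)
        if not data:
            return
        buf += data
        if len(buf) >= chunk:
            yield parse_samples(buf[:chunk].decode())
            buf = buf[chunk:]

print(parse_samples("7fff,8000,0001"))  # [32767, -32768, 1]
```

From there the blocks could be plotted or written to a WAV file for listening.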
I'm a bit confused. Is it possible to get the speaker to make noise beyond clicking yet?
@KTibow yes you should be able to play media from home assistant if you use the media_player component.
I can play media, but it just makes popping sounds.
```yaml
external_components:
  - source: "github://pr#4793"
    components: [ tt21100 ]
  - source: "github://pr#4861"
    components: [ es8311 ]

i2c:
  scl: GPIO18
  sda: GPIO8

i2s_audio:
  i2s_lrclk_pin: GPIO47
  i2s_bclk_pin: GPIO17
  i2s_mclk_pin: GPIO2

es8311:
  address: 0x18

media_player:
  - platform: i2s_audio
    name: Speaker
    dac_type: external
    i2s_dout_pin: GPIO15
    mute_pin:
      number: GPIO46
      inverted: true
```
I believe that config should do just fine. My config is linked here and I can play web radio and TTS.
After changing some stuff to more closely match your config, the speaker works!
Back to getting voice commands working: I inserted some prints into the whisper container's Python, and it seems like it's receiving audio-start and audio-stop events from wyoming but no audio-chunk events, so no audio samples are making it from esphome to whisper. This seems most likely to be a problem in Home Assistant or the voice_assistant component in esphome.
edit: async_process_audio_stream in homeassistant/components/wyoming/stt.py is receiving a stream object that has no contents, but the metadata seems correct. Tracking down where this comes from and whether it's an issue with HA or esphome.
I'm not sure what to do about the speaker. I keep getting the popping noise but nothing else. I completely replaced my configuration with this, other than putting in my own ap/ota/api passwords/key: https://gist.github.com/rpatel3001/ffd160577b96585fda144b786d789f46
That includes removing `mode: mono`.
In the logs, I'm seeing this:
```
[00:44:26][D][media_player:066]:   Media URL: http://home-assistant.local:8123/api/tts_proxy/32b7cbdc35a6c367b425528d61d48e8570a81c95_en-us_22597d2fbc_tts.piper.wav
[00:44:26][727290][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
[00:44:27][728202][E][WiFiClient.cpp:268] connect(): socket error on fd 52, errno: 104, "Connection reset by peer"
[00:44:27][728375][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
[00:44:27][728424][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
```
It seems like it might be related to this: https://github.com/esphome/issues/issues/4088
The radio isn't working for me, either:
```
[00:46:00][821533][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
[00:46:01][821796][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
[00:46:01][822125][V][ssl_client.cpp:62] start_ssl_client(): Free internal heap before TLS 233112
[00:46:01][822125][V][ssl_client.cpp:68] start_ssl_client(): Starting socket
[00:46:01][822132][V][ssl_client.cpp:149] start_ssl_client(): Seeding the random number generator
[00:46:01][822137][V][ssl_client.cpp:158] start_ssl_client(): Setting up the SSL/TLS structure...
[00:46:01][822143][D][ssl_client.cpp:179] start_ssl_client(): WARNING: Skipping SSL Verification. INSECURE!
[00:46:01][822152][V][ssl_client.cpp:257] start_ssl_client(): Setting hostname for TLS session...
[00:46:01][822159][V][ssl_client.cpp:272] start_ssl_client(): Performing the SSL/TLS handshake...
[00:46:01][822532][V][ssl_client.cpp:293] start_ssl_client(): Verifying peer X.509 certificate...
[00:46:01][822533][V][ssl_client.cpp:301] start_ssl_client(): Certificate verified.
[00:46:01][822536][V][ssl_client.cpp:316] start_ssl_client(): Free internal heap after TLS 195720
[00:46:01][822543][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 175 bytes...
[00:46:01][822570][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
[00:46:01][822637][V][ssl_client.cpp:324] stop_ssl_socket(): Cleaning SSL connection.
```
This might be related to that: https://github.com/esphome/issues/issues/4369
Does ESPHome require HTTPS for media/tts?
Describe the problem you have/What new integration you would like
Main features: support for peripherals on the ESP32-S3-BOX dev kit:
To get voice_assistant working:
Architectural changes to support wakeword and esp-idf framework (probably out of scope here and will be transferred to a new issue or 3 once the S3-BOX works for on-demand voice commands):
Please describe your use case for this integration and alternatives you've tried:
Use the peripherals on the board. Working on-demand voice_assistant.
Additional context
This device has recently had a bit of attention due to posts about Willow on Hacker News and elsewhere. Willow is fantastic, but I'd like to be able to use the full extent of existing esphome components, and I bet others would too. Adding hardware peripherals is the smallest part of this; wake word detection is the major missing feature needed to make esphome a viable alternative (out of scope for this feature request, though).
Reference links:
- https://github.com/espressif/esp-box
- https://github.com/toverainc/willow
- https://github.com/hugobloem/esp-ha-speech
- https://github.com/espressif/esp-dev-kits/issues/24#issuecomment-781314125
- https://components.espressif.com/components/espressif/es8311
- https://components.espressif.com/components/espressif/es7210
- https://github.com/espressif/esp-bsp/
- https://github.com/espressif/esp-adf/