gnumpi opened this issue 2 months ago
I committed a version that releases the I2S controller after the pipeline stops.
This should enable support for DACs and ADCs that share the LRCLK and BCLK pins.
Right now, when the media player is started outside the wake_word detection or voice assistant pipeline, it is not aware of a running voice assistant loop.
So please make sure to stop the voice assistant before trying to stream media such as radio stations.
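A small hypothetical model shows why releasing the controller matters when both directions share pins. Nothing below is the esphome_audio API: `i2s_ctrl_t`, `i2s_ctrl_claim`, `i2s_ctrl_release` and the exclusive/duplex modes are assumptions made for illustration. In the exclusive mode only one client (microphone reader or speaker writer) may hold the shared LRCLK/BCLK pins at a time, so a pipeline that stops without releasing the port blocks the other direction indefinitely.

```c
#include <stdbool.h>

/* Hypothetical model of a shared I2S port; NOT the esphome_audio API. */
typedef enum { ACCESS_EXCLUSIVE, ACCESS_DUPLEX } access_mode_t;
typedef enum { CLIENT_READER = 1, CLIENT_WRITER = 2 } client_t;

typedef struct {
    access_mode_t mode;
    unsigned holders; /* bitmask of clients currently holding the port */
} i2s_ctrl_t;

/* Try to claim the port for `who`; returns true on success. */
bool i2s_ctrl_claim(i2s_ctrl_t *ctrl, client_t who) {
    if (ctrl->holders & who) {
        return true; /* this client already holds the port */
    }
    if (ctrl->holders != 0 && ctrl->mode == ACCESS_EXCLUSIVE) {
        return false; /* another client holds the shared pins */
    }
    ctrl->holders |= (unsigned)who;
    return true;
}

/* Release the port; in this model it must be called once the client's
   pipeline has stopped, otherwise the other direction can never claim it. */
void i2s_ctrl_release(i2s_ctrl_t *ctrl, client_t who) {
    ctrl->holders &= ~(unsigned)who;
}
```

In this model, the media pipeline calling `i2s_ctrl_release` after it stops is what lets the microphone claim the pins again, which mirrors the advice above to stop the voice assistant before streaming media.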
For testing, please use the following branch:
- source:
    type: git
    url: https://github.com/gnumpi/esphome_audio
    ref: 17-add-support-for-i2s-duplex-mode
  components: [ adf_pipeline, i2s_audio ]
  refresh: 0s
When I run this version, I can't get the microphone or the media player to work. Below is my ESPHome config file:
substitutions:
  name: "living-room-onju-home"
  friendly_name: "Living Room Onju Home"
external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: 17-add-support-for-i2s-duplex-mode
    components: [ adf_pipeline, i2s_audio ]
esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2024.2.0
  platformio_options:
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
    board_build.flash_mode: dio
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"
psram:
  mode: octal
  speed: 80MHz
logger:
api:
  encryption:
    key: "<REDACTED>"
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop
ota:
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    password: "${wifi_ap_password}"
globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false
interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2
#i2s_audio:
#  - i2s_lrclk_pin: GPIO13
#    i2s_bclk_pin: GPIO18
#speaker:
#  - platform: i2s_audio
#    id: onju_out
#    dac_type: external
#    i2s_dout_pin: GPIO12
#    mode: stereo
#microphone:
#  - platform: i2s_audio
#    id: onju_microphone
#    i2s_din_pin: GPIO17
#    adc_type: external
#    pdm: false
i2s_audio:
  - id: i2s_all_audio
    i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
adf_pipeline:
  - platform: i2s_audio
    type: sink
    id: adf_i2s_out
    i2s_audio_id: i2s_all_audio
    i2s_dout_pin: GPIO12
  - platform: i2s_audio
    type: source
    id: adf_i2s_in
    i2s_audio_id: i2s_all_audio
    i2s_din_pin: GPIO17
    channel: right
    sample_rate: 16000
    bits_per_sample: 16bit
microphone:
  - platform: adf_pipeline
    id: adf_microphone
    pipeline:
      - adf_i2s_in
      - self
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: None
    internal: false
    pipeline:
      - self
      - adf_i2s_out
micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    then:
      - voice_assistant.start
voice_assistant:
  id: va
  microphone: adf_microphone
  media_player: adf_media_player
  use_wake_word: false
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 500ms
    - wait_until:
        not:
          media_player.is_playing: adf_media_player
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - delay: 200ms
          - micro_wake_word.start
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
          - micro_wake_word.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led
number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;
esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V
  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2
  denoise_grade: BIT8
  denoise_cap_level: L0
binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
    on_press:
      then:
        - light.turn_on: left_led
        - script.execute:
            id: set_volume
            volume: -0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_down
            then:
              - script.execute:
                  id: set_volume
                  volume: -0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: left_led
  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
    on_press:
      then:
        - light.turn_on: right_led
        - script.execute:
            id: set_volume
            volume: 0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_up
            then:
              - script.execute:
                  id: set_volume
                  volume: 0.05
              - delay: 150ms
    on_release:
      then:
        - light.turn_off: right_led
  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:
  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word
light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms
script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led
  - id: set_volume
    mode: restart
    parameters:
      volume: float
    then:
      - light.turn_on:
          id: top_led
          effect: show_volume
      - delay: 1s
      - script.execute: reset_led
  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - micro_wake_word.start
            - if:
                condition:
                  media_player.is_playing:
                then:
                  - media_player.stop:
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"
  - id: turn_off_wake_word
    then:
      - micro_wake_word.stop
      - script.execute: reset_led
  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static uint8_t thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static uint8_t qsizes[3] = {0, 0, 0};
          static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};
          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
          if(newval == 0) return;
          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);
          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            }
          }
          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;
          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;
          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);
          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }
switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
  - platform: gpio
    id: dac_mute
    restore_mode: ALWAYS_ON
    pin:
      number: GPIO21
      inverted: True
Here is the log when I try to play audio:
INFO ESPHome 2024.2.1
INFO Reading configuration /config/esphome/living-room-onju-home.yaml...
INFO Starting log output from <REDACTED> using esphome API
INFO Successfully connected to living-room-onju-home @ <REDACTED> in 0.021s
INFO Successful handshake with living-room-onju-home @ <REDACTED> in 0.085s
[15:00:12][I][app:102]: ESPHome version 2024.2.1 compiled on Mar 5 2024, 14:52:53
[15:00:12][C][wifi:577]: WiFi:
[15:00:12][C][wifi:409]: Local MAC: <REDACTED>
[15:00:12][C][wifi:414]: SSID: 'HASS'[redacted]
[15:00:12][C][wifi:415]: IP Address: <REDACTED>
[15:00:12][C][wifi:417]: BSSID: [redacted]
[15:00:12][C][wifi:418]: Hostname: 'living-room-onju-home'
[15:00:12][C][wifi:420]: Signal strength: -43 dB ▂▄▆█
[15:00:12][C][wifi:424]: Channel: 1
[15:00:12][C][wifi:425]: Subnet: <REDACTED>
[15:00:12][C][wifi:426]: Gateway: <REDACTED>
[15:00:12][C][wifi:427]: DNS1: <REDACTED>
[15:00:12][C][wifi:428]: DNS2: <REDACTED>
[15:00:12][C][logger:447]: Logger:
[15:00:12][C][logger:448]: Level: DEBUG
[15:00:12][C][logger:449]: Log Baud Rate: 115200
[15:00:12][C][logger:451]: Hardware UART: USB_SERIAL_JTAG
[15:00:12][C][template.number:050]: Template Number 'Touch threshold percentage'
[15:00:12][C][template.number:051]: Optimistic: YES
[15:00:12][C][template.number:052]: Update Interval: never
[15:00:12][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[15:00:12][C][esp32_rmt_led_strip:176]: Pin: 11
[15:00:12][C][esp32_rmt_led_strip:177]: Channel: 0
[15:00:12][C][esp32_rmt_led_strip:202]: RGB Order: GRB
[15:00:12][C][esp32_rmt_led_strip:203]: Max refresh rate: 0
[15:00:12][C][esp32_rmt_led_strip:204]: Number of LEDs: 6
[15:00:12][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[15:00:12][C][switch.gpio:091]: Restore Mode: always ON
[15:00:12][C][switch.gpio:031]: Pin: GPIO21
[15:00:12][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[15:00:12][C][gpio.binary_sensor:016]: Pin: GPIO38
[15:00:12][C][light:103]: Light 'leds'
[15:00:12][C][light:105]: Default Transition Length: 0.0s
[15:00:12][C][light:106]: Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'left_led'
[15:00:12][C][light:105]: Default Transition Length: 0.1s
[15:00:12][C][light:106]: Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'top_led'
[15:00:12][C][light:105]: Default Transition Length: 0.1s
[15:00:12][C][light:106]: Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'right_led'
[15:00:12][C][light:105]: Default Transition Length: 0.1s
[15:00:12][C][light:106]: Gamma Correct: 2.80
[15:00:12][C][template.switch:068]: Template Switch 'Use Wake Word'
[15:00:12][C][template.switch:091]: Restore Mode: restore defaults to ON
[15:00:12][C][template.switch:057]: Optimistic: YES
[15:00:12][C][psram:020]: PSRAM:
[15:00:12][C][psram:021]: Available: YES
[15:00:12][C][psram:024]: Size: 8191 KB
[15:00:12][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[15:00:12][C][esp32_touch:074]: Meas cycle: 0.80ms
[15:00:12][C][esp32_touch:075]: Sleep cycle: 2.00ms
[15:00:12][C][esp32_touch:095]: Low Voltage Reference: 0.8V
[15:00:12][C][esp32_touch:115]: High Voltage Reference: 2.4V
[15:00:12][C][esp32_touch:135]: Voltage Attenuation: 0V
[15:00:12][C][esp32_touch:169]: Filter mode: IIR_16
[15:00:12][C][esp32_touch:170]: Debounce count: 2
[15:00:12][C][esp32_touch:171]: Noise threshold coefficient: 0
[15:00:12][C][esp32_touch:172]: Jitter filter step size: 0
[15:00:12][C][esp32_touch:191]: Smooth level: IIR_2
[15:00:12][C][esp32_touch:213]: Denoise grade: BIT8
[15:00:12][C][esp32_touch:245]: Denoise capacitance level: L0
[15:00:12][C][esp32_touch:260]: Touch Pad 'volume_down'
[15:00:12][C][esp32_touch:261]: Pad: T4
[15:00:12][C][esp32_touch:262]: Threshold: 582598
[15:00:12][C][esp32_touch:260]: Touch Pad 'volume_up'
[15:00:12][C][esp32_touch:261]: Pad: T2
[15:00:12][C][esp32_touch:262]: Threshold: 586502
[15:00:12][C][esp32_touch:260]: Touch Pad 'action'
[15:00:12][C][esp32_touch:261]: Pad: T3
[15:00:12][C][esp32_touch:262]: Threshold: 775188
[15:00:12][C][mdns:115]: mDNS:
[15:00:12][C][mdns:116]: Hostname: living-room-onju-home
[15:00:13][C][ota:096]: Over-The-Air Updates:
[15:00:13][C][ota:097]: Address: living-room-onju-home.local:3232
[15:00:13][C][ota:103]: OTA version: 2.
[15:00:13][C][api:139]: API Server:
[15:00:13][C][api:140]: Address: living-room-onju-home.local:6053
[15:00:13][C][api:142]: Using noise encryption: YES
[15:00:13][C][micro_wake_word:057]: microWakeWord:
[15:00:13][C][micro_wake_word:058]: Wake Word: hey jarvis
[15:00:13][C][micro_wake_word:059]: Probability cutoff: 0.500
[15:00:13][C][micro_wake_word:060]: Sliding window size: 10
[15:00:13][C][adf_audio:016]: ESP-ADF-MediaPlayer:
[15:00:13][C][adf_audio:018]: Number of ASPComponents: 2
[15:00:17][D][media_player:059]: 'Living Room Onju Home' - Setting
[15:00:17][D][media_player:066]: Media URL: https://<REDACTED>/api/tts_proxy/a54d88e06612d820bc3be72877c74f257b561b19_en-gb_8cd8d30e6e_tts.piper.mp3
[15:00:17][D][esp_adf_pipeline:038]: Init request, current state UNAVAILABLE
[15:00:17][D][esp-idf:000]: I (394148) MP3_DECODER: MP3 init
[15:00:17][D][esp_adf_pipeline:233]: Adding new component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp-idf:000]: I (394160) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8
[15:00:17][D][esp-idf:000]: I (394163) I2S: I2S0, MCLK output by GPIO2
[15:00:17][D][esp-idf:000]: I (394165) ESP32_S3_BOX: I2S0, MCLK output by GPIO0
[15:00:17][D][esp_adf_pipeline:233]: Adding new component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 0, http
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 1, decoder
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 2, i2s_out
[15:00:17][D][esp-idf:000]: I (394179) AUDIO_PIPELINE: link el->rb, el:0x3d8203b0, tag:http, rb:0x3d8208cc
[15:00:17][D][esp-idf:000]: I (394183) AUDIO_PIPELINE: link el->rb, el:0x3d820568, tag:decoder, rb:0x3d82190c
[15:00:17][D][esp_adf_pipeline:262]: Setting up event listener.
[15:00:17][D][esp_adf_pipeline:193]: State changed from UNAVAILABLE to STOPPED
[15:00:17][I][adf_audio:134]: got new pipeline state: 5
[15:00:17][D][esp_adf_pipeline:049]: Starting request, current state STOPPED
[15:00:17][D][esp_adf_pipeline:193]: State changed from STOPPED to PREPARING
[15:00:17][I][adf_audio:134]: got new pipeline state: 1
[15:00:17][W][component:214]: Component api took a long time for an operation (0.07 s).
[15:00:17][W][component:215]: Components should block for at most 20-30ms.
[15:00:17][D][esp-idf:000]: I (394213) AUDIO_THREAD: The http task allocate stack on external memory
[15:00:17][D][esp-idf:000]: I (394216) AUDIO_ELEMENT: [http-0x3d8203b0] Element task created
[15:00:17][D][esp-idf:000]: I (394219) AUDIO_THREAD: The decoder task allocate stack on external memory
[15:00:17][D][esp-idf:000]: I (394223) AUDIO_ELEMENT: [decoder-0x3d820568] Element task created
[15:00:17][D][esp-idf:000]: I (394225) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[15:00:17][D][esp-idf:000]: I (394229) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[15:00:17][I][esp_aud:000]: I (394232) MP3_DECODER: MP3 Streamer status: 2
[15:00:17][I][esp_aud:000]: I (394232) MP3_DECODER: MP3 Streamer status: 2
[15:00:17][I][esp_audio_sources:066]: decoder status: 2
[15:00:18][D][esp-idf:000]: I (394831) HTTP_CLIENT: Body received in fetch header state, 0x3fcc504b, 1841
[15:00:18][D][esp-idf:000]: I (394834) HTTP_STREAM: total_bytes=11983
[15:00:18][I][HTTPStreamReader:109]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[15:00:18][D][esp-idf:000]: W (394874) AUDIO_ELEMENT: IN-[decoder] AEL_IO_ABORT
[15:00:18][D][esp-idf:000]: W (394877) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[15:00:18][D][esp-idf:000]: W (394880) MP3_DECODER: output aborted -3
[15:00:18][D][esp-idf:000]: I (394883) MP3_DECODER: Closed
[15:00:18][D][esp-idf:000]: W (394901) HTTP_STREAM: No output due to stopping
[15:00:18][D][esp_adf_pipeline:193]: State changed from PREPARING to STARTING
[15:00:18][I][adf_audio:134]: got new pipeline state: 2
[15:00:18][D][esp-idf:000]: I (394914) AUDIO_ELEMENT: [i2s_out-0x3d820784] Element task created
[15:00:18][D][esp-idf:000]: I (394916) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8391915 Bytes, Inter:162668 Bytes, Dram:162668 Bytes
[15:00:18][D][esp-idf:000]: I (394920) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[15:00:18][D][esp-idf:000]: I (394923) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[15:00:18][D][esp-idf:000]: I (394926) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1
[15:00:18][D][esp-idf:000]: I (394929) I2S_STREAM: AUDIO_STREAM_WRITER
[15:00:18][I][esp_adf_pipeline:114]: [ http ] status: 14
[15:00:18][I][esp_adf_pipeline:114]: [ i2s_out ] status: 12
[15:00:18][I][esp_adf_pipeline:122]: [ * ] CMD: 8 status: 12
[15:00:18][D][esp_adf_pipeline:193]: State changed from STARTING to RUNNING
[15:00:18][I][adf_audio:134]: got new pipeline state: 3
[15:00:19][D][esp-idf:000]: I (395546) HTTP_STREAM: total_bytes=11983
[15:00:19][I][esp_adf_pipeline:114]: [ http ] status: 12
[15:00:19][I][esp_adf_pipeline:114]: [ decoder ] status: 12
[15:00:19][I][HTTPStreamReader:109]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[15:00:19][D][esp-idf:000]: W (396028) HTTP_STREAM: No more data,errno:0, total_bytes:11983, rlen = 0
[15:00:19][D][esp-idf:000]: I (396031) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0
[15:00:19][I][esp_adf_pipeline:114]: [ http ] status: 15
[15:00:20][D][esp-idf:000]: I (396538) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2
[15:00:20][D][esp-idf:000]: I (397049) MP3_DECODER: Closed
[15:00:20][I][esp_adf_pipeline:114]: [ decoder ] status: 15
[15:00:20][D][esp-idf:000]: I (397240) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2
[15:00:21][I][esp_adf_pipeline:114]: [ i2s_out ] status: 15
[15:00:21][I][esp_adf_pipeline:122]: [ * ] CMD: 8 status: 15
[15:00:21][D][esp_adf_pipeline:193]: State changed from RUNNING to STOPPED
[15:00:21][I][adf_audio:134]: got new pipeline state: 5
[15:00:21][D][esp_adf_pipeline:286]: Called deinit_all
[15:00:21][D][esp-idf:000]: I (398280) AUDIO_PIPELINE: audio_pipeline_unlinked
[15:00:21][D][esp-idf:000]: W (398283) AUDIO_ELEMENT: [http] Element has not create when AUDIO_ELEMENT_TERMINATE
[15:00:21][D][esp-idf:000]: W (398286) AUDIO_ELEMENT: [decoder] Element has not create when AUDIO_ELEMENT_TERMINATE
[15:00:21][D][esp-idf:000]: W (398289) AUDIO_ELEMENT: [i2s_out] Element has not create when AUDIO_ELEMENT_TERMINATE
[15:00:21][D][esp-idf:000]: I (398292) I2S: DMA queue destroyed
[15:00:21][D][esp_adf_pipeline:193]: State changed from STOPPED to UNAVAILABLE
[15:00:21][I][adf_audio:134]: got new pipeline state: 0
When I try to use the mic, it never recognizes the wake word. I'm not sure how to debug this.
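One way to narrow down a silent microphone is to inspect the raw samples before suspecting the wake word model. The helper below is a hypothetical sketch, not part of ESPHome or esphome_audio: it computes the peak and mean-square level of a block of signed 16-bit PCM samples. How you capture such a block on the device (for example, from a custom component or a debug hook) is left open. If the blocks stay at or near zero, the problem is more likely the I2S setup (pins, channel, bit depth) than micro_wake_word.

```c
#include <stddef.h>
#include <stdint.h>

/* Level statistics for one block of signed 16-bit PCM samples. */
typedef struct {
    int32_t peak;       /* largest absolute sample value in the block */
    double mean_square; /* RMS level would be sqrt(mean_square) */
} pcm_level_t;

pcm_level_t pcm16_level(const int16_t *samples, size_t n) {
    pcm_level_t lvl = {0, 0.0};
    if (n == 0) {
        return lvl;
    }
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = samples[i];      /* widen so |-32768| fits */
        int32_t a = s < 0 ? -s : s;
        if (a > lvl.peak) {
            lvl.peak = a;
        }
        sum_sq += (double)s * (double)s;
    }
    lvl.mean_square = sum_sq / (double)n;
    return lvl;
}

/* Hypothetical heuristic: treat a block whose peak never exceeds
   `noise_floor` as silence (e.g. an unconnected or misconfigured input). */
int pcm16_is_silent(const int16_t *samples, size_t n, int32_t noise_floor) {
    return pcm16_level(samples, n).peak <= noise_floor;
}
```

The noise floor value is a tuning choice; a block of exact zeros usually points to a wiring or channel mismatch rather than a quiet room.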
Full duplex mode is now supported. Please see the repository README for configuration details.
This will be a great improvement to ESPHome!
I'm also trying to get this working on the Onju Voice board, using full duplex mode. Everything compiles and runs without errors. I can stream an MP3 to it from Home Assistant, but the wake word detection doesn't appear to hear anything.
i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
    access_mode: duplex
adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    adf_alc: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    pdm: false
    bits_per_sample: 16bit
    fixed_settings: true
microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - resampler
      - self
media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: onju_media_player
    internal: false
    keep_pipeline_alive: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out
micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;
voice_assistant:
  id: va
  microphone: adf_microphone
  media_player: adf_media_player
The original creator's firmware also appears to use a full-duplex I2S configuration. Here is an excerpt:
i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_RX),
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_RIGHT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = SAMPLE_CHUNK_SIZE}; // mostly set by needs of microphone
i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_BCK_PIN,
    .ws_io_num = I2S_WS_PIN,
    .data_out_num = I2S_OUT,
    .data_in_num = I2S_IN};
i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);
i2s_set_pin(I2S_NUM, &pin_config);
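The excerpt reads 32-bit samples and keeps only the right channel, while the earlier config asked the microphone for `bits_per_sample: 16bit`. The sketch below shows the kind of conversion that has to happen somewhere between the two. It is hypothetical (`right_slot_to_pcm16` is not from the Onju firmware or esphome_audio) and assumes the driver delivers interleaved stereo frames in left-then-right order, with the microphone data MSB-aligned in each 32-bit slot. With `I2S_CHANNEL_FMT_ONLY_RIGHT` the driver may already hand back a single channel, in which case only the shift applies.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Extract the right-slot samples from interleaved 32-bit stereo I2S frames
 * (L, R, L, R, ...) and reduce them to 16-bit PCM by keeping the top 16 bits.
 * Assumes MSB-aligned data in each 32-bit slot and left-then-right slot order.
 * `out` must have room for n_frames samples; returns the number written.
 */
size_t right_slot_to_pcm16(const int32_t *frames, size_t n_frames, int16_t *out) {
    for (size_t i = 0; i < n_frames; i++) {
        int32_t right = frames[2 * i + 1]; /* assumed slot order: left, right */
        /* >> on a negative int32_t is an arithmetic shift on GCC/Clang;
           the result always fits in int16_t */
        out[i] = (int16_t)(right >> 16);
    }
    return n_frames;
}
```

Dropping the low 16 bits discards precision the wake word model does not need; if the microphone instead placed its data in the low bits of the slot, this shift would turn it into near-silence, which is one way a bit-depth mismatch can look like a dead microphone.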
Given this, do you think I have set up the esphome_audio config correctly? Thanks!
I have the same problem when I try to get this to work with Onju. I'm happy to be a tester as well, if I can help in any way.
I think the Onju uses an external DAC and ADC, because the following snippet of the original config seems to indicate as much. However, I don't know which DAC and ADC it uses, or how to configure the firmware in this project to work with them.
i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
speaker:
  - platform: i2s_audio
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12
microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false
The snippet above is from: https://github.com/tetele/onju-voice-satellite/blob/main/esphome/onju-voice-microwakeword.yaml
Hardware details: https://github.com/justLV/onju-voice
What brought me here is that I want to use the media_player component instead of the speaker component, so that voice responses (MP3) from Home Assistant can also play on the Onju. That doesn't work with the speaker component.
I also tried this, since it's supported, but it didn't change anything:
  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    adc:
      model: generic
    pdm: false
    bits_per_sample: 32bit
    fixed_settings: true
Hey, when you use I2S in duplex mode, input and output should use the same settings. So set bits_per_sample to 32bit for both and let the resampler handle the conversion. Did you get micro_wake_word running with another setup on your board? If so, could you share that config?
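A back-of-the-envelope reason for matching settings: in duplex mode one BCLK/LRCLK pair drives both directions. Assuming standard Philips I2S framing with two slots per frame, BCLK = sample_rate * bits_per_sample * 2. With the values from the config above, the 32-bit writer needs 1.024 MHz while a 16-bit reader would need 512 kHz, which a single BCLK pin cannot supply at the same time. The struct and function names below are hypothetical; esphome_audio may validate this differently, or not at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard (Philips) I2S frame: two slots (left + right) per sample period. */
#define I2S_SLOTS_PER_FRAME 2u

typedef struct {
    uint32_t sample_rate;     /* Hz */
    uint32_t bits_per_sample; /* slot width in bits */
} i2s_dir_cfg_t;

/* Bit clock the controller must drive for one direction's settings. */
uint32_t i2s_bclk_hz(i2s_dir_cfg_t cfg) {
    return cfg.sample_rate * cfg.bits_per_sample * I2S_SLOTS_PER_FRAME;
}

/* In duplex mode reader and writer share BCLK and LRCLK,
   so both must request the same word clock and bit clock. */
bool duplex_compatible(i2s_dir_cfg_t in, i2s_dir_cfg_t out) {
    return in.sample_rate == out.sample_rate &&
           i2s_bclk_hz(in) == i2s_bclk_hz(out);
}
```

For example, `{16000, 16}` for the reader and `{16000, 32}` for the writer fails the check, while setting both to 32 bits passes it.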
Oh, I just saw that @sqldiablo already shared a config, sorry. Apart from setting bits_per_sample to 32bit, the config looks good to me. You don't need to set the adc, but it shouldn't hurt either. Could you share some logs, then?
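With the writer fixed at 32 bits (`bits_per_sample: 32bit` on `adf_i2s_out` in the config above), the TTS stream from the first log (`sample_rates=16000, bits=16, ch=1`) no longer matches the I2S format, so the resampler stage has to widen it. The sketch below shows that conversion in isolation, assuming the converter simply left-aligns each 16-bit sample in a 32-bit slot and duplicates it to both channels. The actual esphome_audio resampler element may work differently, and `mono16_to_stereo32` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Widen mono 16-bit PCM to interleaved 32-bit stereo frames (L, R, L, R, ...)
 * by placing each sample in the top 16 bits of a 32-bit slot and copying it
 * to both channels. No sample-rate conversion is done here.
 * `out` must have room for 2 * n int32_t values.
 */
void mono16_to_stereo32(const int16_t *in, size_t n, int32_t *out) {
    for (size_t i = 0; i < n; i++) {
        /* multiply instead of << so negative samples stay well defined */
        int32_t wide = (int32_t)in[i] * 65536;
        out[2 * i] = wide;     /* left slot */
        out[2 * i + 1] = wide; /* right slot */
    }
}
```

Since the sample rate is already 16000 on both sides in these logs, only the bit depth and channel count change in this step.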
OK, I tried setting sample_rate to 16000 and bits_per_sample to 32bit for both input and output, but the result is the same: nothing from the microphone.
Here is a working config for this board with micro_wake_word: https://github.com/tetele/onju-voice-satellite/blob/main/esphome/onju-voice-microwakeword.yaml
Thanks for looking!
INFO Starting log output from onju2.local using esphome API
INFO Successfully connected to onju2 @ 10.15.4.204 in 6.378s
INFO Successful handshake with onju2 @ 10.15.4.204 in 0.089s
[18:24:16][I][app:100]: ESPHome version 2024.4.0 compiled on Apr 17 2024, 18:23:45
[18:24:16][C][wifi:580]: WiFi:
[18:24:16][C][wifi:408]: Local MAC: 80:65:99:A2:95:6C
[18:24:16][C][wifi:413]: SSID: 'xxxxx'
[18:24:16][C][wifi:416]: IP Address: 10.15.4.204
[18:24:16][C][wifi:420]: BSSID: xxxxxxx
[18:24:16][C][wifi:421]: Hostname: 'onju2'
[18:24:16][C][wifi:423]: Signal strength: -36 dB ▂▄▆█
[18:24:16][C][wifi:427]: Channel: 9
[18:24:16][C][wifi:428]: Subnet: 255.255.240.0
[18:24:16][C][wifi:429]: Gateway: 10.15.0.1
[18:24:16][C][wifi:430]: DNS1: 10.15.0.1
[18:24:16][C][wifi:431]: DNS2: 0.0.0.0
[18:24:16][C][logger:166]: Logger:
[18:24:16][C][logger:167]: Level: DEBUG
[18:24:16][C][logger:169]: Log Baud Rate: 115200
[18:24:16][C][logger:170]: Hardware UART: USB_SERIAL_JTAG
[18:24:16][C][template.number:050]: Template Number 'Touch threshold percentage'
[18:24:16][C][template.number:051]: Optimistic: YES
[18:24:16][C][template.number:052]: Update Interval: never
[18:24:16][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[18:24:16][C][esp32_rmt_led_strip:176]: Pin: 11
[18:24:16][C][esp32_rmt_led_strip:177]: Channel: 0
[18:24:16][C][esp32_rmt_led_strip:202]: RGB Order: GRB
[18:24:16][C][esp32_rmt_led_strip:203]: Max refresh rate: 0
[18:24:16][C][esp32_rmt_led_strip:204]: Number of LEDs: 6
[18:24:16][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[18:24:16][C][switch.gpio:091]: Restore Mode: always OFF
[18:24:16][C][switch.gpio:031]: Pin: GPIO21
[18:24:16][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[18:24:16][C][gpio.binary_sensor:016]: Pin: GPIO38
[18:24:16][C][light:103]: Light 'leds'
[18:24:16][C][light:105]: Default Transition Length: 0.0s
[18:24:16][C][light:106]: Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'left_led'
[18:24:16][C][light:105]: Default Transition Length: 0.1s
[18:24:16][C][light:106]: Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'top_led'
[18:24:16][C][light:105]: Default Transition Length: 0.1s
[18:24:16][C][light:106]: Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'right_led'
[18:24:16][C][light:105]: Default Transition Length: 0.1s
[18:24:16][C][light:106]: Gamma Correct: 2.80
[18:24:16][C][template.switch:068]: Template Switch 'Use Wake Word'
[18:24:16][C][template.switch:091]: Restore Mode: restore defaults to ON
[18:24:16][C][template.switch:057]: Optimistic: YES
[18:24:16][C][psram:020]: PSRAM:
[18:24:16][C][psram:021]: Available: YES
[18:24:16][C][psram:024]: Size: 8191 KB
[18:24:16][C][i2s_audio:028]: I2SController:
[18:24:16][C][i2s_audio:029]: AccessMode: duplex
[18:24:16][C][i2s_audio:030]: Port: 0
[18:24:16][C][i2s_audio:032]: Reader registered.
[18:24:16][C][i2s_audio:035]: Writer registered.
[18:24:16][C][i2s_audio:138]: I2S-Writer (Fixed-CFG):
[18:24:16][C][i2s_audio:140]: sample-rate: 16000 bits_per_sample: 32
[18:24:16][C][i2s_audio:141]: channel_fmt: 0 channels: 2
[18:24:16][C][i2s_audio:142]: use_apll: no, use_pdm: no
[18:24:16][C][i2s_audio:135]: I2S-Reader (Fixed-CFG):
[18:24:16][C][i2s_audio:140]: sample-rate: 16000 bits_per_sample: 32
[18:24:16][C][i2s_audio:141]: channel_fmt: 3 channels: 1
[18:24:16][C][i2s_audio:142]: use_apll: no, use_pdm: no
[18:24:16][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[18:24:16][C][esp32_touch:074]: Meas cycle: 0.80ms
[18:24:16][C][esp32_touch:075]: Sleep cycle: 2.00ms
[18:24:16][C][esp32_touch:095]: Low Voltage Reference: 0.8V
[18:24:16][C][esp32_touch:115]: High Voltage Reference: 2.4V
[18:24:16][C][esp32_touch:135]: Voltage Attenuation: 0V
[18:24:16][C][esp32_touch:169]: Filter mode: IIR_16
[18:24:16][C][esp32_touch:170]: Debounce count: 2
[18:24:16][C][esp32_touch:171]: Noise threshold coefficient: 0
[18:24:16][C][esp32_touch:172]: Jitter filter step size: 0
[18:24:16][C][esp32_touch:191]: Smooth level: IIR_2
[18:24:16][C][esp32_touch:213]: Denoise grade: BIT8
[18:24:16][C][esp32_touch:245]: Denoise capacitance level: L0
[18:24:16][C][esp32_touch:260]: Touch Pad 'volume_down'
[18:24:16][C][esp32_touch:261]: Pad: T4
[18:24:16][C][esp32_touch:262]: Threshold: 485736
[18:24:16][C][esp32_touch:260]: Touch Pad 'volume_up'
[18:24:16][C][esp32_touch:261]: Pad: T2
[18:24:16][C][esp32_touch:262]: Threshold: 526604
[18:24:16][C][esp32_touch:260]: Touch Pad 'action'
[18:24:16][C][esp32_touch:261]: Pad: T3
[18:24:16][C][esp32_touch:262]: Threshold: 682733
[18:24:16][C][captive_portal:088]: Captive Portal:
[18:24:16][C][mdns:115]: mDNS:
[18:24:16][C][mdns:116]: Hostname: onju2
[18:24:16][C][ota:096]: Over-The-Air Updates:
[18:24:16][C][ota:097]: Address: onju2.local:3232
[18:24:16][C][ota:100]: Using Password.
[18:24:16][C][ota:103]: OTA version: 2.
[18:24:16][C][api:139]: API Server:
[18:24:16][C][api:140]: Address: onju2.local:6053
[18:24:16][C][api:142]: Using noise encryption: YES
[18:24:16][C][improv_serial:032]: Improv Serial:
[18:24:16][C][micro_wake_word:057]: microWakeWord:
[18:24:16][C][micro_wake_word:058]: Wake Word: hey jarvis
[18:24:16][C][micro_wake_word:059]: Probability cutoff: 0.500
[18:24:16][C][micro_wake_word:060]: Sliding window size: 10
[18:24:16][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[18:24:16][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[18:24:16][C][adf_media_player:018]: Number of ASPComponents: 3
[18:24:17][D][light:036]: 'top_led' Setting:
[18:24:17][D][light:051]: Brightness: 60%
[18:24:17][D][light:059]: Red: 100%, Green: 0%, Blue: 100%
[18:24:17][D][light:109]: Effect: 'listening_ww'
[18:24:38][D][media_player:059]: 'onju_media_player' - Setting
[18:24:38][D][media_player:066]: Media URL: http://10.19.15.100:8123/media/local/Auntie's%20Lock.mp3?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmNzI3ZGIxODJhZTI0YzczOGU1MjE4MGQ0MzYzYTI2YSIsInBhdGgiOiIvbWVkaWEvbG9jYWwvQXVudGllJ3MgTG9jay5tcDMiLCJwYXJhbXMiOltdLCJpYXQiOjE3MTMzOTk4NzgsImV4cCI6MTcxMzQ4NjI3OH0.jz9w-qWqJ0hsdKi33U2I9deESgDqtrgk0RcBSur7CM8
[18:24:38][D][adf_media_player:030]: Got control call in state 1
[18:24:38][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[18:24:38][D][esp-idf:000]: I (29523) MP3_DECODER: MP3 init
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 0, http
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[18:24:38][D][esp-idf:000]: I (29536) AUDIO_PIPELINE: link el->rb, el:0x3d832f54, tag:http, rb:0x3d8336b0
[18:24:38][D][esp-idf:000]: I (29539) AUDIO_PIPELINE: link el->rb, el:0x3d833214, tag:decoder, rb:0x3d8346f0
[18:24:38][D][esp-idf:000]: I (29542) AUDIO_PIPELINE: link el->rb, el:0x3d8333b0, tag:resampler, rb:0x3d835730
[18:24:38][D][esp_adf_pipeline:370]: Setting up event listener.
[18:24:38][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 1
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2
[18:24:38][D][esp-idf:000]: I (29574) AUDIO_THREAD: The http task allocate stack on external memory
[18:24:38][D][esp-idf:000]: I (29577) AUDIO_ELEMENT: [http-0x3d832f54] Element task created
[18:24:38][D][esp-idf:000]: I (29579) AUDIO_THREAD: The decoder task allocate stack on external memory
[18:24:38][D][esp-idf:000]: I (29583) AUDIO_ELEMENT: [decoder-0x3d833214] Element task created
[18:24:38][D][esp-idf:000]: I (29586) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp-idf:000]: I (29589) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp_audio_sources:097]: Streamer status: 2
[18:24:38][D][esp_audio_sources:098]: decoder status: 2
[18:24:38][D][esp-idf:000]: I (29620) HTTP_CLIENT: Body received in fetch header state, 0x3fcc3c0a, 1738
[18:24:38][D][esp-idf:000]: I (29625) HTTP_STREAM: total_bytes=6564597
[18:24:38][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2
[18:24:38][D][esp_audio_processors:088]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2
[18:24:38][D][esp-idf:000]: W (29717) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[18:24:38][D][esp-idf:000]: W (29721) MP3_DECODER: output aborted -3
[18:24:38][D][esp-idf:000]: I (29725) MP3_DECODER: Closed
[18:24:38][D][esp-idf:000]: W (29731) AUDIO_ELEMENT: OUT-[http] AEL_IO_ABORT
[18:24:38][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2
[18:24:38][D][esp-idf:000]: I (29758) AUDIO_THREAD: The resampler task allocate stack on external memory
[18:24:38][D][esp-idf:000]: I (29760) AUDIO_ELEMENT: [resampler-0x3d8333b0] Element task created
[18:24:38][D][esp-idf:000]: I (29763) AUDIO_ELEMENT: [i2s_out-0x3d833568] Element task created
[18:24:38][D][esp-idf:000]: I (29766) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8311211 Bytes, Inter:161316 Bytes, Dram:161316 Bytes
[18:24:38][D][esp-idf:000]: I (29770) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp-idf:000]: I (29773) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp-idf:000]: I (29777) AUDIO_ELEMENT: [resampler] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp-idf:000]: I (29781) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1
[18:24:38][D][esp-idf:000]: I (29784) I2S_STREAM: AUDIO_STREAM_WRITER
[18:24:38][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[18:24:38][I][esp_adf_pipeline:214]: [ http ] status: 14
[18:24:38][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 2
[18:24:38][D][esp-idf:000]: I (29981) RSP_FILTER: sample rate of source data : 44100, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 2
[18:24:38][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 2
[18:24:38][D][esp-idf:000]: I (30009) HTTP_CLIENT: Body received in fetch header state, 0x3fcc2ffa, 1738
[18:24:38][D][esp-idf:000]: I (30016) HTTP_STREAM: total_bytes=6564597
[18:24:38][I][esp_adf_pipeline:214]: [ http ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[18:24:38][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[18:24:38][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 3
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2
[18:24:38][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2
[18:24:41][D][media_player:059]: 'onju_media_player' - Setting
[18:24:41][D][media_player:063]: Command: STOP
[18:24:41][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:24:41][I][adf_media_player:135]: got new pipeline state: 4
[18:24:41][D][esp-idf:000]: W (33088) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[18:24:41][D][esp-idf:000]: W (33091) MP3_DECODER: output aborted -3
[18:24:41][D][esp-idf:000]: I (33095) MP3_DECODER: Closed
[18:24:41][D][esp-idf:000]: W (33101) HTTP_STREAM: No output due to stopping
[18:24:41][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:24:41][I][adf_media_player:135]: got new pipeline state: 5
[18:24:51][D][switch:016]: 'Use Wake Word' Turning OFF.
[18:24:51][D][switch:055]: 'Use Wake Word': Sending state OFF
[18:24:51][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:24:51][D][light:036]: 'top_led' Setting:
[18:24:51][D][light:047]: State: OFF
[18:24:51][D][light:085]: Transition length: 0.1s
[18:24:51][D][light:091]: Effect: 'None'
[18:24:51][D][micro_wake_word:134]: Stopping Microphone
[18:24:51][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:24:51][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:24:51][D][esp-idf:000]: W (42766) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:24:51][D][esp-idf:000]: W (42769) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:24:51][D][esp-idf:000]: W (42772) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:24:51][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:24:51][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[18:24:56][D][switch:012]: 'Use Wake Word' Turning ON.
[18:24:56][D][switch:055]: 'Use Wake Word': Sending state ON
[18:24:56][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[18:24:56][D][light:036]: 'top_led' Setting:
[18:24:56][D][light:047]: State: ON
[18:24:56][D][light:051]: Brightness: 60%
[18:24:56][D][light:059]: Red: 100%, Green: 0%, Blue: 100%
[18:24:56][D][light:109]: Effect: 'listening_ww'
[18:24:56][D][micro_wake_word:115]: Starting Microphone
[18:24:56][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:24:56][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:24:56][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:24:56][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:24:56][D][esp-idf:000]: I (47984) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8363219 Bytes, Inter:146700 Bytes, Dram:146700 Bytes
[18:24:56][D][esp-idf:000]: I (47987) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1
[18:24:56][D][esp-idf:000]: I (47990) AUDIO_ELEMENT: [resampler] AEL_MSG_CMD_RESUME,state:1
[18:24:56][D][esp-idf:000]: I (47992) AUDIO_PIPELINE: Pipeline started
[18:24:56][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ resampler ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:24:56][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:24:56][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[18:24:56][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:24:56][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:24:56][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[18:24:56][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:24:56][I][esp_adf_pipeline:214]: [ resampler ] status: 12
steps:
You could also try increasing the gain a bit; there is an experimental, undocumented option gain_log2 ;) The default should be 2.
microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - self
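The name gain_log2 suggests that the gain is given as a power of two. Assuming that reading (an assumption, not taken from the adf_pipeline source), gain_log2: n multiplies each sample by 2**n, so going from the default 2 to 3 doubles the amplitude, roughly +6 dB. A minimal Python sketch of that idea with clipping to the 16-bit range; apply_gain_log2 is a hypothetical helper name:

```python
# Hypothetical sketch: assumes gain_log2 = n means "multiply by 2**n",
# i.e. a left shift by n bits, saturated to the int16 range.
# Illustration only, not the adf_pipeline implementation.

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain_log2(samples, gain_log2):
    """Scale 16-bit PCM samples by 2**gain_log2, clamping instead of wrapping."""
    out = []
    for s in samples:
        scaled = s << gain_log2  # equals s * 2**gain_log2 for Python ints, negatives included
        out.append(max(INT16_MIN, min(INT16_MAX, scaled)))
    return out


# gain_log2: 3 -> x8; loud samples clip at the int16 limits
print(apply_gain_log2([1000, -1000, 5000], 3))
```

If the gain is too high, loud speech saturates and the wake word model sees clipped audio, so raising it one step at a time is the safer way to test.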
As I think about it, I need to check how this works together with the resampler. Maybe also try the microphone without the resampler; if you set it to 16 kHz and 32 bit, the resampler should be unnecessary anyway.
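The "unnecessary anyway" point comes down to this: a resampling stage only does real work when the source and destination formats differ. The esp_audio_processors log lines print both formats (e.g. "New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2"), so one quick check is to compare them. A minimal sketch; parse_settings and needs_resampler are hypothetical helpers, and they only compare sample rate and channel count because that is all those log lines report:

```python
import re

# Matches the SRC/DST part of the esp_audio_processors log lines, e.g.
# "New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2"
_SETTINGS_RE = re.compile(
    r"SRC: rate: (\d+), ch: (\d+) DST: rate: (\d+), ch: (\d+)"
)


def parse_settings(line):
    """Return ((src_rate, src_ch), (dst_rate, dst_ch)) from a log line."""
    m = _SETTINGS_RE.search(line)
    if m is None:
        raise ValueError(f"no SRC/DST settings in: {line!r}")
    src_rate, src_ch, dst_rate, dst_ch = map(int, m.groups())
    return (src_rate, src_ch), (dst_rate, dst_ch)


def needs_resampler(line):
    """True if rate or channel count differ between source and destination."""
    src, dst = parse_settings(line)
    return src != dst


print(needs_resampler("New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2"))
```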
Progress!
Without the resampler, the following happens:
Here is a log of 1 and 2:
[18:36:58][D][micro_wake_word:362]: Wake word sliding average probability is 0.536 and most recent probability is 1.000
[18:36:58][D][micro_wake_word:128]: Wake Word Detected
[18:36:58][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:36:58][D][micro_wake_word:134]: Stopping Microphone
[18:36:58][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:36:58][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:36:58][D][esp-idf:000]: W (75869) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:36:58][D][esp-idf:000]: W (75872) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:36:58][D][esp-idf:000]: W (75875) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:36:58][D][esp-idf:000]: W (75879) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:36:58][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:36:58][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[18:36:58][D][voice_assistant:439]: State changed from IDLE to START_PIPELINE
[18:36:58][D][voice_assistant:445]: Desired state set to START_MICROPHONE
[18:36:58][D][voice_assistant:126]: microphone not running
[18:36:58][D][voice_assistant:210]: Requesting start...
[18:36:58][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:36:58][D][voice_assistant:126]: microphone not running
[18:36:58][D][voice_assistant:476]: Client started, streaming microphone
[18:36:58][D][voice_assistant:439]: State changed from STARTING_PIPELINE to START_MICROPHONE
[18:36:58][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:36:58][D][voice_assistant:163]: Starting Microphone
[18:36:58][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:36:58][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:36:58][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:36:58][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:36:58][D][esp-idf:000]: I (75950) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8386231 Bytes, Inter:156256 Bytes, Dram:156256 Bytes
[18:36:58][D][esp-idf:000]: I (75954) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1
[18:36:58][D][esp-idf:000]: I (75957) AUDIO_PIPELINE: Pipeline started
[18:36:58][D][voice_assistant:563]: Event Type: 1
[18:36:58][D][voice_assistant:566]: Assist Pipeline running
[18:36:58][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:36:58][D][voice_assistant:563]: Event Type: 3
[18:36:58][D][voice_assistant:577]: STT started
[18:36:58][D][light:036]: 'top_led' Setting:
[18:36:58][D][light:051]: Brightness: 100%
[18:36:58][D][light:059]: Red: 100%, Green: 100%, Blue: 100%
[18:36:58][D][light:109]: Effect: 'listening'
[18:36:58][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:36:58][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:36:58][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:36:58][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:36:58][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:36:58][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[18:36:58][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:36:59][D][voice_assistant:563]: Event Type: 11
[18:36:59][D][voice_assistant:717]: Starting STT by VAD
[18:37:00][D][voice_assistant:563]: Event Type: 12
[18:37:00][D][voice_assistant:721]: STT by VAD end
[18:37:00][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[18:37:00][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[18:37:00][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:37:00][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:051]: Brightness: 70%
[18:37:00][D][light:059]: Red: 0%, Green: 20%, Blue: 100%
[18:37:00][D][light:109]: Effect: 'processing'
[18:37:00][D][esp-idf:000]: W (77294) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:37:00][D][esp-idf:000]: W (77297) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:37:00][D][esp-idf:000]: W (77300) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:37:00][D][esp-idf:000]: W (77304) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT
[18:37:00][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:37:00][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[18:37:00][D][voice_assistant:563]: Event Type: 4
[18:37:00][D][voice_assistant:591]: Speech recognised as: " What time is it?"
[18:37:00][D][voice_assistant:563]: Event Type: 5
[18:37:00][D][voice_assistant:596]: Intent started
[18:37:00][D][voice_assistant:563]: Event Type: 6
[18:37:00][D][voice_assistant:563]: Event Type: 7
[18:37:00][D][voice_assistant:619]: Response: "The current time is 18:37 Mountain Time on Wednesday, April 17, 2024."
[18:37:00][D][voice_assistant:563]: Event Type: 8
[18:37:00][D][voice_assistant:639]: Response URL: "http://10.19.15.100:8123/api/tts_proxy/491349b64163dcff9ed5d41242b6d89e5713fa8f_en-gb_010745e5ef_tts.piper.mp3"
[18:37:00][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[18:37:00][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[18:37:00][D][media_player:059]: 'onju_media_player' - Setting
[18:37:00][D][media_player:066]: Media URL: http://10.19.15.100:8123/api/tts_proxy/491349b64163dcff9ed5d41242b6d89e5713fa8f_en-gb_010745e5ef_tts.piper.mp3
[18:37:00][D][adf_media_player:030]: Got control call in state 1
[18:37:00][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:37:00][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:37:00][I][adf_media_player:135]: got new pipeline state: 1
[18:37:00][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:059]: Red: 20%, Green: 100%, Blue: 0%
[18:37:00][D][light:109]: Effect: 'speaking'
[18:37:00][D][voice_assistant:563]: Event Type: 2
[18:37:00][D][voice_assistant:653]: Assist Pipeline ended
[18:37:00][D][esp-idf:000]: I (78129) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:37:00][D][esp-idf:000]: I (78132) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:37:00][D][esp_audio_sources:097]: Streamer status: 2
[18:37:00][D][esp_audio_sources:098]: decoder status: 2
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:051]: Brightness: 60%
[18:37:00][D][light:059]: Red: 100%, Green: 0%, Blue: 100%
[18:37:01][D][light:109]: Effect: 'listening_ww'
[18:37:01][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[18:37:01][D][micro_wake_word:115]: Starting Microphone
[18:37:01][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:37:01][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:37:01][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:37:01][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:37:01][D][esp-idf:000]: I (78475) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8385715 Bytes, Inter:157852 Bytes, Dram:157852 Bytes
[18:37:01][D][esp-idf:000]: I (78477) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1
[18:37:01][D][esp-idf:000]: I (78481) AUDIO_PIPELINE: Pipeline started
[18:37:01][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:37:01][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:37:01][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:37:01][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:37:01][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:37:01][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:37:01][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[18:37:01][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:37:01][D][esp-idf:000]: I (78794) HTTP_CLIENT: Body received in fetch header state, 0x3fcc35bb, 1841
[18:37:01][D][esp-idf:000]: I (78801) HTTP_STREAM: total_bytes=60399
[18:37:01][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[18:37:01][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:01][D][esp-idf:000]: W (78874) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[18:37:01][D][esp-idf:000]: W (78879) MP3_DECODER: output aborted -3
[18:37:01][D][esp-idf:000]: I (78883) MP3_DECODER: Closed
[18:37:01][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:37:01][I][adf_media_player:135]: got new pipeline state: 2
[18:37:01][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:01][D][esp-idf:000]: I (78908) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8393243 Bytes, Inter:163268 Bytes, Dram:163268 Bytes
[18:37:01][D][esp-idf:000]: I (78911) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:37:01][D][esp-idf:000]: I (78914) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:37:01][D][esp-idf:000]: I (78917) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1
Log:
[18:38:57][D][media_player:059]: 'onju_media_player' - Setting
[18:38:57][D][media_player:066]: Media URL: http://10.19.15.100:8123/media/local/Auntie's%20Lock.mp3?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmNzI3ZGIxODJhZTI0YzczOGU1MjE4MGQ0MzYzYTI2YSIsInBhdGgiOiIvbWVkaWEvbG9jYWwvQXVudGllJ3MgTG9jay5tcDMiLCJwYXJhbXMiOltdLCJpYXQiOjE3MTM0MDA3MzcsImV4cCI6MTcxMzQ4NzEzN30.4LuxjFgeQPIE9S50Im58at3Yg7aMm2NEcXk7fd-hCNY
[18:38:57][D][adf_media_player:030]: Got control call in state 1
[18:38:57][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:38:57][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 1
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: I (194756) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:38:57][D][esp-idf:000]: I (194759) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:38:57][D][esp_audio_sources:097]: Streamer status: 2
[18:38:57][D][esp_audio_sources:098]: decoder status: 2
[18:38:57][D][esp-idf:000]: I (194790) HTTP_CLIENT: Body received in fetch header state, 0x3fcc2a22, 1738
[18:38:57][D][esp-idf:000]: I (194795) HTTP_STREAM: total_bytes=6564597
[18:38:57][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: W (194926) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT
[18:38:57][D][esp-idf:000]: W (194929) MP3_DECODER: output aborted -3
[18:38:57][D][esp-idf:000]: I (194933) MP3_DECODER: Closed
[18:38:57][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 2
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: I (194956) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8387635 Bytes, Inter:157868 Bytes, Dram:157868 Bytes
[18:38:57][D][esp-idf:000]: I (194959) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1
[18:38:57][D][esp-idf:000]: I (194961) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1
[18:38:57][D][esp-idf:000]: I (194964) MP3_DECODER: MP3 opened
[18:38:57][D][esp-idf:000]: I (195001) HTTP_CLIENT: Body received in fetch header state, 0x3fcc7446, 1738
[18:38:57][D][esp-idf:000]: I (195006) HTTP_STREAM: total_bytes=6564597
[18:38:57][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[18:38:57][I][esp_adf_pipeline:214]: [ http ] status: 14
[18:38:57][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[18:38:57][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:38:57][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[18:38:57][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[18:38:57][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 3
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:58][I][esp_adf_pipeline:214]: [ http ] status: 12
[18:38:58][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[18:38:58][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:38:58][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:39:02][D][media_player:059]: 'onju_media_player' - Setting
[18:39:02][D][media_player:063]: Command: STOP
[18:39:02][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:39:02][I][adf_media_player:135]: got new pipeline state: 4
[18:39:03][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:39:03][I][adf_media_player:135]: got new pipeline state: 5
It feels like the resampler is indeed the missing ingredient for the output. Maybe I'll try putting it on just the output?
Yes, sorry, that is what I meant: for the output you definitely need it! But good to hear that the microphone works now!
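Why the output needs it: the logs above show the MP3 decoder reporting sample_rates=44100, ch=2 while adf_i2s_out stays at "Set final i2s settings: 16000". When input and output share LRCLK and BCLK, both directions necessarily run at one sample rate, so 44.1 kHz music has to be converted to 16 kHz before it reaches I2S. The sketch below uses plain linear interpolation to show the ratio arithmetic (44100/16000 = 441/160 input samples per output sample). It is a swapped-in, simplified technique, not the filter ESP-ADF's RSP_FILTER uses, and it has no anti-aliasing low-pass:

```python
from fractions import Fraction


def resample_linear(samples, src_rate, dst_rate):
    """Mono linear-interpolation resampler (illustrative only, no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    ratio = Fraction(src_rate, dst_rate)      # input samples per output sample, e.g. 441/160
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * ratio                       # exact fractional position in the input
        k = pos.numerator // pos.denominator  # integer part (always < len(samples))
        frac = pos - k
        a = samples[k]
        b = samples[k + 1] if k + 1 < len(samples) else a
        out.append(float(a + (b - a) * frac))
    return out


# One second of 44.1 kHz audio becomes one second at 16 kHz
print(len(resample_linear([0.0] * 44100, 44100, 16000)))
```

Stereo input would be handled by running it once per channel.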
Woohoo! It works!
I will make a PR in the onju-voice repo with the full config after I test it a bit more.
Previously, with micro wake word on this device, there was no "media_player" component, so this is a huge upgrade. Amazing work!!
One caveat, which I imagine other devices have had to deal with: now that you can talk and play music at the same time (haha), how does micro wake word know when to stop listening? It actually detected me correctly even with music playing through it, but the timeout seemed very long. Anyway, that's an optimization for another day! :)
This is awesome to hear. @cowboyrushforth, do you mind sharing your final config?
This is what I have so far; I'm still playing with some things, though.
That got me working! Thanks for your help, @gnumpi & @cowboyrushforth!
Hi, I've been following this thread closely :)
Would you like a separate issue for the resampler and the mic not working together? I'd really like to set sample_rate to 48000 for better-quality output, but then, of course, the mic doesn't work.
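With shared clock pins, a 48 kHz output also forces the microphone to 48 kHz, while micro_wake_word and the voice assistant stream work on 16 kHz audio, so the microphone path would then need its own downsampling stage. Because 48000/16000 = 3 is an integer, that case is simpler than 44.1 kHz. A minimal sketch; decimate is a hypothetical helper, and averaging each group of samples is only a crude low-pass, where a real implementation would use a proper anti-aliasing FIR filter:

```python
def decimate(samples, src_rate=48000, dst_rate=16000):
    """Downsample by an integer factor, averaging each group of input samples."""
    if src_rate % dst_rate:
        raise ValueError("only integer decimation factors are handled")
    factor = src_rate // dst_rate                 # 3 for 48 kHz -> 16 kHz
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


print(decimate([1, 2, 3, 4, 5, 6]))
```

A 44.1 kHz rate does not divide evenly into 16 kHz, so it cannot use this shortcut and needs a fractional resampler instead.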
Thanks for your hard work!
Right now the pipeline only works properly when different I2S controllers are used for input and output. Add support for sharing an I2S controller by: