gnumpi / esphome_audio

Custom audio components for ESPHome

Add support for i2s duplex mode #17

Open gnumpi opened 2 months ago

gnumpi commented 2 months ago

Right now the pipeline works properly only when different I2S controllers are used for input and output. Add support for sharing an I2S controller by:

gnumpi commented 2 months ago

I committed a version that releases the I2S controller after the pipeline is stopped. This should enable support for DACs and ADCs that share the lrclk and bclk pins. Right now, when the media player is started outside the wake word detection or voice assistant pipeline, it is not aware of a running voice assistant loop.
So please make sure to stop the voice assistant before trying to stream media such as radio stations.
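
A minimal, untested sketch of that "stop first, then stream" order as an ESPHome script. The script id `play_radio` and the stream URL are placeholders made up for illustration, and `adf_media_player` is assumed to be the id of the adf_pipeline media player in your config:

```yaml
script:
  - id: play_radio            # hypothetical script name
    then:
      # Stop wake word detection and the voice assistant so they let go of the microphone pipeline
      - micro_wake_word.stop:
      - voice_assistant.stop:
      # Wait until the voice assistant loop has actually finished
      - wait_until:
          not:
            voice_assistant.is_running:
      # Only then start streaming; the URL is a placeholder
      - media_player.play_media:
          id: adf_media_player
          media_url: "http://example.com/stream.mp3"
```

If you do not use micro_wake_word, drop the first action.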

For testing please use the following branch:

- source:
    type: git
    url: https://github.com/gnumpi/esphome_audio
    ref: 17-add-support-for-i2s-duplex-mode
  components: [ adf_pipeline, i2s_audio ]
  refresh: 0s

sqldiablo commented 2 months ago

When I run this version, I can't get the mic or the media player to work. Below is my ESPHome config file:

substitutions:
  name: "living-room-onju-home"
  friendly_name: "Living Room Onju Home"

external_components:
  - source:
      type: git
      url: https://github.com/gnumpi/esphome_audio
      ref: 17-add-support-for-i2s-duplex-mode
    components: [ adf_pipeline, i2s_audio ]

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  min_version: 2024.2.0
  platformio_options:
    build_flags: "-DBOARD_HAS_PSRAM"
    board_build.arduino.memory_type: qio_opi
    board_build.flash_mode: dio
  on_boot:
    then:
      - light.turn_on:
          id: top_led
          effect: slow_pulse
          red: 100%
          green: 60%
          blue: 0%
      - wait_until:
          condition:
            wifi.connected:
      - light.turn_on:
          id: top_led
          effect: pulse
          red: 0%
          green: 100%
          blue: 0%
      - wait_until:
          condition:
            api.connected:
      - light.turn_on:
          id: top_led
          effect: none
          red: 0%
          green: 100%
          blue: 0%
      - delay: 1s
      - script.execute: reset_led

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      # need to set a s3 compatible board for the adf-sdk to compile
      # board specific code is not used though
      CONFIG_ESP32_S3_BOX_BOARD: "y"

psram:
  mode: octal
  speed: 80MHz

logger:
api:
  encryption:
    key: "<REDACTED>"
  services:
    - service: start_va
      then:
        - voice_assistant.start
    - service: stop_va
      then:
        - voice_assistant.stop

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
    password: "${wifi_ap_password}"

globals:
  - id: thresh_percent
    type: float
    initial_value: "0.03"
    restore_value: false
  - id: touch_calibration_values_left
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_center
    type: uint32_t[5]
    restore_value: false
  - id: touch_calibration_values_right
    type: uint32_t[5]
    restore_value: false

interval:
  - interval: 1s
    then:
      - script.execute:
          id: calibrate_touch
          button: 0
      - script.execute:
          id: calibrate_touch
          button: 1
      - script.execute:
          id: calibrate_touch
          button: 2

#i2s_audio:
#  - i2s_lrclk_pin: GPIO13
#    i2s_bclk_pin: GPIO18

#speaker:
#  - platform: i2s_audio
#    id: onju_out
#    dac_type: external
#    i2s_dout_pin: GPIO12
#    mode: stereo

#microphone:
#  - platform: i2s_audio
#    id: onju_microphone
#    i2s_din_pin: GPIO17
#    adc_type: external
#    pdm: false

i2s_audio:
  - id: i2s_all_audio
    i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

adf_pipeline:
  - platform: i2s_audio
    type: sink
    id: adf_i2s_out
    i2s_audio_id: i2s_all_audio
    i2s_dout_pin: GPIO12

  - platform: i2s_audio
    type: source
    id: adf_i2s_in
    i2s_audio_id: i2s_all_audio
    i2s_din_pin: GPIO17
    channel: right
    sample_rate: 16000
    bits_per_sample: 16bit

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    pipeline:
      - adf_i2s_in
      - self

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: None
    internal: false
    pipeline:
      - self
      - adf_i2s_out

micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    then:
      - voice_assistant.start

voice_assistant:
  id: va
  microphone: adf_microphone
  media_player: adf_media_player
  use_wake_word: false
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  on_listening:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 100%
        green: 100%
        brightness: 100%
        effect: listening
  on_stt_vad_end:
    - light.turn_on:
        id: top_led
        blue: 100%
        red: 0%
        green: 20%
        brightness: 70%
        effect: processing
  on_tts_end:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 20%
        green: 100%
        effect: speaking
  on_end:
    - delay: 500ms
    - wait_until:
        not:
          media_player.is_playing: adf_media_player
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - delay: 200ms
          - micro_wake_word.start
  on_client_connected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - micro_wake_word.start:
  on_client_disconnected:
    - if:
        condition:
          and:
            - switch.is_on: use_wake_word
            - binary_sensor.is_off: mute_switch
        then:
          - voice_assistant.stop:
          - micro_wake_word.stop:
  on_error:
    - light.turn_on:
        id: top_led
        blue: 0%
        red: 100%
        green: 0%
        effect: none
    - delay: 1s
    - script.execute: reset_led

number:
  - platform: template
    name: "Touch threshold percentage"
    id: touch_threshold_percentage
    update_interval: never
    entity_category: config
    initial_value: 1.25
    min_value: -1
    max_value: 5
    step: 0.25
    optimistic: true
    on_value:
      then:
        - lambda: !lambda |-
            id(thresh_percent) = 0.01 * x;

esp32_touch:
  setup_mode: false
  sleep_duration: 2ms
  measurement_duration: 800us
  low_voltage_reference: 0.8V
  high_voltage_reference: 2.4V

  filter_mode: IIR_16
  debounce_count: 2
  noise_threshold: 0
  jitter_step: 0
  smooth_mode: IIR_2

  denoise_grade: BIT8
  denoise_cap_level: L0

binary_sensor:
  - platform: esp32_touch
    id: volume_down
    pin: GPIO4
    threshold: 539000 # 533156-551132
    on_press: 
      then:
        - light.turn_on: left_led
        - script.execute:
            id: set_volume
            volume: -0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_down
            then:
              - script.execute:
                  id: set_volume
                  volume: -0.05
              - delay: 150ms
    on_release: 
      then:
        - light.turn_off: left_led

  - platform: esp32_touch
    id: volume_up
    pin: GPIO2
    threshold: 580000 # 575735-593064
    on_press: 
      then:
        - light.turn_on: right_led
        - script.execute:
            id: set_volume
            volume: 0.05
        - delay: 1s
        - while:
            condition:
              binary_sensor.is_on: volume_up
            then:
              - script.execute:
                  id: set_volume
                  volume: 0.05
              - delay: 150ms
    on_release: 
      then:
        - light.turn_off: right_led

  - platform: esp32_touch
    id: action
    pin: GPIO3
    threshold: 751000 # 745618-767100
    on_click:
      - if:
          condition:
            or:
              - switch.is_off: use_wake_word
              - binary_sensor.is_on: mute_switch
          then:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant is running: %s"
                args: ['id(va).is_running() ? "yes" : "no"']
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - if:
                condition: voice_assistant.is_running
                then:
                  - voice_assistant.stop:
                else:
                  - voice_assistant.start:
          else:
            - logger.log:
                tag: "action_click"
                format: "Voice assistant was running with wake word detection enabled. Starting continuously"
            - if:
                condition: media_player.is_playing
                then:
                  - media_player.stop
            - voice_assistant.stop
            - delay: 1s
            - script.execute: reset_led
            - script.wait: reset_led
            - voice_assistant.start_continuous:

  - platform: gpio
    id: mute_switch
    pin:
      number: GPIO38
      mode: INPUT_PULLUP
    name: Disable wake word
    on_press:
      - script.execute: turn_off_wake_word
    on_release:
      - script.execute: turn_on_wake_word

light:
  - platform: esp32_rmt_led_strip
    id: leds
    pin: GPIO11
    chipset: SK6812
    num_leds: 6
    rgb_order: grb
    rmt_channel: 0
    default_transition_length: 0s
    gamma_correct: 2.8
  - platform: partition
    id: left_led
    segments:
      - id: leds
        from: 0
        to: 0
    default_transition_length: 100ms
  - platform: partition
    id: top_led
    segments:
      - id: leds
        from: 1
        to: 4
    default_transition_length: 100ms
    effects:
      - pulse:
          name: pulse
          transition_length: 250ms
          update_interval: 250ms
      - pulse:
          name: slow_pulse
          transition_length: 1s
          update_interval: 2s
      - addressable_twinkle:
          name: listening_ww
          twinkle_probability: 1%
      - addressable_twinkle:
          name: listening
          twinkle_probability: 45%
      - addressable_scan:
          name: processing
          move_interval: 80ms
      - addressable_flicker:
          name: speaking
          intensity: 35%
  - platform: partition
    id: right_led
    segments:
      - id: leds
        from: 5
        to: 5
    default_transition_length: 100ms

script:
  - id: reset_led
    then:
      - if:
          condition:
            and:
              - switch.is_on: use_wake_word
              - binary_sensor.is_off: mute_switch
          then:
            - light.turn_on:
                id: top_led
                blue: 100%
                red: 100%
                green: 0%
                brightness: 60%
                effect: listening_ww
          else:
            - light.turn_off: top_led

  - id: set_volume
    mode: restart
    parameters:
      volume: float
    then:
      - light.turn_on:
          id: top_led
          effect: show_volume
      - delay: 1s
      - script.execute: reset_led

  - id: turn_on_wake_word
    then:
      - if:
          condition:
            and:
              - binary_sensor.is_off: mute_switch
              - switch.is_on: use_wake_word
          then:
            - micro_wake_word.start
            - if:
                condition:
                  media_player.is_playing:
                then:
                  - media_player.stop:
            - script.execute: reset_led
          else:
            - logger.log:
                tag: "turn_on_wake_word"
                format: "Trying to start listening for wake word, but %s"
                args:
                  [
                    'id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"',
                  ]
                level: "INFO"

  - id: turn_off_wake_word
    then:
      - micro_wake_word.stop
      - script.execute: reset_led

  - id: calibrate_touch
    parameters:
      button: int
    then:
      - lambda: |-
          static uint8_t thresh_indices[3] = {0, 0, 0};
          static uint32_t sums[3] = {0, 0, 0};
          static uint8_t qsizes[3] = {0, 0, 0};
          static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};

          uint32_t newval;
          uint32_t* calibration_values;
          switch(button) {
            case 0:
              newval = id(volume_down).get_value();
              calibration_values = id(touch_calibration_values_left);
              break;
            case 1:
              newval = id(action).get_value();
              calibration_values = id(touch_calibration_values_center);
              break;
            case 2:
              newval = id(volume_up).get_value();
              calibration_values = id(touch_calibration_values_right);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

          if(newval == 0) return;

          //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
          //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);

          if(qsizes[button] == 5) {
            float avg = float(sums[button])/float(qsizes[button]);
            if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
              consecutive_anomalies_per_button[button]++;
              //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
              if(consecutive_anomalies_per_button[button] < 10)
                return;
            } 
          }

          //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
          consecutive_anomalies_per_button[button] = 0;

          if(qsizes[button] == 5) {
            //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
            sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
            qsizes[button]--;
          }
          *(calibration_values+thresh_indices[button]) = newval;
          sums[button] += newval;
          qsizes[button]++;
          thresh_indices[button] = (thresh_indices[button] + 1) % 5;

          //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
          uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
          //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);

          switch(button) {
            case 0:
              id(volume_down).set_threshold(newthresh);
              break;
            case 1:
              id(action).set_threshold(newthresh);
              break;
            case 2:
              id(volume_up).set_threshold(newthresh);
              break;
            default:
              ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
              return;
          }

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - script.execute: turn_on_wake_word
    on_turn_off:
      - script.execute: turn_off_wake_word
  - platform: gpio
    id: dac_mute
    restore_mode: ALWAYS_ON
    pin:
      number: GPIO21
      inverted: True

Here is the log when I try to play audio:

INFO ESPHome 2024.2.1
INFO Reading configuration /config/esphome/living-room-onju-home.yaml...
INFO Starting log output from <REDACTED> using esphome API
INFO Successfully connected to living-room-onju-home @ <REDACTED> in 0.021s
INFO Successful handshake with living-room-onju-home @ <REDACTED> in 0.085s
[15:00:12][I][app:102]: ESPHome version 2024.2.1 compiled on Mar  5 2024, 14:52:53
[15:00:12][C][wifi:577]: WiFi:
[15:00:12][C][wifi:409]:   Local MAC: <REDACTED>
[15:00:12][C][wifi:414]:   SSID: 'HASS'[redacted]
[15:00:12][C][wifi:415]:   IP Address: <REDACTED>
[15:00:12][C][wifi:417]:   BSSID: [redacted]
[15:00:12][C][wifi:418]:   Hostname: 'living-room-onju-home'
[15:00:12][C][wifi:420]:   Signal strength: -43 dB ▂▄▆█
[15:00:12][C][wifi:424]:   Channel: 1
[15:00:12][C][wifi:425]:   Subnet: <REDACTED>
[15:00:12][C][wifi:426]:   Gateway: <REDACTED>
[15:00:12][C][wifi:427]:   DNS1: <REDACTED>
[15:00:12][C][wifi:428]:   DNS2: <REDACTED>
[15:00:12][C][logger:447]: Logger:
[15:00:12][C][logger:448]:   Level: DEBUG
[15:00:12][C][logger:449]:   Log Baud Rate: 115200
[15:00:12][C][logger:451]:   Hardware UART: USB_SERIAL_JTAG
[15:00:12][C][template.number:050]: Template Number 'Touch threshold percentage'
[15:00:12][C][template.number:051]:   Optimistic: YES
[15:00:12][C][template.number:052]:   Update Interval: never
[15:00:12][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[15:00:12][C][esp32_rmt_led_strip:176]:   Pin: 11
[15:00:12][C][esp32_rmt_led_strip:177]:   Channel: 0
[15:00:12][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[15:00:12][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[15:00:12][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[15:00:12][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[15:00:12][C][switch.gpio:091]:   Restore Mode: always ON
[15:00:12][C][switch.gpio:031]:   Pin: GPIO21
[15:00:12][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[15:00:12][C][gpio.binary_sensor:016]:   Pin: GPIO38
[15:00:12][C][light:103]: Light 'leds'
[15:00:12][C][light:105]:   Default Transition Length: 0.0s
[15:00:12][C][light:106]:   Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'left_led'
[15:00:12][C][light:105]:   Default Transition Length: 0.1s
[15:00:12][C][light:106]:   Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'top_led'
[15:00:12][C][light:105]:   Default Transition Length: 0.1s
[15:00:12][C][light:106]:   Gamma Correct: 2.80
[15:00:12][C][light:103]: Light 'right_led'
[15:00:12][C][light:105]:   Default Transition Length: 0.1s
[15:00:12][C][light:106]:   Gamma Correct: 2.80
[15:00:12][C][template.switch:068]: Template Switch 'Use Wake Word'
[15:00:12][C][template.switch:091]:   Restore Mode: restore defaults to ON
[15:00:12][C][template.switch:057]:   Optimistic: YES
[15:00:12][C][psram:020]: PSRAM:
[15:00:12][C][psram:021]:   Available: YES
[15:00:12][C][psram:024]:   Size: 8191 KB
[15:00:12][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[15:00:12][C][esp32_touch:074]:   Meas cycle: 0.80ms
[15:00:12][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[15:00:12][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[15:00:12][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[15:00:12][C][esp32_touch:135]:   Voltage Attenuation: 0V
[15:00:12][C][esp32_touch:169]:   Filter mode: IIR_16
[15:00:12][C][esp32_touch:170]:   Debounce count: 2
[15:00:12][C][esp32_touch:171]:   Noise threshold coefficient: 0
[15:00:12][C][esp32_touch:172]:   Jitter filter step size: 0
[15:00:12][C][esp32_touch:191]:   Smooth level: IIR_2
[15:00:12][C][esp32_touch:213]:   Denoise grade: BIT8
[15:00:12][C][esp32_touch:245]:   Denoise capacitance level: L0
[15:00:12][C][esp32_touch:260]:   Touch Pad 'volume_down'
[15:00:12][C][esp32_touch:261]:     Pad: T4
[15:00:12][C][esp32_touch:262]:     Threshold: 582598
[15:00:12][C][esp32_touch:260]:   Touch Pad 'volume_up'
[15:00:12][C][esp32_touch:261]:     Pad: T2
[15:00:12][C][esp32_touch:262]:     Threshold: 586502
[15:00:12][C][esp32_touch:260]:   Touch Pad 'action'
[15:00:12][C][esp32_touch:261]:     Pad: T3
[15:00:12][C][esp32_touch:262]:     Threshold: 775188
[15:00:12][C][mdns:115]: mDNS:
[15:00:12][C][mdns:116]:   Hostname: living-room-onju-home
[15:00:13][C][ota:096]: Over-The-Air Updates:
[15:00:13][C][ota:097]:   Address: living-room-onju-home.local:3232
[15:00:13][C][ota:103]:   OTA version: 2.
[15:00:13][C][api:139]: API Server:
[15:00:13][C][api:140]:   Address: living-room-onju-home.local:6053
[15:00:13][C][api:142]:   Using noise encryption: YES
[15:00:13][C][micro_wake_word:057]: microWakeWord:
[15:00:13][C][micro_wake_word:058]:   Wake Word: hey jarvis
[15:00:13][C][micro_wake_word:059]:   Probability cutoff: 0.500
[15:00:13][C][micro_wake_word:060]:   Sliding window size: 10
[15:00:13][C][adf_audio:016]: ESP-ADF-MediaPlayer:
[15:00:13][C][adf_audio:018]:   Number of ASPComponents: 2
[15:00:17][D][media_player:059]: 'Living Room Onju Home' - Setting
[15:00:17][D][media_player:066]:   Media URL: https://<REDACTED>/api/tts_proxy/a54d88e06612d820bc3be72877c74f257b561b19_en-gb_8cd8d30e6e_tts.piper.mp3
[15:00:17][D][esp_adf_pipeline:038]: Init request, current state UNAVAILABLE
[15:00:17][D][esp-idf:000]: I (394148) MP3_DECODER: MP3 init

[15:00:17][D][esp_adf_pipeline:233]: Adding new component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp-idf:000]: I (394160) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8

[15:00:17][D][esp-idf:000]: I (394163) I2S: I2S0, MCLK output by GPIO2

[15:00:17][D][esp-idf:000]: I (394165) ESP32_S3_BOX: I2S0, MCLK output by GPIO0

[15:00:17][D][esp_adf_pipeline:233]: Adding new component
[15:00:17][D][esp_adf_pipeline:235]: Adding element of component
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 0, http
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 1, decoder
[15:00:17][D][esp_adf_pipeline:249]: pipeline tag 2, i2s_out
[15:00:17][D][esp-idf:000]: I (394179) AUDIO_PIPELINE: link el->rb, el:0x3d8203b0, tag:http, rb:0x3d8208cc

[15:00:17][D][esp-idf:000]: I (394183) AUDIO_PIPELINE: link el->rb, el:0x3d820568, tag:decoder, rb:0x3d82190c

[15:00:17][D][esp_adf_pipeline:262]: Setting up event listener.
[15:00:17][D][esp_adf_pipeline:193]: State changed from UNAVAILABLE to STOPPED
[15:00:17][I][adf_audio:134]: got new pipeline state: 5
[15:00:17][D][esp_adf_pipeline:049]: Starting request, current state STOPPED
[15:00:17][D][esp_adf_pipeline:193]: State changed from STOPPED to PREPARING
[15:00:17][I][adf_audio:134]: got new pipeline state: 1
[15:00:17][W][component:214]: Component api took a long time for an operation (0.07 s).
[15:00:17][W][component:215]: Components should block for at most 20-30ms.
[15:00:17][D][esp-idf:000]: I (394213) AUDIO_THREAD: The http task allocate stack on external memory

[15:00:17][D][esp-idf:000]: I (394216) AUDIO_ELEMENT: [http-0x3d8203b0] Element task created

[15:00:17][D][esp-idf:000]: I (394219) AUDIO_THREAD: The decoder task allocate stack on external memory

[15:00:17][D][esp-idf:000]: I (394223) AUDIO_ELEMENT: [decoder-0x3d820568] Element task created

[15:00:17][D][esp-idf:000]: I (394225) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:00:17][D][esp-idf:000]: I (394229) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:00:17][I][esp_aud:000]: I (394232) MP3_DECODER: MP3 Streamer status: 2
[15:00:17][I][esp_aud:000]: I (394232) MP3_DECODER: MP3 Streamer status: 2
[15:00:17][I][esp_audio_sources:066]: decoder status: 2
[15:00:18][D][esp-idf:000]: I (394831) HTTP_CLIENT: Body received in fetch header state, 0x3fcc504b, 1841

[15:00:18][D][esp-idf:000]: I (394834) HTTP_STREAM: total_bytes=11983

[15:00:18][I][HTTPStreamReader:109]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[15:00:18][D][esp-idf:000]: W (394874) AUDIO_ELEMENT: IN-[decoder] AEL_IO_ABORT

[15:00:18][D][esp-idf:000]: W (394877) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[15:00:18][D][esp-idf:000]: W (394880) MP3_DECODER: output aborted -3

[15:00:18][D][esp-idf:000]: I (394883) MP3_DECODER: Closed

[15:00:18][D][esp-idf:000]: W (394901) HTTP_STREAM: No output due to stopping

[15:00:18][D][esp_adf_pipeline:193]: State changed from PREPARING to STARTING
[15:00:18][I][adf_audio:134]: got new pipeline state: 2
[15:00:18][D][esp-idf:000]: I (394914) AUDIO_ELEMENT: [i2s_out-0x3d820784] Element task created

[15:00:18][D][esp-idf:000]: I (394916) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8391915 Bytes, Inter:162668 Bytes, Dram:162668 Bytes

[15:00:18][D][esp-idf:000]: I (394920) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[15:00:18][D][esp-idf:000]: I (394923) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[15:00:18][D][esp-idf:000]: I (394926) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[15:00:18][D][esp-idf:000]: I (394929) I2S_STREAM: AUDIO_STREAM_WRITER

[15:00:18][I][esp_adf_pipeline:114]: [ http ] status: 14
[15:00:18][I][esp_adf_pipeline:114]: [ i2s_out ] status: 12
[15:00:18][I][esp_adf_pipeline:122]: [ * ] CMD: 8  status: 12
[15:00:18][D][esp_adf_pipeline:193]: State changed from STARTING to RUNNING
[15:00:18][I][adf_audio:134]: got new pipeline state: 3
[15:00:19][D][esp-idf:000]: I (395546) HTTP_STREAM: total_bytes=11983

[15:00:19][I][esp_adf_pipeline:114]: [ http ] status: 12
[15:00:19][I][esp_adf_pipeline:114]: [ decoder ] status: 12
[15:00:19][I][HTTPStreamReader:109]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[15:00:19][D][esp-idf:000]: W (396028) HTTP_STREAM: No more data,errno:0, total_bytes:11983, rlen = 0

[15:00:19][D][esp-idf:000]: I (396031) AUDIO_ELEMENT: IN-[http] AEL_IO_DONE,0

[15:00:19][I][esp_adf_pipeline:114]: [ http ] status: 15
[15:00:20][D][esp-idf:000]: I (396538) AUDIO_ELEMENT: IN-[decoder] AEL_IO_DONE,-2

[15:00:20][D][esp-idf:000]: I (397049) MP3_DECODER: Closed

[15:00:20][I][esp_adf_pipeline:114]: [ decoder ] status: 15
[15:00:20][D][esp-idf:000]: I (397240) AUDIO_ELEMENT: IN-[i2s_out] AEL_IO_DONE,-2

[15:00:21][I][esp_adf_pipeline:114]: [ i2s_out ] status: 15
[15:00:21][I][esp_adf_pipeline:122]: [ * ] CMD: 8  status: 15
[15:00:21][D][esp_adf_pipeline:193]: State changed from RUNNING to STOPPED
[15:00:21][I][adf_audio:134]: got new pipeline state: 5
[15:00:21][D][esp_adf_pipeline:286]: Called deinit_all
[15:00:21][D][esp-idf:000]: I (398280) AUDIO_PIPELINE: audio_pipeline_unlinked

[15:00:21][D][esp-idf:000]: W (398283) AUDIO_ELEMENT: [http] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:00:21][D][esp-idf:000]: W (398286) AUDIO_ELEMENT: [decoder] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:00:21][D][esp-idf:000]: W (398289) AUDIO_ELEMENT: [i2s_out] Element has not create when AUDIO_ELEMENT_TERMINATE

[15:00:21][D][esp-idf:000]: I (398292) I2S: DMA queue destroyed

[15:00:21][D][esp_adf_pipeline:193]: State changed from STOPPED to UNAVAILABLE
[15:00:21][I][adf_audio:134]: got new pipeline state: 0

When I try to use the mic, it never recognizes the wake word. I'm not sure how to debug this.

gnumpi commented 1 month ago

Full duplex mode is now supported. Please see the repo README for configuration details.

cowboyrushforth commented 1 month ago

This will be a great improvement to ESPHome!

I'm also trying to get things working on the Onju-Voice board, using full duplex mode. Everything compiles fine and runs without errors. I can stream an MP3 to it from Home Assistant, but the wake word detection doesn't appear to hear anything.

i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
    access_mode: duplex

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    adf_alc: false
    sample_rate: 16000
    bits_per_sample: 32bit
    fixed_settings: true

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    pdm: false
    bits_per_sample: 16bit
    fixed_settings: true

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - resampler
      - self

media_player:
  - platform: adf_pipeline
    id: adf_media_player
    name: onju_media_player
    internal: false
    keep_pipeline_alive: false
    pipeline:
      - self
      - resampler
      - adf_i2s_out

micro_wake_word:
  model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: adf_microphone
  media_player: adf_media_player

The original creator's firmware also appears to use a full duplex I2S config. Here is an excerpt:

i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_RX),
    .sample_rate = 16000,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_RIGHT,
    .communication_format = I2S_COMM_FORMAT_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = SAMPLE_CHUNK_SIZE}; // mostly set by needs of microphone

i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_BCK_PIN,
    .ws_io_num = I2S_WS_PIN,
    .data_out_num = I2S_OUT,
    .data_in_num = I2S_IN};

i2s_driver_install(I2S_NUM, &i2s_config, 0, NULL);
i2s_set_pin(I2S_NUM, &pin_config);

Given this, do you think I have set up the esphome_audio config correctly? Thanks!

sqldiablo commented 1 month ago

I have the same problem when I try to get this to work with Onju. I'm happy to be a tester as well, if I can help in any way.

I think Onju is using an external DAC and ADC; the snippet of the original config below seems to indicate as much. However, I don't know which DAC and ADC it's using, and I don't know how to configure the firmware in this project to work with them.

i2s_audio:
  - i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18

speaker:
  - platform: i2s_audio
    id: onju_out
    dac_type: external
    i2s_dout_pin: GPIO12

microphone:
  - platform: i2s_audio
    id: onju_microphone
    i2s_din_pin: GPIO17
    adc_type: external
    pdm: false

Snippet above is from: https://github.com/tetele/onju-voice-satellite/blob/main/esphome/onju-voice-microwakeword.yaml

Hardware details: https://github.com/justLV/onju-voice

What brought me here is that I want to use the media_player component instead of the speaker component so I can have a voice response (MP3) from Home Assistant play over the Onju as well, which doesn't work with the speaker component.

cowboyrushforth commented 1 month ago

I also tried this, since it's supported, but it didn't change anything:

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    adc:
      model: generic
    pdm: false
    bits_per_sample: 32bit
    fixed_settings: true

gnumpi commented 1 month ago

Hey, when you want to use I2S in duplex mode, you should use the same settings for input and output. So set bits_per_sample to 32bit for both and let the resampler handle the conversion. Did you get micro_wake_word running with another setup on your board? If so, could you share the config?
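
As an untested sketch of that suggestion, the duplex config posted earlier in this thread would look roughly like this with matching settings on both sides. The ids and pins are copied from that post; putting `sample_rate: 16000` on the input as well is an assumption about what "same settings" covers:

```yaml
i2s_audio:
  - id: i2s_shared
    i2s_lrclk_pin: GPIO13
    i2s_bclk_pin: GPIO18
    access_mode: duplex

adf_pipeline:
  - platform: i2s_audio
    type: audio_out
    id: adf_i2s_out
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO12
    sample_rate: 16000
    bits_per_sample: 32bit    # must match the input side
    fixed_settings: true

  - platform: i2s_audio
    type: audio_in
    id: adf_i2s_in
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO17
    channel: right
    pdm: false
    sample_rate: 16000        # assumed: same rate as the output
    bits_per_sample: 32bit    # was 16bit; must match the output side
    fixed_settings: true

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    pipeline:
      - adf_i2s_in
      - resampler             # handles the conversion for the consumer
      - self
```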

gnumpi commented 1 month ago

Oh, I just saw that @sqldiablo already shared a config, sorry. Apart from setting bits_per_sample to 32bit, the config looks good to me. You don't need to set the adc, but it shouldn't hurt either. Could you share some logs, then?

cowboyrushforth commented 1 month ago

OK, I tried setting the sample rate to 16000 and bits_per_sample to 32bit for both in and out, but the result is the same: nothing from the microphone.

Here is a working config for this board with micro_wake_word: https://github.com/tetele/onju-voice-satellite/blob/main/esphome/onju-voice-microwakeword.yaml

Thanks for looking!

cowboyrushforth commented 1 month ago

INFO Starting log output from onju2.local using esphome API
INFO Successfully connected to onju2 @ 10.15.4.204 in 6.378s
INFO Successful handshake with onju2 @ 10.15.4.204 in 0.089s
[18:24:16][I][app:100]: ESPHome version 2024.4.0 compiled on Apr 17 2024, 18:23:45
[18:24:16][C][wifi:580]: WiFi:
[18:24:16][C][wifi:408]:   Local MAC: 80:65:99:A2:95:6C
[18:24:16][C][wifi:413]:   SSID: 'xxxxx'
[18:24:16][C][wifi:416]:   IP Address: 10.15.4.204
[18:24:16][C][wifi:420]:   BSSID: xxxxxxx
[18:24:16][C][wifi:421]:   Hostname: 'onju2'
[18:24:16][C][wifi:423]:   Signal strength: -36 dB ▂▄▆█
[18:24:16][C][wifi:427]:   Channel: 9
[18:24:16][C][wifi:428]:   Subnet: 255.255.240.0
[18:24:16][C][wifi:429]:   Gateway: 10.15.0.1
[18:24:16][C][wifi:430]:   DNS1: 10.15.0.1
[18:24:16][C][wifi:431]:   DNS2: 0.0.0.0
[18:24:16][C][logger:166]: Logger:
[18:24:16][C][logger:167]:   Level: DEBUG
[18:24:16][C][logger:169]:   Log Baud Rate: 115200
[18:24:16][C][logger:170]:   Hardware UART: USB_SERIAL_JTAG
[18:24:16][C][template.number:050]: Template Number 'Touch threshold percentage'
[18:24:16][C][template.number:051]:   Optimistic: YES
[18:24:16][C][template.number:052]:   Update Interval: never
[18:24:16][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[18:24:16][C][esp32_rmt_led_strip:176]:   Pin: 11
[18:24:16][C][esp32_rmt_led_strip:177]:   Channel: 0
[18:24:16][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[18:24:16][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[18:24:16][C][esp32_rmt_led_strip:204]:   Number of LEDs: 6
[18:24:16][C][switch.gpio:068]: GPIO Switch 'dac_mute'
[18:24:16][C][switch.gpio:091]:   Restore Mode: always OFF
[18:24:16][C][switch.gpio:031]:   Pin: GPIO21
[18:24:16][C][gpio.binary_sensor:015]: GPIO Binary Sensor 'Disable wake word'
[18:24:16][C][gpio.binary_sensor:016]:   Pin: GPIO38
[18:24:16][C][light:103]: Light 'leds'
[18:24:16][C][light:105]:   Default Transition Length: 0.0s
[18:24:16][C][light:106]:   Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'left_led'
[18:24:16][C][light:105]:   Default Transition Length: 0.1s
[18:24:16][C][light:106]:   Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'top_led'
[18:24:16][C][light:105]:   Default Transition Length: 0.1s
[18:24:16][C][light:106]:   Gamma Correct: 2.80
[18:24:16][C][light:103]: Light 'right_led'
[18:24:16][C][light:105]:   Default Transition Length: 0.1s
[18:24:16][C][light:106]:   Gamma Correct: 2.80
[18:24:16][C][template.switch:068]: Template Switch 'Use Wake Word'
[18:24:16][C][template.switch:091]:   Restore Mode: restore defaults to ON
[18:24:16][C][template.switch:057]:   Optimistic: YES
[18:24:16][C][psram:020]: PSRAM:
[18:24:16][C][psram:021]:   Available: YES
[18:24:16][C][psram:024]:   Size: 8191 KB
[18:24:16][C][i2s_audio:028]: I2SController:
[18:24:16][C][i2s_audio:029]:   AccessMode: duplex
[18:24:16][C][i2s_audio:030]:   Port: 0
[18:24:16][C][i2s_audio:032]:   Reader registered.
[18:24:16][C][i2s_audio:035]:   Writer registered.
[18:24:16][C][i2s_audio:138]: I2S-Writer (Fixed-CFG):
[18:24:16][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[18:24:16][C][i2s_audio:141]:   channel_fmt: 0 channels: 2
[18:24:16][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[18:24:16][C][i2s_audio:135]: I2S-Reader (Fixed-CFG):
[18:24:16][C][i2s_audio:140]:   sample-rate: 16000 bits_per_sample: 32
[18:24:16][C][i2s_audio:141]:   channel_fmt: 3 channels: 1
[18:24:16][C][i2s_audio:142]:   use_apll: no, use_pdm: no
[18:24:16][C][esp32_touch:073]: Config for ESP32 Touch Hub:
[18:24:16][C][esp32_touch:074]:   Meas cycle: 0.80ms
[18:24:16][C][esp32_touch:075]:   Sleep cycle: 2.00ms
[18:24:16][C][esp32_touch:095]:   Low Voltage Reference: 0.8V
[18:24:16][C][esp32_touch:115]:   High Voltage Reference: 2.4V
[18:24:16][C][esp32_touch:135]:   Voltage Attenuation: 0V
[18:24:16][C][esp32_touch:169]:   Filter mode: IIR_16
[18:24:16][C][esp32_touch:170]:   Debounce count: 2
[18:24:16][C][esp32_touch:171]:   Noise threshold coefficient: 0
[18:24:16][C][esp32_touch:172]:   Jitter filter step size: 0
[18:24:16][C][esp32_touch:191]:   Smooth level: IIR_2
[18:24:16][C][esp32_touch:213]:   Denoise grade: BIT8
[18:24:16][C][esp32_touch:245]:   Denoise capacitance level: L0
[18:24:16][C][esp32_touch:260]:   Touch Pad 'volume_down'
[18:24:16][C][esp32_touch:261]:     Pad: T4
[18:24:16][C][esp32_touch:262]:     Threshold: 485736
[18:24:16][C][esp32_touch:260]:   Touch Pad 'volume_up'
[18:24:16][C][esp32_touch:261]:     Pad: T2
[18:24:16][C][esp32_touch:262]:     Threshold: 526604
[18:24:16][C][esp32_touch:260]:   Touch Pad 'action'
[18:24:16][C][esp32_touch:261]:     Pad: T3
[18:24:16][C][esp32_touch:262]:     Threshold: 682733
[18:24:16][C][captive_portal:088]: Captive Portal:
[18:24:16][C][mdns:115]: mDNS:
[18:24:16][C][mdns:116]:   Hostname: onju2
[18:24:16][C][ota:096]: Over-The-Air Updates:
[18:24:16][C][ota:097]:   Address: onju2.local:3232
[18:24:16][C][ota:100]:   Using Password.
[18:24:16][C][ota:103]:   OTA version: 2.
[18:24:16][C][api:139]: API Server:
[18:24:16][C][api:140]:   Address: onju2.local:6053
[18:24:16][C][api:142]:   Using noise encryption: YES
[18:24:16][C][improv_serial:032]: Improv Serial:
[18:24:16][C][micro_wake_word:057]: microWakeWord:
[18:24:16][C][micro_wake_word:058]:   Wake Word: hey jarvis
[18:24:16][C][micro_wake_word:059]:   Probability cutoff: 0.500
[18:24:16][C][micro_wake_word:060]:   Sliding window size: 10
[18:24:16][C][esp_adf_pipeline.microphone:020]: ADF-Microphone
[18:24:16][C][adf_media_player:016]: ESP-ADF-MediaPlayer:
[18:24:16][C][adf_media_player:018]:   Number of ASPComponents: 3
[18:24:17][D][light:036]: 'top_led' Setting:
[18:24:17][D][light:051]:   Brightness: 60%
[18:24:17][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[18:24:17][D][light:109]:   Effect: 'listening_ww'
[18:24:38][D][media_player:059]: 'onju_media_player' - Setting
[18:24:38][D][media_player:066]:   Media URL: http://10.19.15.100:8123/media/local/Auntie's%20Lock.mp3?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmNzI3ZGIxODJhZTI0YzczOGU1MjE4MGQ0MzYzYTI2YSIsInBhdGgiOiIvbWVkaWEvbG9jYWwvQXVudGllJ3MgTG9jay5tcDMiLCJwYXJhbXMiOltdLCJpYXQiOjE3MTMzOTk4NzgsImV4cCI6MTcxMzQ4NjI3OH0.jz9w-qWqJ0hsdKi33U2I9deESgDqtrgk0RcBSur7CM8
[18:24:38][D][adf_media_player:030]: Got control call in state 1
[18:24:38][D][esp_adf_pipeline:050]: Starting request, current state UNINITIALIZED
[18:24:38][D][esp-idf:000]: I (29523) MP3_DECODER: MP3 init

[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 0, http
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 1, decoder
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 2, resampler
[18:24:38][D][esp_adf_pipeline:358]: pipeline tag 3, i2s_out
[18:24:38][D][esp-idf:000]: I (29536) AUDIO_PIPELINE: link el->rb, el:0x3d832f54, tag:http, rb:0x3d8336b0

[18:24:38][D][esp-idf:000]: I (29539) AUDIO_PIPELINE: link el->rb, el:0x3d833214, tag:decoder, rb:0x3d8346f0

[18:24:38][D][esp-idf:000]: I (29542) AUDIO_PIPELINE: link el->rb, el:0x3d8333b0, tag:resampler, rb:0x3d835730

[18:24:38][D][esp_adf_pipeline:370]: Setting up event listener.
[18:24:38][D][esp_adf_pipeline:302]: State changed from UNINITIALIZED to PREPARING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 1
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 16000, ch: 2 DST: rate: 16000, ch: 2 
[18:24:38][D][esp-idf:000]: I (29574) AUDIO_THREAD: The http task allocate stack on external memory

[18:24:38][D][esp-idf:000]: I (29577) AUDIO_ELEMENT: [http-0x3d832f54] Element task created

[18:24:38][D][esp-idf:000]: I (29579) AUDIO_THREAD: The decoder task allocate stack on external memory

[18:24:38][D][esp-idf:000]: I (29583) AUDIO_ELEMENT: [decoder-0x3d833214] Element task created

[18:24:38][D][esp-idf:000]: I (29586) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp-idf:000]: I (29589) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp_audio_sources:097]: Streamer status: 2
[18:24:38][D][esp_audio_sources:098]: decoder status: 2
[18:24:38][D][esp-idf:000]: I (29620) HTTP_CLIENT: Body received in fetch header state, 0x3fcc3c0a, 1738

[18:24:38][D][esp-idf:000]: I (29625) HTTP_STREAM: total_bytes=6564597

[18:24:38][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2 
[18:24:38][D][esp_audio_processors:088]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2 
[18:24:38][D][esp-idf:000]: W (29717) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[18:24:38][D][esp-idf:000]: W (29721) MP3_DECODER: output aborted -3

[18:24:38][D][esp-idf:000]: I (29725) MP3_DECODER: Closed

[18:24:38][D][esp-idf:000]: W (29731) AUDIO_ELEMENT: OUT-[http] AEL_IO_ABORT

[18:24:38][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2 
[18:24:38][D][esp-idf:000]: I (29758) AUDIO_THREAD: The resampler task allocate stack on external memory

[18:24:38][D][esp-idf:000]: I (29760) AUDIO_ELEMENT: [resampler-0x3d8333b0] Element task created

[18:24:38][D][esp-idf:000]: I (29763) AUDIO_ELEMENT: [i2s_out-0x3d833568] Element task created

[18:24:38][D][esp-idf:000]: I (29766) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8311211 Bytes, Inter:161316 Bytes, Dram:161316 Bytes

[18:24:38][D][esp-idf:000]: I (29770) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp-idf:000]: I (29773) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp-idf:000]: I (29777) AUDIO_ELEMENT: [resampler] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp-idf:000]: I (29781) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

[18:24:38][D][esp-idf:000]: I (29784) I2S_STREAM: AUDIO_STREAM_WRITER

[18:24:38][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[18:24:38][I][esp_adf_pipeline:214]: [ http ] status: 14
[18:24:38][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 2
[18:24:38][D][esp-idf:000]: I (29981) RSP_FILTER: sample rate of source data : 44100, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 2

[18:24:38][I][esp_adf_pipeline:214]: [ resampler ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 2
[18:24:38][D][esp-idf:000]: I (30009) HTTP_CLIENT: Body received in fetch header state, 0x3fcc2ffa, 1738

[18:24:38][D][esp-idf:000]: I (30016) HTTP_STREAM: total_bytes=6564597

[18:24:38][I][esp_adf_pipeline:214]: [ http ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [decoder] status, 2
[18:24:38][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[18:24:38][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[18:24:38][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[18:24:38][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:24:38][I][adf_media_player:135]: got new pipeline state: 3
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2 
[18:24:38][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:24:38][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:24:38][D][esp_audio_processors:079]: New settings: SRC: rate: 44100, ch: 2 DST: rate: 16000, ch: 2 
[18:24:41][D][media_player:059]: 'onju_media_player' - Setting
[18:24:41][D][media_player:063]:   Command: STOP
[18:24:41][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:24:41][I][adf_media_player:135]: got new pipeline state: 4
[18:24:41][D][esp-idf:000]: W (33088) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[18:24:41][D][esp-idf:000]: W (33091) MP3_DECODER: output aborted -3

[18:24:41][D][esp-idf:000]: I (33095) MP3_DECODER: Closed

[18:24:41][D][esp-idf:000]: W (33101) HTTP_STREAM: No output due to stopping

[18:24:41][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:24:41][I][adf_media_player:135]: got new pipeline state: 5
[18:24:51][D][switch:016]: 'Use Wake Word' Turning OFF.
[18:24:51][D][switch:055]: 'Use Wake Word': Sending state OFF
[18:24:51][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:24:51][D][light:036]: 'top_led' Setting:
[18:24:51][D][light:047]:   State: OFF
[18:24:51][D][light:085]:   Transition length: 0.1s
[18:24:51][D][light:091]:   Effect: 'None'
[18:24:51][D][micro_wake_word:134]: Stopping Microphone
[18:24:51][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:24:51][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:24:51][D][esp-idf:000]: W (42766) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:24:51][D][esp-idf:000]: W (42769) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:24:51][D][esp-idf:000]: W (42772) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:24:51][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:24:51][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[18:24:56][D][switch:012]: 'Use Wake Word' Turning ON.
[18:24:56][D][switch:055]: 'Use Wake Word': Sending state ON
[18:24:56][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[18:24:56][D][light:036]: 'top_led' Setting:
[18:24:56][D][light:047]:   State: ON
[18:24:56][D][light:051]:   Brightness: 60%
[18:24:56][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[18:24:56][D][light:109]:   Effect: 'listening_ww'
[18:24:56][D][micro_wake_word:115]: Starting Microphone
[18:24:56][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:24:56][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:24:56][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:24:56][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:24:56][D][esp-idf:000]: I (47984) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8363219 Bytes, Inter:146700 Bytes, Dram:146700 Bytes

[18:24:56][D][esp-idf:000]: I (47987) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[18:24:56][D][esp-idf:000]: I (47990) AUDIO_ELEMENT: [resampler] AEL_MSG_CMD_RESUME,state:1

[18:24:56][D][esp-idf:000]: I (47992) AUDIO_PIPELINE: Pipeline started

[18:24:56][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ resampler ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:24:56][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:24:56][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:24:56][D][esp_adf_pipeline:131]: Check element [resampler] status, 3
[18:24:56][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:24:56][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:24:56][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[18:24:56][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:24:56][I][esp_adf_pipeline:214]: [ resampler ] status: 12

steps:

  1. ota and start logging
  2. went to Home Assistant and clicked on an MP3 - it did play normally
  3. disabled wake word in Home Assistant
  4. enabled it
  5. spoke wake word, multiple times, softly and loudly

gnumpi commented 1 month ago

You could also try turning up the gain a bit: there is an experimental, undocumented option gain_log2 ;) The default should be 2.

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in
      - self

Now that I think about it, I have to check how it works with the resampler. Maybe also try without the resampler for the microphone; if you set it to 16 kHz and 32bit, it should be unnecessary anyway.

cowboyrushforth commented 1 month ago

Progress!

Without the re-sampler the following happens:

  1. It seems to detect my voice perfectly. (The gain_log2: 3 is also still present, for what it's worth.)
  2. The TTS playback also works, which is great, but it sounds like a chipmunk hahaha.

Here is a log of 1 and 2:

[18:36:58][D][micro_wake_word:362]: Wake word sliding average probability is 0.536 and most recent probability is 1.000
[18:36:58][D][micro_wake_word:128]: Wake Word Detected
[18:36:58][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[18:36:58][D][micro_wake_word:134]: Stopping Microphone
[18:36:58][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:36:58][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:36:58][D][esp-idf:000]: W (75869) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:36:58][D][esp-idf:000]: W (75872) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:36:58][D][esp-idf:000]: W (75875) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:36:58][D][esp-idf:000]: W (75879) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:36:58][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:36:58][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[18:36:58][D][voice_assistant:439]: State changed from IDLE to START_PIPELINE
[18:36:58][D][voice_assistant:445]: Desired state set to START_MICROPHONE
[18:36:58][D][voice_assistant:126]: microphone not running
[18:36:58][D][voice_assistant:210]: Requesting start...
[18:36:58][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[18:36:58][D][voice_assistant:126]: microphone not running
[18:36:58][D][voice_assistant:476]: Client started, streaming microphone
[18:36:58][D][voice_assistant:439]: State changed from STARTING_PIPELINE to START_MICROPHONE
[18:36:58][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[18:36:58][D][voice_assistant:163]: Starting Microphone
[18:36:58][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:36:58][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:36:58][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:36:58][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:36:58][D][esp-idf:000]: I (75950) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8386231 Bytes, Inter:156256 Bytes, Dram:156256 Bytes

[18:36:58][D][esp-idf:000]: I (75954) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[18:36:58][D][esp-idf:000]: I (75957) AUDIO_PIPELINE: Pipeline started

[18:36:58][D][voice_assistant:563]: Event Type: 1
[18:36:58][D][voice_assistant:566]: Assist Pipeline running
[18:36:58][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:36:58][D][voice_assistant:563]: Event Type: 3
[18:36:58][D][voice_assistant:577]: STT started
[18:36:58][D][light:036]: 'top_led' Setting:
[18:36:58][D][light:051]:   Brightness: 100%
[18:36:58][D][light:059]:   Red: 100%, Green: 100%, Blue: 100%
[18:36:58][D][light:109]:   Effect: 'listening'
[18:36:58][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:36:58][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:36:58][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:36:58][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:36:58][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:36:58][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[18:36:58][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:36:59][D][voice_assistant:563]: Event Type: 11
[18:36:59][D][voice_assistant:717]: Starting STT by VAD
[18:37:00][D][voice_assistant:563]: Event Type: 12
[18:37:00][D][voice_assistant:721]: STT by VAD end
[18:37:00][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[18:37:00][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[18:37:00][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:37:00][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:051]:   Brightness: 70%
[18:37:00][D][light:059]:   Red: 0%, Green: 20%, Blue: 100%
[18:37:00][D][light:109]:   Effect: 'processing'
[18:37:00][D][esp-idf:000]: W (77294) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:37:00][D][esp-idf:000]: W (77297) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:37:00][D][esp-idf:000]: W (77300) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:37:00][D][esp-idf:000]: W (77304) AUDIO_ELEMENT: OUT-[i2s_in] AEL_IO_ABORT

[18:37:00][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:37:00][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[18:37:00][D][voice_assistant:563]: Event Type: 4
[18:37:00][D][voice_assistant:591]: Speech recognised as: " What time is it?"
[18:37:00][D][voice_assistant:563]: Event Type: 5
[18:37:00][D][voice_assistant:596]: Intent started
[18:37:00][D][voice_assistant:563]: Event Type: 6
[18:37:00][D][voice_assistant:563]: Event Type: 7
[18:37:00][D][voice_assistant:619]: Response: "The current time is 18:37 Mountain Time on Wednesday, April 17, 2024."
[18:37:00][D][voice_assistant:563]: Event Type: 8
[18:37:00][D][voice_assistant:639]: Response URL: "http://10.19.15.100:8123/api/tts_proxy/491349b64163dcff9ed5d41242b6d89e5713fa8f_en-gb_010745e5ef_tts.piper.mp3"
[18:37:00][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[18:37:00][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[18:37:00][D][media_player:059]: 'onju_media_player' - Setting
[18:37:00][D][media_player:066]:   Media URL: http://10.19.15.100:8123/api/tts_proxy/491349b64163dcff9ed5d41242b6d89e5713fa8f_en-gb_010745e5ef_tts.piper.mp3
[18:37:00][D][adf_media_player:030]: Got control call in state 1
[18:37:00][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:37:00][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:37:00][I][adf_media_player:135]: got new pipeline state: 1
[18:37:00][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[18:37:00][D][light:109]:   Effect: 'speaking'
[18:37:00][D][voice_assistant:563]: Event Type: 2
[18:37:00][D][voice_assistant:653]: Assist Pipeline ended
[18:37:00][D][esp-idf:000]: I (78129) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:37:00][D][esp-idf:000]: I (78132) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:37:00][D][esp_audio_sources:097]: Streamer status: 2
[18:37:00][D][esp_audio_sources:098]: decoder status: 2
[18:37:00][D][light:036]: 'top_led' Setting:
[18:37:00][D][light:051]:   Brightness: 60%
[18:37:00][D][light:059]:   Red: 100%, Green: 0%, Blue: 100%
[18:37:01][D][light:109]:   Effect: 'listening_ww'
[18:37:01][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[18:37:01][D][micro_wake_word:115]: Starting Microphone
[18:37:01][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:37:01][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:37:01][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[18:37:01][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:37:01][D][esp-idf:000]: I (78475) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8385715 Bytes, Inter:157852 Bytes, Dram:157852 Bytes

[18:37:01][D][esp-idf:000]: I (78477) AUDIO_ELEMENT: [i2s_in] AEL_MSG_CMD_RESUME,state:1

[18:37:01][D][esp-idf:000]: I (78481) AUDIO_PIPELINE: Pipeline started

[18:37:01][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 14
[18:37:01][I][esp_adf_pipeline:214]: [ i2s_in ] status: 14
[18:37:01][I][esp_adf_pipeline:214]: [ i2s_in ] status: 12
[18:37:01][D][esp_adf_pipeline:131]: Check element [i2s_in] status, 3
[18:37:01][D][esp_adf_pipeline:131]: Check element [pcm_reader] status, 3
[18:37:01][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:37:01][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[18:37:01][I][esp_adf_pipeline:214]: [ pcm_reader ] status: 12
[18:37:01][D][esp-idf:000]: I (78794) HTTP_CLIENT: Body received in fetch header state, 0x3fcc35bb, 1841

[18:37:01][D][esp-idf:000]: I (78801) HTTP_STREAM: total_bytes=60399

[18:37:01][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=16000, bits=16, ch=1
[18:37:01][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:01][D][esp-idf:000]: W (78874) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[18:37:01][D][esp-idf:000]: W (78879) MP3_DECODER: output aborted -3

[18:37:01][D][esp-idf:000]: I (78883) MP3_DECODER: Closed

[18:37:01][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:37:01][I][adf_media_player:135]: got new pipeline state: 2
[18:37:01][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:37:01][D][esp-idf:000]: I (78908) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8393243 Bytes, Inter:163268 Bytes, Dram:163268 Bytes

[18:37:01][D][esp-idf:000]: I (78911) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:37:01][D][esp-idf:000]: I (78914) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:37:01][D][esp-idf:000]: I (78917) AUDIO_ELEMENT: [i2s_out] AEL_MSG_CMD_RESUME,state:1

  3. When I play an MP3, it also works, but it now sounds like Darth Vader (very slowed down, the opposite problem of above).

Log:


[18:38:57][D][media_player:059]: 'onju_media_player' - Setting
[18:38:57][D][media_player:066]:   Media URL: http://10.19.15.100:8123/media/local/Auntie's%20Lock.mp3?authSig=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJmNzI3ZGIxODJhZTI0YzczOGU1MjE4MGQ0MzYzYTI2YSIsInBhdGgiOiIvbWVkaWEvbG9jYWwvQXVudGllJ3MgTG9jay5tcDMiLCJwYXJhbXMiOltdLCJpYXQiOjE3MTM0MDA3MzcsImV4cCI6MTcxMzQ4NzEzN30.4LuxjFgeQPIE9S50Im58at3Yg7aMm2NEcXk7fd-hCNY
[18:38:57][D][adf_media_player:030]: Got control call in state 1
[18:38:57][D][esp_adf_pipeline:050]: Starting request, current state STOPPED
[18:38:57][D][esp_adf_pipeline:302]: State changed from STOPPED to PREPARING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 1
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: I (194756) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:38:57][D][esp-idf:000]: I (194759) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:38:57][D][esp_audio_sources:097]: Streamer status: 2
[18:38:57][D][esp_audio_sources:098]: decoder status: 2
[18:38:57][D][esp-idf:000]: I (194790) HTTP_CLIENT: Body received in fetch header state, 0x3fcc2a22, 1738

[18:38:57][D][esp-idf:000]: I (194795) HTTP_STREAM: total_bytes=6564597

[18:38:57][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: W (194926) AUDIO_ELEMENT: OUT-[decoder] AEL_IO_ABORT

[18:38:57][D][esp-idf:000]: W (194929) MP3_DECODER: output aborted -3

[18:38:57][D][esp-idf:000]: I (194933) MP3_DECODER: Closed

[18:38:57][D][esp_adf_pipeline:302]: State changed from PREPARING to STARTING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 2
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:57][D][esp-idf:000]: I (194956) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8387635 Bytes, Inter:157868 Bytes, Dram:157868 Bytes

[18:38:57][D][esp-idf:000]: I (194959) AUDIO_ELEMENT: [http] AEL_MSG_CMD_RESUME,state:1

[18:38:57][D][esp-idf:000]: I (194961) AUDIO_ELEMENT: [decoder] AEL_MSG_CMD_RESUME,state:1

[18:38:57][D][esp-idf:000]: I (194964) MP3_DECODER: MP3 opened

[18:38:57][D][esp-idf:000]: I (195001) HTTP_CLIENT: Body received in fetch header state, 0x3fcc7446, 1738

[18:38:57][D][esp-idf:000]: I (195006) HTTP_STREAM: total_bytes=6564597

[18:38:57][I][esp_adf_pipeline:214]: [ decoder ] status: 14
[18:38:57][I][esp_adf_pipeline:214]: [ http ] status: 14
[18:38:57][I][esp_adf_pipeline:214]: [ i2s_out ] status: 12
[18:38:57][D][esp_adf_pipeline:131]: Check element [http] status, 3
[18:38:57][D][esp_adf_pipeline:131]: Check element [decoder] status, 3
[18:38:57][D][esp_adf_pipeline:131]: Check element [i2s_out] status, 3
[18:38:57][D][esp_adf_pipeline:302]: State changed from STARTING to RUNNING
[18:38:57][I][adf_media_player:135]: got new pipeline state: 3
[18:38:57][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:38:58][I][esp_adf_pipeline:214]: [ http ] status: 12
[18:38:58][I][esp_adf_pipeline:214]: [ decoder ] status: 12
[18:38:58][I][HTTPStreamReader:129]: [ * ] Receive music info from mp3 decoder, sample_rates=44100, bits=16, ch=2
[18:38:58][D][adf_i2s_out:127]: Set final i2s settings: 16000
[18:39:02][D][media_player:059]: 'onju_media_player' - Setting
[18:39:02][D][media_player:063]:   Command: STOP
[18:39:02][D][esp_adf_pipeline:302]: State changed from RUNNING to STOPPING
[18:39:02][I][adf_media_player:135]: got new pipeline state: 4
[18:39:03][D][esp_adf_pipeline:302]: State changed from STOPPING to STOPPED
[18:39:03][I][adf_media_player:135]: got new pipeline state: 5

Feels like the resampler is indeed the missing ingredient for the output. Maybe I'll try to put it just on that?

gnumpi commented 1 month ago

Yes, sorry, that is what I meant: for the output you definitely need it! But good to hear that you had success with the microphone!
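
A minimal, untested sketch of that split: resampler in the media player pipeline only, microphone pipeline without it. The microphone block repeats the one posted earlier in this thread. The `media_player` block is an assumption: the entry name `resampler` is inferred from the element tag in the logs above, `adf_i2s_out` and `adf_media_player` are hypothetical ids, and the name is taken from the log output.

```yaml
media_player:
  - platform: adf_pipeline
    id: adf_media_player        # hypothetical id
    name: onju_media_player     # name as it appears in the logs
    pipeline:
      - self
      - resampler               # assumed entry name; converts the decoded stream to the fixed bus format
      - adf_i2s_out             # hypothetical id of the audio_out element

microphone:
  - platform: adf_pipeline
    id: adf_microphone
    gain_log2: 3
    keep_pipeline_alive: false
    pipeline:
      - adf_i2s_in              # already 16 kHz / 32bit, so no resampler here
      - self
```

The logs are consistent with this: without the resampler, a 44100 Hz stereo MP3 was written to a bus fixed at 16000 Hz and played slowed down, while the 16000 Hz mono TTS file was written to a 2-channel bus and played sped up.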

cowboyrushforth commented 1 month ago

Woohoo! It works!

I will make a PR in the onju-voice repo with the full config after I test it a bit more.

Previously with this device w/ micro wake word there was no "media_player" component, so this is a huge upgrade - amazing work!!

One caveat, which I imagine other devices must have dealt with: now that you can talk and play music at the same time haha, how does micro wake word know when to stop listening? It actually did detect me correctly even with music playing through it, but the timeout seemed very long. Anyway, another optimization for another day! :)

sqldiablo commented 1 month ago

This is awesome to hear. @cowboyrushforth, do you mind sharing your final config?

cowboyrushforth commented 1 month ago

https://github.com/cowboyrushforth/onju-voice-satellite/blob/esphome_audio/esphome/onju-voice-microwakeword.yaml

this is what I have so far, still playing with some things though

sqldiablo commented 1 month ago

That got me working! Thanks for your help, @gnumpi & @cowboyrushforth!

jherby2k commented 3 weeks ago

> as I am thinking about it I have to check how it is working with the resampler, maybe also try without the resampler for the microphone, if you set it to 16kHz and 32bit it should be obsolete anyway.

Hi, I've been following this thread closely :)

Would you like a separate issue for the resampler and mic not working together? I'd really like to set the sample_rate to 48000 for better quality output, but then of course the mic doesn't work.

Thanks for your hard work!