esphome / firmware

Holds firmware configuration files for projects that the ESPHome team provides.
https://esphome.io/projects
Apache License 2.0

ESP-S3-BOX-3 Voice Assistant speaker volume too low #139

Open stalakerob opened 5 months ago

stalakerob commented 5 months ago

The speaker volume of the Box-3 Voice Assistant is very low. Even in a quiet room you can barely hear it. Increasing the volume_multiplier config setting does not change anything.

nickrout commented 5 months ago

The ESP-S3-BOX (i.e. the non-3 version) is the same.

jaymunro commented 5 months ago

Volume on mine is quite acceptable for me. For comparison, I did an audio dB measurement at 50cm using the Alan TTS model in piper. Result at 50cm was 71dB max.

stalakerob commented 4 months ago

I recently installed the stock demo firmware to check the max volume. It allows you to play a few MP3 files. Increasing the volume makes the playback pretty loud, much louder than with ESPHome. This means it's not a hardware issue but an ESPHome issue.

letsautomatenet commented 4 months ago

Interestingly, this is the same issue you get with the Sonoff TX Ultimate light switch when you flash it with ESPHome and use it as a media player / TTS.

Hopefully the ESPHome brains will figure it out.

sammcj commented 3 months ago

I have an esp-s3-box-3 with a very quiet speaker (using it with Home Assistant/ESPHome). Is there a standard way to set the volume on these?

Confusingly the volume_multiplier configuration parameter seems to be related to the microphone gain, rather than the volume of the speaker.

crudolphy commented 3 months ago

I have the same issue with both the esp-s3-box-3 and the atom m5 echo device. Even with my hearing aids in I can't hear the response. Hopefully someone is working on this.

sammcj commented 3 months ago

@cfrudolphy so I've got a bit of a dodgy hack for my specific use case with esphome and the esp-s3-box-3, not sure if you can adapt for your usage - https://github.com/sammcj/esphome-esp-s3-box-3-volume

crudolphy commented 3 months ago

Thanks for this. But first I have to get it adopted into the ESPHome add-on. I'll try tomorrow.

dmakovec commented 3 months ago

@sammcj thanks for hacking this out. I'm a bit unfamiliar with how types get cast, but reading your notes and inserting the output.set_level call into the esphome's on_boot sequence, I'm seeing the following. Any tips on what I might have missed? TIA!

 on_boot: 
    - priority: 600.0
      then: 
        - output.set_level: 

            ID 'speaker_volume' of type esp_box_volume::ESPBoxVolume doesn't inherit from output::FloatOutput. Please double check your ID is pointing to the correct value.
            id: speaker_volume
            level: 0.85
        - script.execute: 
        ..

sammcj commented 3 months ago

@dmakovec the volume code isn't in the upstream jesserockz/esp32-s3-box-3-board board repository, you'd have to use / adapt my fork linked above to be able to call esp_box_volume.

I opened a PR with @jesserockz to get it merged in - https://github.com/jesserockz/esp32-s3-box-3-board/pull/4

mihsu81 commented 2 months ago

@sammcj I have the same issue even after the PR has been merged.

  on_boot: 
     - priority: 600.0
       then: 
          - output.set_level: 

              ID 'speaker_volume' of type esp_box_volume::ESPBoxVolume doesn't inherit from output::FloatOutput. Please double check your ID is pointing to the correct value.
              id: speaker_volume
              level: 0.85

jaymunro commented 2 months ago

@mihsu81 are you using release 2024.3.1 or dev of the ESPHome add-on? The PR has not yet made its way there. I think you'd have to compile using jesserockz's repo, or possibly link it in as an external component. I'm not sure how, though. I'm sure I'll be one of many eagerly reviewing the release notes for the next versions of the ESPHome add-on.
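
For anyone unsure what "link it in as an external component" looks like, here is a minimal sketch. The source and component name are copied from the config sammcj posts further down this thread. Whether this fragment alone is enough on a given ESPHome release is an assumption, not a confirmed fix. That config also swaps the esp32_s3_box_3_board framework component for a fork, which is omitted here.

```yaml
# Sketch only: pulls the esp_box_volume component from sammcj's repository.
external_components:
  - source: github://sammcj/esphome-esp-s3-box-3-volume@main
    components: [esp_box_volume]
    refresh: always  # re-fetch the source on every build; optional
```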

mihsu81 commented 2 months ago

I'm having the same issues with ESPHome 2024.4.0-dev. I've opened a new issue in @sammcj's repo https://github.com/sammcj/esphome-esp-s3-box-3-volume/issues/1.

cl0ud6uru commented 1 month ago

Any updates on this? So far everything works great but the speaker volume is almost impossible to hear. As mentioned above, the stock firmware out of the box was plenty loud. I've got the radar and battery sensor working, but I'm at a loss on volume control.

fire219 commented 1 month ago

Confirming it's still an issue. ESPHome 2024.4.2, have added the external_component for esp_box_volume and even switched the board component source to sammcj's:

components:
      - name: esp32_s3_box_3_board
        source: github://sammcj/esphome-esp-s3-box-3-volume@main
        refresh: always

Same applies on jesserockz's repo and/or without the external_component. Any permutation still gets this error: "ID 'speaker_volume' of type esp_box_volume::ESPBoxVolume doesn't inherit from output::FloatOutput. Please double check your ID is pointing to the correct value."

After digging through the commits to jesserockz/esp32-s3-box-3-board#4, it looks like the syntax for setting the volume should be a bit different (eg no need for the esp_box_volume declaration and id), but I'm not smart enough to figure out how to use it. Trying to apply the volume level to the speaker (box_speaker in the HA voice assistant script) gives a similar error as before, but for esp_adf::ESPADFSpeaker.
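
The error says that speaker_volume is not an output::FloatOutput, so output.set_level cannot target it. A minimal sketch of the alternative, based on the config sammcj posts later in this thread, is to call the component's set_volume from a lambda. The set_volume method and the fractional value are taken from that config and assumed here; they have not been checked against the component source.

```yaml
# Fragment: merge into an existing config. Assumes the esp_box_volume
# external component from sammcj's repository is already available.
esp_box_volume:
  id: speaker_volume

esphome:
  on_boot:
    priority: 600
    then:
      # Assumed API: set_volume() as used in sammcj's config,
      # called directly instead of going through output.set_level.
      - lambda: id(speaker_volume).set_volume(0.85);
```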

hargcore commented 1 month ago

Same thing here: I can't hear the spoken words on the speaker after the click. I am using Piper (Alan-Low) as the TTS response on my HA Voice Assistant, but I have tried others too. If I use the same TTS setup from the Services tab in Developer Tools with a regular speaker, it plays the full audio end to end at normal volume. I have the latest version of HA and all add-ons installed, and a fresh S3 install as of today.

cl0ud6uru commented 1 month ago

Mine is no longer playing any sound after the last update. Are you getting sound at all?

hargcore commented 1 month ago

I discovered that the "no sound at all" issue is related to the last update, so it is unrelated to this low-volume issue. The fix is here: https://github.com/esphome/issues/issues/5791#issue-2297438479

styphonthal commented 3 weeks ago

Has anyone had luck adjusting the sound?

Windyo commented 3 weeks ago

Me neither. I tried a few things, including using a lambda to directly call audio_board_set_volume from the board definition, but no dice. I'd love for @sammcj or @jesserockz to give us a pointer on how to call this function, because I've spent more than a few hours on this and I am lost.

sammcj commented 2 weeks ago

Sorry, way too many GitHub notifications to deal with.

Here's my esp-s3-box-3 esphome config if it helps, it gives me 3 volume settings in HA that I can select from.

---
#------------------------------------------------------------------------------------#
#     PIN Schematics                                                                 #
#                                                                                    #
#       GPIO-00 MCU-BOOT                                                             #
#       GPIO-01 Speaker Mute-Status                                                  #
#       GPIO-02 I2S MCLK                                                             #
#       GPIO-03 Touch-Screen TT21100 Interrupt Pin                                   #
#       GPIO-04 ILI92xxx Display DC-Pin (SPI: CLK-Pin)                               #
#       GPIO-05 ILI92xxx Display CS-Pin (SPI: MOSI-Pin)                              #
#       GPIO-06 ILI92xxx Display SDA                                                 #
#       GPIO-07 ILI92xxx Display SCK                                                 #
#       GPIO-17 I2S_SCLK                                                             #
#       GPIO-40 I2C_SCL (Temp & Hum) -- SENSOR v1.1 [AHT30]                          #
#       GPIO-41 I2C_SDA (Temp & Hum) -- SENSOR v1.1 [AHT30]                          #

substitutions:
  name: esp32-s3-box-3-5acf94
  friendly_name: "ESP32 S3 Box 3 5acf94"

  micro_wake_word_model: alexa
  # micro_wake_word_model: /config/hey_mycroft.json
  wake_word_engine_location: "On device"
  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"

  # Request text parameters
  request_text_font_size: "14"
  request_text_start_x: "50"
  request_text_start_y: "25"
  request_text_max_width: "200"
  request_text_line_height: "20"
  request_text_max_lines: "2"

  # Response text parameters
  response_text_font_size: "16"
  response_text_start_x: "5"
  response_text_start_y: "160"
  response_text_max_line_length: "280"
  response_text_line_height: "20"
  response_text_max_lines: "4"

  # 320 x 240
  loading_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/loading.png
  idle_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/idle.png
  listening_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/listening.png
  thinking_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/thinking.png
  replying_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/replying.png
  error_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/error.png
  loading_illustration_background_color: "000000"
  idle_illustration_background_color: "000000"
  listening_illustration_background_color: "000000"
  thinking_illustration_background_color: "000000"
  replying_illustration_background_color: "000000"
  error_illustration_background_color: "000000"

  # These unique characters have been extracted from every test file of every language available on https://github.com/home-assistant/intents (14 March 2024)
  allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘,?"

external_components:
  - source: github://sammcj/esphome-esp-s3-box-3-volume@main
    components: [esp_box_volume]
    refresh: always
  - source: github://pr#5230
    components: esp_adf

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: esphome.voice-assistant
    version: "2.0"
  min_version: 2023.11.5
  on_boot:
    priority: 600
    then:
      - script.execute: draw_display
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: draw_display
      - lambda: |-
          auto volume_component = id(speaker_volume);
          volume_component->set_volume(0.9);

esp32:
  board: esp32s3box
  flash_size: 16MB
  framework:
    type: esp-idf
    version: 4.4.6 # workaround for https://github.com/esphome/issues/issues/5791
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
      CONFIG_ESP32_S3_BOX_3_BOARD: "y"
    components:
      - name: esp32_s3_box_3_board
        source: github://sammcj/esp32-s3-box-3-board@main
        refresh: always

psram:
  mode: octal
  speed: 80MHz

ota:
logger:
  hardware_uart: USB_SERIAL_JTAG

esp_box_volume:
  id: speaker_volume

api:
  encryption:
    key: REDACTED
  on_client_connected:
    - script.execute: draw_display
  on_client_disconnected:
    - script.execute: draw_display

text_sensor:
  - platform: template
    id: text_request
    name: "Request Text"

  - platform: template
    id: text_response
    name: "Response Text"

voice_assistant:
  volume_multiplier: 2.0
  id: va
  microphone: box_mic
  speaker: box_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        brightness: 100%
        effect: pulse
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: draw_display
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: draw_display
  on_stt_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - lambda: id(text_request).publish_state(x);
    - script.execute: draw_display
  on_tts_start:
    - lambda: id(text_response).publish_state(x);
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_tts_stream_start:
    - light.turn_on:
        id: led
        brightness: 100%
        effect: pulse
    - wait_until:
        speaker.is_playing:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: draw_display
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_off: mute
            - lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start:
    - delay: 1s
    - lambda: id(text_response).publish_state("");
    - lambda: id(text_request).publish_state("");
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: draw_display
          - delay: 1s
          - if:
              condition:
                switch.is_off: mute
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: draw_display
    - light.turn_on:
        id: led
        brightness: 90%
        effect: strobe
    - delay: 2s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

  on_client_connected:
    - if:
        condition:
          switch.is_off: mute
        then:
          - wait_until:
              not: ble.enabled
          - if:
              condition:
                lambda: return id(wake_word_engine_location).state == "In Home Assistant";
              then:
                - lambda: id(va).set_use_wake_word(true);
                - voice_assistant.start_continuous:
          - if:
              condition:
                lambda: return id(wake_word_engine_location).state == "On device";
              then:
                - micro_wake_word.start
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - lambda: id(init_in_progress) = false;
    - script.execute: draw_display
  on_client_disconnected:
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "In Home Assistant";
        then:
          - lambda: id(va).set_use_wake_word(false);
          - voice_assistant.stop:
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - micro_wake_word.stop
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: draw_display

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
  on_connect:
    - script.execute: draw_display
    - delay: 5s # Gives time for improv results to be transmitted
    - ble.disable:
  on_disconnect:
    - script.execute: draw_display
    - ble.enable:

dashboard_import:
  package_import_url: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main

improv_serial:

esp32_improv:
  authorizer: none

number:
  - platform: template
    name: "Presence duration"
    id: radar_delayed_off
    icon: mdi:account-clock
    optimistic: true
    restore_value: true
    initial_value: 60
    min_value: 0
    step: 5
    max_value: 300
    unit_of_measurement: s
    entity_category: config
    mode: box

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO1
      inverted: true
    name: "Mute"
    disabled_by_default: false
    entity_category: diagnostic

  - platform: gpio
    pin:
      number: GPIO21
    name: "Presence detect"
    disabled_by_default: false
    device_class: "occupancy"
    filters:
      - delayed_off: !lambda return id(radar_delayed_off).state * 1000;
    on_release:
      then:
        - if:
            condition:
              switch.is_on: mute_when_absent
            then:
              - switch.turn_on: mute
              - light.turn_off: led
    on_press:
      then:
        - if:
            condition:
              switch.is_on: mute_when_absent
            then:
              - switch.turn_off: mute
              - light.turn_on: led

  - platform: gpio
    pin:
      number: GPIO0
      mode: INPUT_PULLUP
      inverted: true
    name: Top Left Button
    disabled_by_default: true
    entity_category: diagnostic
    on_multi_click:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset
  - platform: restart
    name: "Restart Device"
    entity_category: "diagnostic"
  - platform: shutdown
    name: "Shutdown Device"
    entity_category: "diagnostic"

font:
  - file:
      type: gfonts
      family: Figtree
      weight: 300
      italic: true
    glyphs: ${allowed_characters}
    id: font_request
    size: ${request_text_font_size}
  - file:
      type: gfonts
      family: Figtree
      weight: 300
    glyphs: ${allowed_characters}
    id: font_response
    size: ${response_text_font_size}

sensor:
  - platform: internal_temperature
    name: "Internal Temperature"
    entity_category: "diagnostic"

  - platform: adc
    pin: GPIO10
    name: "Battery voltage"
    id: battery_voltage
    unit_of_measurement: "V"
    accuracy_decimals: 3
    device_class: "voltage"
    entity_category: "diagnostic"
    disabled_by_default: true
    update_interval: 300s
    attenuation: auto
    filters:
      - multiply: 4.01

  - platform: copy
    source_id: battery_voltage
    name: "Battery level"
    unit_of_measurement: "%"
    accuracy_decimals: 0
    device_class: "battery"
    entity_category: "diagnostic"
    filters:
      - lambda: return (x - 3.1) / (4.14 - 3.1) * 100;
      - clamp:
          min_value: 0
          max_value: 100
          ignore_out_of_range: true

output:
  - platform: ledc
    pin: GPIO47
    id: backlight_output

light:
  - platform: monochromatic
    id: led
    name: LCD Backlight
    entity_category: config
    output: backlight_output
    restore_mode: RESTORE_DEFAULT_ON
    default_transition_length: 250ms

    effects:
      - pulse:
          transition_length: 650ms
          update_interval: 650ms
          min_brightness: 70%
          max_brightness: 100%
      - pulse:
          name: Fast Pulse
          transition_length: 50ms
          update_interval: 50ms
      - pulse:
          name: Slow Pulse
          transition_length: 1000ms
          update_interval: 1000ms
          min_brightness: 85%
          max_brightness: 100%
      - strobe:
      - strobe:
          name: Strobe Effect With Custom Values
          colors:
            - state: true
              brightness: 100%
              red: 100%
              green: 90%
              blue: 0%
              duration: 600ms
            - state: false
              duration: 250ms
            - state: true
              brightness: 100%
              red: 0%
              green: 100%
              blue: 0%
              duration: 600ms

esp_adf:
  board: esp32s3box3

microphone:
  - platform: esp_adf
    id: box_mic

speaker:
  - platform: esp_adf
    id: box_speaker

micro_wake_word:
  model: ${micro_wake_word_model}
  on_wake_word_detected:
    - voice_assistant.start

script:
  - id: draw_display
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(s3_box_lcd).show_page(listening_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(s3_box_lcd).show_page(thinking_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(s3_box_lcd).show_page(replying_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(s3_box_lcd).show_page(error_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(s3_box_lcd).show_page(muted_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(s3_box_lcd).show_page(no_ha_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_idle_phase_id}:
                                id(s3_box_lcd).show_page(idle_page);
                                id(s3_box_lcd).update();
                                break;
                              default:
                                id(s3_box_lcd).show_page(idle_page);
                                id(s3_box_lcd).update();
                            }
                      else:
                        - display.page.show: no_ha_page
                        - component.update: s3_box_lcd
                else:
                  - display.page.show: no_wifi_page
                  - component.update: s3_box_lcd
          else:
            - display.page.show: initializing_page
            - component.update: s3_box_lcd
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led
                brightness: 75%
                effect: none
            - delay: 40s
            - light.turn_on:
                id: led
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Display conversation
    id: display_conversation
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config

  - platform: template
    name: "Set Volume to 80%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.80);

  - platform: template
    name: "Set Volume to 85%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.85);

  - platform: template
    name: "Set Volume to 90%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.90);

  - platform: template
    name: "Mute when absent"
    id: mute_when_absent
    icon: mdi:account-right-arrow
    optimistic: true
    entity_category: config
    restore_mode: RESTORE_DEFAULT_OFF

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - wait_until:
          not:
            voice_assistant.is_running
      - script.execute: reset_led

  - platform: template
    name: Mute
    id: mute
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - if:
                      condition:
                        lambda: return id(wake_word_engine_location).state == "In Home Assistant";
                      then:
                        - lambda: id(va).set_use_wake_word(true);
                        - voice_assistant.start_continuous
                  - if:
                      condition:
                        lambda: return id(wake_word_engine_location).state == "On device";
                      then:
                        - micro_wake_word.start
            - script.execute: draw_display
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: draw_display

select:
  - platform: template
    entity_category: config
    name: Wake word engine location
    id: wake_word_engine_location
    optimistic: true
    restore_value: true
    options:
      - In Home Assistant
      - On device
    initial_option: On device
    on_value:
      - wait_until:
          lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id} || id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
      - if:
          condition:
            lambda: return x == "In Home Assistant";
          then:
            - micro_wake_word.stop
            - delay: 500ms
            - if:
                condition:
                  switch.is_off: mute
                then:
                  - lambda: id(va).set_use_wake_word(true);
                  - voice_assistant.start_continuous:
      - if:
          condition:
            lambda: return x == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop
            - delay: 500ms
            - micro_wake_word.start

globals:
  - id: init_in_progress
    type: bool
    restore_value: false
    initial_value: "true"
  - id: voice_assistant_phase
    type: int
    restore_value: false
    initial_value: ${voice_assist_not_ready_phase_id}

image:
  - file: ${error_illustration_file}
    id: ollama_error
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${idle_illustration_file}
    id: ollama_idle
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${listening_illustration_file}
    id: ollama_listening
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${thinking_illustration_file}
    id: ollama_thinking
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${replying_illustration_file}
    id: ollama_replying
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${loading_illustration_file}
    id: ollama_initializing
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-wifi.png
    id: error_no_wifi
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-ha.png
    id: error_no_ha
    resize: 320x240
    type: RGB24
    use_transparency: true

color:
  - id: idle_color
    hex: ${idle_illustration_background_color}
  - id: listening_color
    hex: ${listening_illustration_background_color}
  - id: thinking_color
    hex: ${thinking_illustration_background_color}
  - id: replying_color
    hex: ${replying_illustration_background_color}
  - id: loading_color
    hex: ${loading_illustration_background_color}
  - id: error_color
    hex: ${error_illustration_background_color}

spi:
  clk_pin: 7
  mosi_pin: 6

display:
  - platform: ili9xxx
    id: s3_box_lcd
    model: S3BOX
    data_rate: 40MHz
    cs_pin: 5
    dc_pin: 4
    reset_pin:
      number: 48
      inverted: true
    update_interval: never
    pages:
      - id: idle_page
        lambda: |-
          it.fill(id(idle_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_idle), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::WHITE );
            // it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: listening_page
        lambda: |-
          it.fill(id(listening_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_listening), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: thinking_page
        lambda: |-
          it.fill(id(thinking_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_thinking), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: replying_page
        lambda: |-
          it.printf(${request_text_start_x}, ${request_text_start_y}, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());

          int x = ${response_text_start_x};
          int y = ${response_text_start_y};
          int line_height = ${response_text_line_height};
          int max_lines = ${response_text_max_lines};
          int max_line_length = ${response_text_max_line_length};
          int line_count = 0;
          int line_length = 0;

          it.fill(id(replying_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_replying), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            it.printf(x, y, id(font_response), Color::WHITE, "%s", id(text_response).state.c_str());

            // split the response into lines
            std::string response = id(text_response).state.c_str();
            std::string line;

            for (int i = 0; i < response.length(); i++) {

              if (response[i] == ' ' || response[i] == '\n') {
                if (line_length + 1 > max_line_length) {
                  // draw a black box behind the text at the current y position
                  it.filled_rectangle(x, y, it.get_width() - x, line_height, Color::BLACK);
                  it.printf(x, y, id(font_response), Color::WHITE, "%s", line.c_str());
                  line = "";
                  y += line_height;
                  line_count++;
                  line_length = 0;
                  if (line_count >= max_lines) {
                    break;
                  }
                } else {
                  line += response[i];
                  line_length++;
                }
              } else {
                line += response[i];
                line_length++;
              }
            }
          }
      - id: error_page
        lambda: |-
          it.fill(id(error_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_error), ImageAlign::CENTER);
      - id: no_ha_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
          // it.printf(10, 10, id(font_request), Color::WHITE, "IP: %s", id(wifi_info).state.c_str());

      - id: no_wifi_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
      - id: initializing_page
        lambda: |-
          it.fill(id(loading_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_initializing), ImageAlign::CENTER);
      - id: muted_page
        lambda: |-
          it.fill(Color::BLACK);
          // if (id(display_conversation).state) {
          //   it.printf(10, 10, id(font_request), Color::WHITE, "Muted");
          //   it.printf(10, 30, id(font_request), Color::WHITE, "Press the button to talk");
          // }

Windyo commented 2 weeks ago

Thank you very much, @sammcj.

I'm far from an expert, but the minimal version of what you pasted that works for me is below. I kept the esp_adf component, despite not using the sensor base, because it lets me do display shenanigans.

For anyone who just wants to read this comment and skip the rest of the thread:

substitutions:
  name: esp32-s3-box-3-05afa4
  friendly_name: ESP32 S3 Box 3 05afa4
  micro_wake_word_model: hey_jarvis
packages:
  esphome.voice-assistant: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
  on_boot:
    then:
      - lambda: |-
          auto volume_component = id(speaker_volume);
          volume_component->set_volume(0.9);
external_components:
  - source: github://sammcj/esphome-esp-s3-box-3-volume@main
    components: [esp_box_volume]
    refresh: always
  - source: github://pr#5230
    components: esp_adf
api:
  encryption:
    key: !secret apikey_box3

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

voice_assistant:
  volume_multiplier: 4
  microphone: box_mic
  speaker: box_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  vad_threshold: 3

esp_box_volume:
  id: speaker_volume
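
The fixed 0.9 set in `on_boot` above could also be made adjustable from Home Assistant. Here is a minimal sketch, assuming the `esp_box_volume` component's `set_volume()` accepts a float between 0.0 and 1.0, as the lambdas in this thread suggest. The entity name and the `speaker_volume_percent` id are hypothetical, and whether very high values distort on a given unit is untested.

```yaml
# Sketch only: exposes the speaker volume as a 0-100 % slider in Home Assistant.
# Assumes esp_box_volume is already configured with id: speaker_volume.
number:
  - platform: template
    name: "Speaker volume"        # hypothetical entity name
    id: speaker_volume_percent    # hypothetical id
    icon: mdi:volume-high
    entity_category: config
    optimistic: true
    restore_value: true
    initial_value: 90
    min_value: 0
    max_value: 100
    step: 5
    unit_of_measurement: "%"
    on_value:
      # x is the new slider value; scale it to the 0.0-1.0 range.
      - lambda: id(speaker_volume).set_volume(x / 100.0f);
```

With something like this in place, the `on_boot` lambda may no longer be needed if the restored value is published at startup, but that should be confirmed on the device.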

Joakim-T commented 2 weeks ago

Sorry, way too many GitHub notifications to deal with.

Here's my esp-s3-box-3 ESPHome config, if it helps. It gives me three volume settings in HA that I can select from.

---
#------------------------------------------------------------------------------------#
#     PIN Schematics                                                                 #
#                                                                                    #
#       GPIO-00 MCU-BOOT                                                             #
#       GPIO-01 Speaker Mute-Status                                                  #
#       GPIO-02 I2S MCLK                                                             #
#       GPIO-03 Touch-Screen TT21100 Interrupt Pin                                   #
#       GPIO-04 ILI92xxx Display DC-Pin (SPI: CLK-Pin)                               #
#       GPIO-05 ILI92xxx Display CS-Pin (SPI: MOSI-Pin)                              #
#       GPIO-06 ILI92xxx Display SDA                                                 #
#       GPIO-07 ILI92xxx Display SCK                                                 #
#       GPIO-17 I2S_SCLK                                                             #
#       GPIO-40 I2C_SCL (Temp & Hum) -- SENSOR v1.1 [AHT30]                          #
#       GPIO-41 I2C_SDA (Temp & Hum) -- SENSOR v1.1 [AHT30]                          #

substitutions:
  name: esp32-s3-box-3-5acf94
  friendly_name: "ESP32 S3 Box 3 5acf94"

  micro_wake_word_model: alexa
  # micro_wake_word_model: /config/hey_mycroft.json
  wake_word_engine_location: "On device"
  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"

  # Request text parameters
  request_text_font_size: "14"
  request_text_start_x: "50"
  request_text_start_y: "25"
  request_text_max_width: "200"
  request_text_line_height: "20"
  request_text_max_lines: "2"

  # Response text parameters
  response_text_font_size: "16"
  response_text_start_x: "5"
  response_text_start_y: "160"
  response_text_max_line_length: "280"
  response_text_line_height: "20"
  response_text_max_lines: "4"

  # 320 x 240
  loading_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/loading.png
  idle_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/idle.png
  listening_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/listening.png
  thinking_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/thinking.png
  replying_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/replying.png
  error_illustration_file: https://github.com/sammcj/home-assistant-s3-box-community-illustrations/raw/main/sammcj/illustrations/error.png
  loading_illustration_background_color: "000000"
  idle_illustration_background_color: "000000"
  listening_illustration_background_color: "000000"
  thinking_illustration_background_color: "000000"
  replying_illustration_background_color: "000000"
  error_illustration_background_color: "000000"

  # These unique characters have been extracted from every test file of every language available on https://github.com/home-assistant/intents (14 March 2024)
  allowed_characters: " !#%'()+,-./0123456789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWYZ[]_abcdefghijklmnopqrstuvwxyz{|}°²³µ¿ÁÂÄÅÉÖÚßàáâãäåæçèéêëìíîðñòóôõöøùúûüýþāăąćčďĐđēėęěğĮįıļľŁłńňőřśšťũūůűųźŻżŽžơưșțΆΈΌΐΑΒΓΔΕΖΗΘΚΜΝΠΡΣΤΥΦάέήίαβγδεζηθικλμνξοπρςστυφχψωϊόύώАБВГДЕЖЗИКЛМНОПРСТУХЦЧШЪЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяёђєіїјљњћאבגדהוזחטיכלםמןנסעפץצקרשת،ءآأإئابةتجحخدذرزسشصضطظعغفقكلمنهوىيٹپچڈکگںھہیےংকচতধনফবযরলশষস়ািু্చయలిెొ్ംഅആഇഈഉഎഓകഗങചജഞടഡണതദധനപഫബഭമയരറലളവശസഹാിീുൂെേൈ്ൺൻർൽൾაბგდევზთილმნოპრსტუფქყშჩცძჭხạảấầẩậắặẹẽếềểệỉịọỏốồổỗộớờởợụủứừửữựỳ—、一上不个中为主乾了些亮人任低佔何作供依侧係個側偵充光入全关冇冷几切到制前動區卧厅厨及口另右吊后吗启吸呀咗哪唔問啟嗎嘅嘛器圍在场執場外多大始安定客室家密寵对將小少左已帘常幫幾库度庫廊廚廳开式後恆感態成我戲戶户房所扇手打执把拔换掉控插摄整斯新明是景暗更最會有未本模機檯櫃欄次正氏水沒没洗活派温測源溫漏潮激濕灯為無煙照熱燈燥物狀玄现現瓦用發的盞目着睡私空窗立笛管節簾籬紅線红罐置聚聲脚腦腳臥色节著行衣解設調請謝警设调走路車车运連遊運過道邊部都量鎖锁門閂閉開關门闭除隱離電震霧面音頂題顏颜風风食餅餵가간감갔강개거게겨결경고공과관그금급기길깥꺼껐꼽나난내네놀누는능니다닫담대더데도동됐되된됨둡드든등디때떤뜨라래러렇렌려로료른를리림링마많명몇모무문물뭐바밝방배변보부불블빨뽑사산상색서설성세센션소쇼수스습시신실싱아안않알았애야어얼업없었에여연열옆오온완외왼요운움워원위으은을음의이인일임입있작잠장재전절정제져조족종주줄중줘지직진짐쪽차창천최추출충치침커컴켜켰쿠크키탁탄태탬터텔통트튼티파팬퍼폰표퓨플핑한함해했행혀현화활후휴힘,?"

external_components:
  - source: github://sammcj/esphome-esp-s3-box-3-volume@main
    components: [esp_box_volume]
    refresh: always
  - source: github://pr#5230
    components: esp_adf

esphome:
  name: ${name}
  friendly_name: ${friendly_name}
  name_add_mac_suffix: false
  platformio_options:
    board_build.flash_mode: dio
  project:
    name: esphome.voice-assistant
    version: "2.0"
  min_version: 2023.11.5
  on_boot:
    priority: 600
    then:
      - script.execute: draw_display
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: draw_display
      - lambda: |-
          auto volume_component = id(speaker_volume);
          volume_component->set_volume(0.9);

esp32:
  board: esp32s3box
  flash_size: 16MB
  framework:
    type: esp-idf
    version: 4.4.6 # workaround for https://github.com/esphome/issues/issues/5791
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
      CONFIG_ESP32_S3_BOX_3_BOARD: "y"
    components:
      - name: esp32_s3_box_3_board
        source: github://sammcj/esp32-s3-box-3-board@main
        refresh: always

psram:
  mode: octal
  speed: 80MHz

ota:
logger:
  hardware_uart: USB_SERIAL_JTAG

esp_box_volume:
  id: speaker_volume

api:
  encryption:
    key: REDACTED
  on_client_connected:
    - script.execute: draw_display
  on_client_disconnected:
    - script.execute: draw_display

text_sensor:
  - platform: template
    id: text_request
    name: "Request Text"

  - platform: template
    id: text_response
    name: "Response Text"

voice_assistant:
  volume_multiplier: 2.0
  id: va
  microphone: box_mic
  speaker: box_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  vad_threshold: 3
  on_listening:
    - light.turn_on:
        id: led
        brightness: 100%
        effect: pulse
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: draw_display
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: draw_display
  on_stt_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - lambda: id(text_request).publish_state(x);
    - script.execute: draw_display
  on_tts_start:
    - lambda: id(text_response).publish_state(x);
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_tts_stream_start:
    - light.turn_on:
        id: led
        brightness: 100%
        effect: pulse
    - wait_until:
        speaker.is_playing:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: draw_display
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: draw_display
  on_end:
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    - script.execute: reset_led
    - if:
        condition:
          and:
            - switch.is_off: mute
            - lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start:
    - delay: 1s
    - lambda: id(text_response).publish_state("");
    - lambda: id(text_request).publish_state("");
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: draw_display
          - delay: 1s
          - if:
              condition:
                switch.is_off: mute
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: draw_display
    - light.turn_on:
        id: led
        brightness: 90%
        effect: strobe
    - delay: 2s
    - script.execute: reset_led
    - script.wait: reset_led
    - lambda: |-
        if (code == "wake-provider-missing" || code == "wake-engine-missing") {
          id(use_wake_word).turn_off();
        }

  on_client_connected:
    - if:
        condition:
          switch.is_off: mute
        then:
          - wait_until:
              not: ble.enabled
          - if:
              condition:
                lambda: return id(wake_word_engine_location).state == "In Home Assistant";
              then:
                - lambda: id(va).set_use_wake_word(true);
                - voice_assistant.start_continuous:
          - if:
              condition:
                lambda: return id(wake_word_engine_location).state == "On device";
              then:
                - micro_wake_word.start
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - lambda: id(init_in_progress) = false;
    - script.execute: draw_display
  on_client_disconnected:
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "In Home Assistant";
        then:
          - lambda: id(va).set_use_wake_word(false);
          - voice_assistant.stop:
    - if:
        condition:
          lambda: return id(wake_word_engine_location).state == "On device";
        then:
          - micro_wake_word.stop
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: draw_display

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  ap:
  on_connect:
    - script.execute: draw_display
    - delay: 5s # Gives time for improv results to be transmitted
    - ble.disable:
  on_disconnect:
    - script.execute: draw_display
    - ble.enable:

dashboard_import:
  package_import_url: github://esphome/firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml@main

improv_serial:

esp32_improv:
  authorizer: none

number:
  - platform: template
    name: "Presence duration"
    id: radar_delayed_off
    icon: mdi:account-clock
    optimistic: true
    restore_value: true
    initial_value: 60
    min_value: 0
    step: 5
    max_value: 300
    unit_of_measurement: s
    entity_category: config
    mode: box

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO1
      inverted: true
    name: "Mute"
    disabled_by_default: false
    entity_category: diagnostic

  - platform: gpio
    pin:
      number: GPIO21
    name: "Presence detect"
    disabled_by_default: false
    device_class: "occupancy"
    filters:
      - delayed_off: !lambda return id(radar_delayed_off).state * 1000;
    on_release:
      then:
        - if:
            condition:
              switch.is_on: mute_when_absent
            then:
              - switch.turn_on: mute
              - light.turn_off: led
    on_press:
      then:
        - if:
            condition:
              switch.is_on: mute_when_absent
            then:
              - switch.turn_off: mute
              - light.turn_on: led

  - platform: gpio
    pin:
      number: GPIO0
      mode: INPUT_PULLUP
      inverted: true
    name: Top Left Button
    disabled_by_default: true
    entity_category: diagnostic
    on_multi_click:
      - timing:
          - ON for at least 10s
        then:
          - button.press: factory_reset_btn

button:
  - platform: factory_reset
    id: factory_reset_btn
    name: Factory reset
  - platform: restart
    name: "Restart Device"
    entity_category: "diagnostic"
  - platform: shutdown
    name: "Shutdown Device"
    entity_category: "diagnostic"

font:
  - file:
      type: gfonts
      family: Figtree
      weight: 300
      italic: true
    glyphs: ${allowed_characters}
    id: font_request
    size: ${request_text_font_size}
  - file:
      type: gfonts
      family: Figtree
      weight: 300
    glyphs: ${allowed_characters}
    id: font_response
    size: ${response_text_font_size}

sensor:
  - platform: internal_temperature
    name: "Internal Temperature"
    entity_category: "diagnostic"

  - platform: adc
    pin: GPIO10
    name: "Battery voltage"
    id: battery_voltage
    unit_of_measurement: "V"
    accuracy_decimals: 3
    device_class: "voltage"
    entity_category: "diagnostic"
    disabled_by_default: true
    update_interval: 300s
    attenuation: auto
    filters:
      - multiply: 4.01

  - platform: copy
    source_id: battery_voltage
    name: "Battery level"
    unit_of_measurement: "%"
    accuracy_decimals: 0
    device_class: "battery"
    entity_category: "diagnostic"
    filters:
      - lambda: return (x - 3.1) / (4.14 - 3.1) * 100;
      - clamp:
          min_value: 0
          max_value: 100
          ignore_out_of_range: true

output:
  - platform: ledc
    pin: GPIO47
    id: backlight_output

light:
  - platform: monochromatic
    id: led
    name: LCD Backlight
    entity_category: config
    output: backlight_output
    restore_mode: RESTORE_DEFAULT_ON
    default_transition_length: 250ms

    effects:
      - pulse:
          transition_length: 650ms
          update_interval: 650ms
          min_brightness: 70%
          max_brightness: 100%
      - pulse:
          name: Fast Pulse
          transition_length: 50ms
          update_interval: 50ms
      - pulse:
          name: Slow Pulse
          transition_length: 1000ms
          update_interval: 1000ms
          min_brightness: 85%
          max_brightness: 100%
      - strobe:
      - strobe:
          name: Strobe Effect With Custom Values
          colors:
            - state: true
              brightness: 100%
              red: 100%
              green: 90%
              blue: 0%
              duration: 600ms
            - state: false
              duration: 250ms
            - state: true
              brightness: 100%
              red: 0%
              green: 100%
              blue: 0%
              duration: 600ms

esp_adf:
  board: esp32s3box3

microphone:
  - platform: esp_adf
    id: box_mic

speaker:
  - platform: esp_adf
    id: box_speaker

micro_wake_word:
  model: ${micro_wake_word_model}
  on_wake_word_detected:
    - voice_assistant.start

script:
  - id: draw_display
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(s3_box_lcd).show_page(listening_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(s3_box_lcd).show_page(thinking_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(s3_box_lcd).show_page(replying_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(s3_box_lcd).show_page(error_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(s3_box_lcd).show_page(muted_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(s3_box_lcd).show_page(no_ha_page);
                                id(s3_box_lcd).update();
                                break;
                              case ${voice_assist_idle_phase_id}:
                                id(s3_box_lcd).show_page(idle_page);
                                id(s3_box_lcd).update();
                                break;
                              default:
                                id(s3_box_lcd).show_page(idle_page);
                                id(s3_box_lcd).update();
                            }
                      else:
                        - display.page.show: no_ha_page
                        - component.update: s3_box_lcd
                else:
                  - display.page.show: no_wifi_page
                  - component.update: s3_box_lcd
          else:
            - display.page.show: initializing_page
            - component.update: s3_box_lcd
  - id: reset_led
    then:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - light.turn_on:
                id: led
                brightness: 75%
                effect: none
            - delay: 40s
            - light.turn_on:
                id: led
                brightness: 25%
                effect: none
          else:
            - light.turn_off: led

switch:
  - platform: template
    name: Display conversation
    id: display_conversation
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config

  - platform: template
    name: "Set Volume to 80%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.80);

  - platform: template
    name: "Set Volume to 85%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.85);

  - platform: template
    name: "Set Volume to 90%"
    optimistic: true
    on_turn_on:
      - lambda: id(speaker_volume).set_volume(0.90);

  - platform: template
    name: "Mute when absent"
    id: mute_when_absent
    icon: mdi:account-right-arrow
    optimistic: true
    entity_category: config
    restore_mode: RESTORE_DEFAULT_OFF

  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: reset_led
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - wait_until:
          not:
            voice_assistant.is_running
      - script.execute: reset_led

  - platform: template
    name: Mute
    id: mute
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - if:
                      condition:
                        lambda: return id(wake_word_engine_location).state == "In Home Assistant";
                      then:
                        - lambda: id(va).set_use_wake_word(true);
                        - voice_assistant.start_continuous
                  - if:
                      condition:
                        lambda: return id(wake_word_engine_location).state == "On device";
                      then:
                        - micro_wake_word.start
            - script.execute: draw_display
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: draw_display

select:
  - platform: template
    entity_category: config
    name: Wake word engine location
    id: wake_word_engine_location
    optimistic: true
    restore_value: true
    options:
      - In Home Assistant
      - On device
    initial_option: On device
    on_value:
      - wait_until:
          lambda: return id(voice_assistant_phase) == ${voice_assist_muted_phase_id} || id(voice_assistant_phase) == ${voice_assist_idle_phase_id};
      - if:
          condition:
            lambda: return x == "In Home Assistant";
          then:
            - micro_wake_word.stop
            - delay: 500ms
            - if:
                condition:
                  switch.is_off: mute
                then:
                  - lambda: id(va).set_use_wake_word(true);
                  - voice_assistant.start_continuous:
      - if:
          condition:
            lambda: return x == "On device";
          then:
            - lambda: id(va).set_use_wake_word(false);
            - voice_assistant.stop
            - delay: 500ms
            - micro_wake_word.start

globals:
  - id: init_in_progress
    type: bool
    restore_value: false
    initial_value: "true"
  - id: voice_assistant_phase
    type: int
    restore_value: false
    initial_value: ${voice_assist_not_ready_phase_id}

image:
  - file: ${error_illustration_file}
    id: ollama_error
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${idle_illustration_file}
    id: ollama_idle
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${listening_illustration_file}
    id: ollama_listening
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${thinking_illustration_file}
    id: ollama_thinking
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${replying_illustration_file}
    id: ollama_replying
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: ${loading_illustration_file}
    id: ollama_initializing
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-wifi.png
    id: error_no_wifi
    resize: 320x240
    type: RGB24
    use_transparency: true
  - file: https://github.com/esphome/firmware/raw/main/voice-assistant/error_box_illustrations/error-no-ha.png
    id: error_no_ha
    resize: 320x240
    type: RGB24
    use_transparency: true

color:
  - id: idle_color
    hex: ${idle_illustration_background_color}
  - id: listening_color
    hex: ${listening_illustration_background_color}
  - id: thinking_color
    hex: ${thinking_illustration_background_color}
  - id: replying_color
    hex: ${replying_illustration_background_color}
  - id: loading_color
    hex: ${loading_illustration_background_color}
  - id: error_color
    hex: ${error_illustration_background_color}

spi:
  clk_pin: 7
  mosi_pin: 6

display:
  - platform: ili9xxx
    id: s3_box_lcd
    model: S3BOX
    data_rate: 40MHz
    cs_pin: 5
    dc_pin: 4
    reset_pin:
      number: 48
      inverted: true
    update_interval: never
    pages:
      - id: idle_page
        lambda: |-
          it.fill(id(idle_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_idle), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::WHITE );
            // it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: listening_page
        lambda: |-
          it.fill(id(listening_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_listening), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: thinking_page
        lambda: |-
          it.fill(id(thinking_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_thinking), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            // it.filled_rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            // it.rectangle(20 , 20 , 280 , 30 , Color::BLACK );
            it.printf(30, 25, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
          }
      - id: replying_page
        lambda: |-
          int x = ${response_text_start_x};
          int y = ${response_text_start_y};
          int line_height = ${response_text_line_height};
          int max_lines = ${response_text_max_lines};
          int max_line_length = ${response_text_max_line_length};
          int line_count = 0;
          int line_length = 0;

          // fill the background first so it does not paint over the text drawn below
          it.fill(id(replying_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_replying), ImageAlign::CENTER);
          if (id(display_conversation).state) {
            it.printf(${request_text_start_x}, ${request_text_start_y}, id(font_request), Color::WHITE, "%s", id(text_request).state.c_str());
            it.printf(x, y, id(font_response), Color::WHITE, "%s", id(text_response).state.c_str());

            // split the response into lines
            std::string response = id(text_response).state;
            std::string line;

            for (size_t i = 0; i < response.length(); i++) {

              if (response[i] == ' ' || response[i] == '\n') {
                if (line_length + 1 > max_line_length) {
                  // draw a black box behind the text at the current y position
                  it.filled_rectangle(x, y, it.get_width() - x, line_height, Color::BLACK);
                  it.printf(x, y, id(font_response), Color::WHITE, "%s", line.c_str());
                  line = "";
                  y += line_height;
                  line_count++;
                  line_length = 0;
                  if (line_count >= max_lines) {
                    break;
                  }
                } else {
                  line += response[i];
                  line_length++;
                }
              } else {
                line += response[i];
                line_length++;
              }
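            }
            // draw whatever text is left over after the last wrap
            if (!line.empty() && line_count < max_lines) {
              it.filled_rectangle(x, y, it.get_width() - x, line_height, Color::BLACK);
              it.printf(x, y, id(font_response), Color::WHITE, "%s", line.c_str());
            }
          }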
      - id: error_page
        lambda: |-
          it.fill(id(error_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_error), ImageAlign::CENTER);
      - id: no_ha_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_ha), ImageAlign::CENTER);
          // it.printf(10, 10, id(font_request), Color::WHITE, "IP: %s", id(wifi_info).state.c_str());

      - id: no_wifi_page
        lambda: |-
          it.image((it.get_width() / 2), (it.get_height() / 2), id(error_no_wifi), ImageAlign::CENTER);
      - id: initializing_page
        lambda: |-
          it.fill(id(loading_color));
          it.image((it.get_width() / 2), (it.get_height() / 2), id(ollama_initializing), ImageAlign::CENTER);
      - id: muted_page
        lambda: |-
          it.fill(Color::BLACK);
          // if (id(display_conversation).state) {
          //   it.printf(10, 10, id(font_request), Color::WHITE, "Muted");
          //   it.printf(10, 30, id(font_request), Color::WHITE, "Press the button to talk");
          // }

This worked great for me, but I would have liked it to be even louder. Perhaps that is too much to ask from a device this small. I also wish the mic were a bit more sensitive: it very often fails to pick up the wake word "hey jarvis".
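
For anyone who wants to experiment with the mic side, here is a minimal sketch of the `voice_assistant` options that affect the captured audio. The ID `box_mic` is an assumed name and the values are untested starting points, so treat this as something to merge into your existing `voice_assistant:` block and tune, not as a known-good setup. As noted earlier in the thread, `volume_multiplier` appears to act on the microphone audio, not on the speaker.

```yaml
# Sketch only: box_mic is an assumed ID; use whatever your microphone: entry is called.
# The values below are starting points to experiment with, not tested recommendations.
voice_assistant:
  microphone: box_mic
  noise_suppression_level: 2   # 0 disables suppression, 4 is the strongest
  auto_gain: 31dBFS            # automatic gain on the mic signal (0dBFS to 31dBFS)
  volume_multiplier: 4.0       # scales the captured mic audio, not the speaker output
```

Raising `volume_multiplier` too far may clip the audio and make recognition worse, so it is probably worth changing one value at a time and checking the logs after each flash.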