esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
294 stars 37 forks source link

New version of Esphome break OTA #2784

Open blademckain opened 2 years ago

blademckain commented 2 years ago

The problem

I use Esphome as a Home Assistant addon. My cards are ESP32 (wrover with 16MB). The version previously in use is 1.17.0-dev Feb 19 2021. And everything worked fine. I was able to send the new firmware via OTA. I have now installed version 2022.1.0-dev Dec 5 2021. I can send the first firmware, but the next I get a "[Errno 104] Connection reset by peer". If, on the other hand, I send the .bin file through the "upload" function of the web server, it works.

I am attaching the simplest ESP board configuration, but it also happens in all other boards (11) which have different configurations (without ble_tracker)

Which version of ESPHome has the issue?

2022.1.0-dev Dec 5 2021

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

latest

What platform are you using?

ESP32

Board

esp32 wrover 16mb

Component causing the issue

OTA

Example YAML snippet

substitutions:
  device: f_blegw
  name: "BLE Gateway"
  static_ip: 192.168.12.30
  reboot_timeout: 1d
  update_interval: 10min

esphome:
  name: ${device}
  platform: ESP32
  board: esp-wrover-kit

wifi:
  ssid: !secret wifi_ssid_name
  password: !secret wifi_ssid_password
  power_save_mode: LIGHT 
  reboot_timeout: ${reboot_timeout}
  manual_ip:
    static_ip: ${static_ip}
    gateway: 192.168.12.254
    subnet: 255.255.255.0
    dns1: 8.8.8.8
    dns2: 8.8.4.4
  ap:
    ssid: ${device}
    password: !secret wifi_ssid_password
captive_portal:

ota:
  safe_mode: True

web_server:
  port: 80

time:
  - platform: homeassistant
    timezone: Europe/Rome
    id: homeassistant_time

text_sensor:
  - platform: version
    name: ${device} ESPHome Version
  - platform: wifi_info
    ip_address:
      name: ${device} IP Address
    ssid:
      name: ${device} SSID
    bssid:
      name: ${device} BSSID

api:
  reboot_timeout: ${reboot_timeout}
  password: !secret esphome_api_password

binary_sensor:
  - platform: ble_presence
    device_class: presence
    mac_address: CF:76:80:A3:39:4B
    name: "${device} Antonio Beacon"
  - platform: ble_presence
    device_class: presence
    mac_address: F1:43:27:AB:32:A6
    name: "${device} Raffaella Beacon"
  - platform: ble_presence
    device_class: presence
    mac_address: DC:0A:A2:AF:18:43
    name: "${device} Ginevra Beacon"

esp32_ble_tracker:
  scan_parameters:
    duration: 300s
    window: 48ms
    interval: 64ms
    active: False

Anything in the logs that might be useful for us?

No response

Additional information

No response

blademckain commented 2 years ago

I tried with ANOTHER esp32 card and simpler setup. Same mistake

substitutions:
  device: f-test4
  name: "F Test 4"
  static_ip: 192.168.12.9
  reboot_timeout: 1d
  update_interval: 10min

esphome:
  name: ${device}
  platform: ESP32
  board: esp-wrover-kit

wifi:
  ssid: !secret wifi_ssid_name
  password: !secret wifi_ssid_password
  power_save_mode: none
  reboot_timeout: ${reboot_timeout}
  manual_ip:
    static_ip: ${static_ip}
    gateway: 192.168.12.254
    subnet: 255.255.255.0
    dns1: 8.8.8.8
    dns2: 8.8.4.4
  ap:
    ssid: ${device}
    password: !secret wifi_ssid_password
captive_portal:

ota:
  safe_mode: True

logger:
  level: DEBUG

web_server:
  port: 80

time:
  - platform: homeassistant
    timezone: Europe/Rome
    id: homeassistant_time
INFO Reading configuration /config/esphome/f-test4.yaml...
INFO Generating C++ source...
INFO Compiling app...
Processing f-test4 (board: esp-wrover-kit; framework: arduino; platform: platformio/espressif32 @ 3.3.2)
--------------------------------------------------------------------------------
HARDWARE: ESP32 240MHz, 320KB RAM, 4MB Flash
Dependency Graph
|-- <AsyncTCP-esphome> 1.2.2
|-- <WiFi> 1.0
|-- <FS> 1.0
|-- <Update> 1.0
|-- <ESPAsyncWebServer-esphome> 2.1.0
|   |-- <AsyncTCP-esphome> 1.2.2
|-- <DNSServer> 1.1.0
|-- <ArduinoJson-esphomelib> 5.13.3
|-- <ESPmDNS> 1.0
RAM:   [=         ]  12.6% (used 41244 bytes from 327680 bytes)
Flash: [=====     ]  49.2% (used 902614 bytes from 1835008 bytes)
========================= [SUCCESS] Took 5.37 seconds =========================
INFO Successfully compiled program.
INFO Connecting to 192.168.12.9
INFO Uploading /data/f-test4/.pioenvs/f-test4/firmware.bin (902720 bytes)
Uploading: [====================                                        ] 34% 
ERROR Error sending data: [Errno 104] Connection reset by peer

Log from serial

[10:11:59][D][ota:136]: Starting OTA Update from 192.168.12.244...
[10:12:00][D][ota:304]: OTA in progress: 0.1%
[10:12:01][D][ota:304]: OTA in progress: 6.9%
[10:12:02][D][ota:304]: OTA in progress: 13.6%
[10:12:03][D][ota:304]: OTA in progress: 20.6%
[10:12:04][D][ota:304]: OTA in progress: 27.5%
[10:12:04]E (200399) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[10:12:04]E (200436) task_wdt:  - loopTask (CPU 1)
[10:12:04]E (200436) task_wdt: Tasks currently running:
[10:12:04]E (200436) task_wdt: CPU 0: ipc0
[10:12:05]E (200436) task_wdt: CPU 1: IDLE1
[10:12:05]E (200436) task_wdt: Aborting.
[10:12:05]abort() was called at PC 0x4014c624 on core 0
[10:12:05]
[10:12:05]ELF file SHA256: 0000000000000000
[10:12:05]
[10:12:05]Backtrace: 0x40088ac8:0x3ffbf810 0x40088d45:0x3ffbf830 0x4014c624:0x3ffbf850 0x40087205:0x3ffbf870 0x40084ec1:0x3ffb9d60 0x400879a5:0x3ffb9d80 0x40084c3f:0x3ffb9da0 0x40089d56:0x3ffb9dc0
[10:12:05]
[10:12:05]Rebooting...
[10:12:05]ets Jun  8 2016 00:22:57
[10:12:05]
[10:12:05]rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[10:12:05]configsip: 0, SPIWP:0xee
[10:12:05]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[10:12:05]mode:DIO, clock div:2
[10:12:05]load:0x3fff0018,len:4
[10:12:05]load:0x3fff001c,len:1044
[10:12:05]load:0x40078000,len:8896
[10:12:05]load:0x40080400,len:5828
[10:12:05]entry 0x400806ac
[10:12:05][E][esp32-hal-misc.c:111] disableCore1WDT(): Failed to remove Core 1 IDLE task from WDT
[10:12:05][I][logger:214]: Log initialized
[10:12:05][C][ota:452]: There have been 2 suspected unsuccessful boot attempts.
[10:12:05][D][esp32.preferences:114]: Saving preferences to flash...
[10:12:05][I][app:029]: Running through setup()...
[10:12:05][C][wifi:037]: Setting up WiFi...
....
Maelstrom96 commented 2 years ago

Are you sure you're running the latest dev? This exact issue should be fixed with https://github.com/esphome/esphome/pull/2852. I ran into the same issue before I pulled that change.

probot-esphome[bot] commented 2 years ago

Hey there @esphome/core, mind taking a look at this issue as it has been labeled with an integration (ota) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

blademckain commented 2 years ago

Are you sure you're running the latest dev? This exact issue should be fixed with esphome/esphome#2852. I ran into the same issue before I pulled that change.

Where can I find this information exactly? the esphome version sensor tells me "2022.1.0-dev Dec 5 2021, 08:48:52"

oxan commented 2 years ago

Where can I find this information exactly?

Apparently that's just about impossible to find without going to the shell :( Try updating and rebuilding the HA addon, and reflashing your devices twice.

Lyr3x commented 2 years ago

I just updated ESPHome to the latest stable and facing the exact same issue with my custom component. My first thought was that it has something to do with the changed behavior of delay() so added more yield() calls. It happens only if a certain function is called. That function extends the runtime of setup() by a few seconds. Currently, i have no clue whats changed and how to fix it.

[11:14:25]E (10300) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[11:14:25]E (10300) task_wdt:  - loopTask (CPU 1)
[11:14:25]E (10300) task_wdt: Tasks currently running:
[11:14:25]E (10300) task_wdt: CPU 0: IDLE0
[11:14:25]E (10300) task_wdt: CPU 1: IDLE1
[11:14:25]E (10300) task_wdt: Aborting.
[11:14:25]abort() was called at PC 0x40157e9c on core 0
[11:14:25]
[11:14:25]ELF file SHA256: 0000000000000000
[11:14:25]
[11:14:25]Backtrace: 0x4008938c:0x3ffbf8d0 0x40089609:0x3ffbf8f0 0x40157e9c:0x3ffbf910 0x40087a2d:0x3ffbf930 0x4016f8af:0x3ffbc160 0x401597a7:0x3ffbc180 0x4008bdd9:0x3ffbc1a0 0x4008a61a:0x3ffbc1c0
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x4008938c: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
WARNING Decoded 0x40089609: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
WARNING Decoded 0x40157e9c: task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:252
WARNING Decoded 0x40087a2d: _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
WARNING Decoded 0x4016f8af: esp_pm_impl_waiti at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/pm_esp32.c:492
WARNING Decoded 0x401597a7: esp_vApplicationIdleHook at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/freertos_hooks.c:108
WARNING Decoded 0x4008bdd9: prvIdleTask at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
WARNING Decoded 0x4008a61a: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
[11:14:25]
[11:14:25]Rebooting...
[11:14:25]ets Jun  8 2016 00:22:57
[11:14:25]
[11:14:25]rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[11:14:25]configsip: 0, SPIWP:0xee
[11:14:25]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[11:14:25]mode:DIO, clock div:2
[11:14:25]load:0x3fff0018,len:4
[11:14:25]load:0x3fff001c,len:1044
[11:14:25]load:0x40078000,len:10124
[11:14:25]load:0x40080400,len:5828
[11:14:25]entry 0x400806a8
[11:14:25][E][esp32-hal-misc.c:111] disableCore1WDT(): Failed to remove Core 1 IDLE task from WDT

I need to add App.feed-wdt() in the process now to make it work. Did the timing for the watchdog itself changed? I do not have any delay() calls.

mmakaay commented 2 years ago

Might I suggest upgrading to 2021.12.0 that has been released ? That way, no guessing is needed if the dev version does or does not have the fix. The issue as described sounds a lot like the one that was fixed. If 2021.12.0 fixes it, then the conclusion is that the dev version was just a bit too old.

oxan commented 2 years ago

Did the timing for the watchdog itself changed?

The watchdog is now watching the ESPHome task itself, instead of the idle task, so yield() no longer implicitly feeds the watchdog. If your setup() or loop() blocks for more than 5 seconds, you have to feed it (but you shouldn't block that long).

alpdmrel commented 2 years ago

I have the same problem. Gives 104 at random %. But sometimes it flashes the same thing it didn't flash 2 tries ago. Tried dev. Tried rebuilding. Tried 2021.12.0 and 2021.12.1 . At first it didn't flash with light component now it doesn't matter. Add or remove any component. Or without any component always got 104.

oxan commented 2 years ago

I have the same problem.

Which problem? There's already two different problems (with similar but not identical symptoms) being discussed in this thread, and yours might be the third ;)

Gives 104 at random %

What do you mean with "gives 104"?

mmakaay commented 2 years ago

That would be the error that is visible on the client side of the OTA update. One starts uploading new firmware, and before it reaches the end, the uploads stops and the 104 error is shown as the reason (IIRC that is a connection reset by peer).

autox86 commented 2 years ago

Hi all,

same issue here! Connection reset for OTA, but also disconnects from the API.

Disable the webserver! @blademckain You will see, it will work..

Some funny fact, web server caused issues for me already in 2017. I disabled it and fine. Now, somehow forgot about it, enabled again, and again issues...

BTW: @blademckain Can you do me a pleasure and implement:

switch:
  - platform: safe_mode
    name: "Living Room Restart (Safe Mode)"

Please click on the switch and report if you were able to connect to the esp. I receive:

WARNING Can't connect to ESPHome API for p2gosund.iot.<MYDOMAIN>: Error connecting to ('192.168.10.11', 6053): [Errno 111] Connect call failed ('192.168.10.11', 6053)
comnam90 commented 2 years ago

I'm having the same issues sadly. Initially working fine, now it's 50/50 if it'll work. I'm using an M5Stack Atom Lite with Web Server disable but esp32_ble_tracker is enabled.

CarlosGS commented 2 years ago

Could you try to replicate while the module is connected to a usb-serial logger? That way you should be able to see more details on the crash.

AshtonKem commented 2 years ago

Ditto with an ESP32 WROVER board. I am also using esp32_ble_tracker.

softplus commented 2 years ago

I found a way to fix this: disable bluetooth when an OTA starts (it gets re-enabled on reboot after OTA).

I don't have the BLE/OTA/Wifi bug on all of my boards, but I include this fix in my yaml files where I use esp32_ble_tracker:

esphome:
  includes:
    - components/just_esp_bt.h # contains: #include <esp_bt.h>

ota:
  on_begin:
    then:
      - logger.log: "OTA: disabling bluetooth ..."
      - lambda: '(void)esp_bt_controller_disable();'
      - lambda: '(void)esp_bt_controller_deinit();'
  on_error:
    then:
      - logger.log: "OTA error: Waiting 180s before forced reboot"
      - delay: 180 sec
      - logger.log: "OTA error: Rebooting now"
      - lambda: 'App.safe_reboot();'

The components/just_esp_bt.h file just contains the line:

#include <esp_bt.h>

I don't know if there's a way to trigger an OTA without permission, so I included a 180s delay before a forced reboot on failed OTAs. This just makes sure that bluetooth goes back on (at reboot) should something with an OTA break. Maybe the time should be larger? YMMV.

mmakaay commented 2 years ago

OTA can only be triggered when starting an upgrade with the correct password. I wouldn't bother with a long timeout, but would rather get the device back in an operational state quickly with the reboot.

Doesn't it work when you do the include from lambda?

ota:
  on_begin:
    then:
      - logger.log: "OTA: disabling bluetooth ..."
      - lambda: |-
            #include <esp_bt.h>
            esp_bt_controller_disable();
            esp_bt_controller_deinit();
foorschtbar commented 2 years ago

Same here. Worked since 2 years with untouched config and now no OTA update possible since last update. Update via Webserver doenst work, only flashing via Serial Port :(

accelle17 commented 2 years ago

I found a way to fix this: disable bluetooth when an OTA starts (it gets re-enabled on reboot after OTA).

I don't have the BLE/OTA/Wifi bug on all of my boards, but I include this fix in my yaml files where I use esp32_ble_tracker:

esphome:
  includes:
    - components/just_esp_bt.h # contains: #include <esp_bt.h>

ota:
  on_begin:
    then:
      - logger.log: "OTA: disabling bluetooth ..."
      - lambda: '(void)esp_bt_controller_disable();'
      - lambda: '(void)esp_bt_controller_deinit();'
  on_error:
    then:
      - logger.log: "OTA error: Waiting 180s before forced reboot"
      - delay: 180 sec
      - logger.log: "OTA error: Rebooting now"
      - lambda: 'App.safe_reboot();'

The components/just_esp_bt.h file just contains the line:

#include <esp_bt.h>

Thanks for this. It actually let me do OTA with ble_tracker enabled.

dhover commented 2 years ago

I have the same problem. Updating two M5Stack Atom lite's with only the bluetooth proxy config produces an error: ERROR Error sending data: [Errno 104] Connection reset by peer

INFO Reading configuration /config/esphome/atom-lite-01.yaml...
INFO Generating C++ source...
INFO Compiling app...
Processing atom-lite-01 (board: m5stack-atom; framework: arduino; platform: platformio/espressif32 @ 3.5.0)
--------------------------------------------------------------------------------
HARDWARE: ESP32 240MHz, 320KB RAM, 4MB Flash
LDF: Library Dependency Finder -> https://bit.ly/configure-pio-ldf
Dependency Graph
|-- AsyncTCP-esphome @ 1.2.2
|-- WiFi @ 1.0
|-- FS @ 1.0
|-- Update @ 1.0
|-- ESPAsyncWebServer-esphome @ 2.1.0
|   |-- AsyncTCP-esphome @ 1.2.2
|-- DNSServer @ 1.1.0
|-- ESPmDNS @ 1.0
|-- noise-c @ 0.1.4
|   |-- libsodium @ 1.10018.1
|-- Wire @ 1.0.1
|-- Improv @ 1.2.1
|-- FastLED @ 3.3.2
RAM:   [==        ]  16.9% (used 55436 bytes from 327680 bytes)
Flash: [========= ]  91.9% (used 1686974 bytes from 1835008 bytes)
========================= [SUCCESS] Took 3.30 seconds =========================
INFO Successfully compiled program.
INFO Connecting to 192.168.x.x
INFO Uploading /data/atom-lite-01/.pioenvs/atom-lite-01/firmware.bin (1687088 bytes)
Uploading: [==============                                              ] 23% 
ERROR Error sending data: [Errno 104] Connection reset by peer

Sometimes the OTA works but most of the time it doesn't.