esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

Unsolicited reboots of ESP32 QUINLED on HA #1313

Closed Lefuneste83 closed 3 years ago

Lefuneste83 commented 4 years ago

Operating environment/Installation (Hass.io/Docker/pip/etc.): Home Assistant on Docker, Home Assistant 0.111.4

The issue occurs with firmware compiled against both dev and 1.14.5. With 1.14.4 the ESP32 cannot reboot properly due to another bug with one of the sensors.

ESP (ESP32/ESP8266, Board/Sonoff): Esp32 mhetesp32devkit

Affected component: wifi, output, light

Description of problem: Unsolicited reboots of the ESP32:

- At random moments
- In particular when controlling several GPIOs on the same ESP32 through a light group entity (in configuration.yaml) for brightness.

light:

The ESP32 is coupled with a QUINLED QUAD which has been running flawlessly for more than a year. The issue occurs whatever hardware is used (tried with several ESP32/QUINLED boards). The ESP32 seems to disconnect or be disconnected, then reconnects rapidly. This reboot problem appeared a few days ago, and now that I am investigating, it seems likely that there is a regression in one of the components. It may be related to HA; Wi-Fi seems a less likely cause.

Problem-relevant YAML-configuration entries:

esphome:
  name: led_cave_01
  platform: ESP32
  board: mhetesp32devkit

wifi:
  ssid: "SSID"
  password: "mypassword"
  fast_connect: on
  manual_ip:
    static_ip: 192.168.X.X
    gateway: 192.168.X.X
    subnet: 255.255.255.X
    dns1: 192.168.X.X
    dns2: 8.8.8.8

mqtt:
  broker: "192.168.X.X"
  username: "mylogin"
  password: "mysecret"
  birth_message:
  will_message:

# Enable logging
logger:

ota:
  password: "otapassword"

dallas:
  - pin: GPIO18

switch:
  - platform: gpio
    name: "Led Cave 01 Onboard Light"
    pin: 2

output:
  - platform: ledc
    pin: 16
    frequency: "40000Hz"
    id: LED_gpio_16
  - platform: ledc
    pin: 17
    frequency: "40000Hz"
    id: LED_gpio_17
  - platform: ledc
    pin: 5
    frequency: "40000Hz"
    id: LED_gpio_5
  - platform: ledc
    pin: 19
    frequency: "40000Hz"
    id: LED_gpio_19

light:
  - platform: cwww
    name: "Led Cave 01 A"
    warm_white: LED_gpio_16
    cold_white: LED_gpio_17
    warm_white_color_temperature: 3000 K
    cold_white_color_temperature: 6000 K
    default_transition_length: 1s

  - platform: cwww
    name: "Led Cave 01 B"
    warm_white: LED_gpio_5
    cold_white: LED_gpio_19
    warm_white_color_temperature: 3000 K
    cold_white_color_temperature: 6000 K
    default_transition_length: 1s

sensor:
  - platform: wifi_signal
    name: "Led Cave 01 Wifi Signal Sensor"
    update_interval: 60s

  - platform: dallas
    index: 0
    name: "Led Cave 01 Temperature Sensor"

**Logs (if applicable):**
No significant info in the logs. The ESP32 just reboots, flickers the light once, then goes back to logging.
Increasing the log level to VERBOSE does not bring further information.

INFO Reading configuration /config/esphome/led_cave_01.yaml...
INFO Starting log output from led_cave_01/debug
INFO Connected to MQTT broker!
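
The log above is streamed over MQTT, so it cannot capture what happens at the moment of a reset. A minimal sketch of additions that might help expose the reboot cause, assuming the debug component and the uptime sensor platform are available in the ESPHome version in use. The sensor name is made up for illustration, and the entries would need to be merged into the existing logger: and sensor: sections rather than duplicated:

```yaml
# Sketch only: merge into the existing configuration rather than duplicating keys.
logger:
  level: VERBOSE   # already tried per the report; shown for completeness

# Prints debug information at boot; on ESP32 this includes the reset reason.
debug:

sensor:
  # An uptime value falling back to zero confirms a reboot rather than a network drop.
  - platform: uptime
    name: "Led Cave 01 Uptime"   # hypothetical name
```

Watching the serial output over USB while reproducing the problem may also show a crash trace that never reaches the MQTT log.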

**Additional information and things you've tried:**
The issue seems mitigated when controlling the entities directly, without the light group configuration in HA. The reboot still occurs, but only at random. Controlling through the light group triggers the reboot about 20% of the time.
I will try to pinpoint the issue further in the coming days.

Lefuneste83 commented 4 years ago

Upon further investigation and testing, I can now safely say that the reboots I am witnessing are related to the particular way HA sends MQTT messages to the device.

It occurs specifically when HA addresses 2 different entities exposed by the same device (QUINLED QUAD) using a group component. The behavior is the same with a light group and with a standard group. When this aggregation layer is used from within HA, the device receives 2 different MQTT messages at the same time for 2 entities and fails to handle the command:

- It reboots (thus disconnects from the MQTT broker, then reconnects immediately).
- I could not trace a dump or crash log.
- When addressing a single entity (of the 2 exposed by the device) at a time, this does not occur.
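
For reference, a light group of the kind described is declared in HA's configuration.yaml roughly as follows. This is a hypothetical sketch, not the configuration from this report: the group name and the entity IDs are assumptions derived from the light names in the ESPHome YAML above.

```yaml
# Home Assistant configuration.yaml (hypothetical example)
light:
  - platform: group
    name: Cave Lights            # assumed name
    entities:
      - light.led_cave_01_a      # assumed entity ID for "Led Cave 01 A"
      - light.led_cave_01_b      # assumed entity ID for "Led Cave 01 B"
```

A single brightness change on such a group would be expected to produce one command per member entity in quick succession, which is consistent with the two near-simultaneous MQTT messages described here.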

There could be some sort of timing issue in conjunction with the "group" mode of command emitted by HA. The "group" command seems to have many issues in HA, with numerous bugs related to it. It looks as if the device fails to address all 4 GPIOs declared in my FW for the LED rails at the same time.

Still, I think it needs investigating why the device reboots in such a case. This use case is not particularly exotic, and many users could face a similar situation.

It does not seem to be related to any particular version of ESPHome, as I have compiled the FW on all available versions and it behaves the same.

glmnet commented 4 years ago

It would be great if you could set up the device with api: just to confirm whether this issue is MQTT / HA specific.
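
A minimal sketch of that test, assuming the configuration posted above is otherwise left unchanged: comment out the mqtt: block and add an empty api: block (default options assumed), then add the device to HA through the ESPHome integration.

```yaml
# Disabled for the test
# mqtt:
#   broker: "192.168.X.X"
#   username: "mylogin"
#   password: "mysecret"
#   birth_message:
#   will_message:

# Native API instead of MQTT; default options assumed
api:
```

If the reboots persist when the same group is controlled from HA, MQTT is unlikely to be the cause.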

Rogi66 commented 4 years ago

Well, I had the same problems, and they had nothing to do with MQTT: it happened on all ESPHome nodes. What I did: saved all the YAML files of the ESPHome devices, deleted the ESPHome integrations, updated HA to version 0.112.0 and ESPHome to the current version 1.14.5, then added all the ESPHome nodes back, compiled/uploaded again, and added them to Lovelace. I used:

esphome:
  name: Curtain
  platform: ESP8266
  board: d1_mini
  arduino_version: 2.4.2

wifi:
  ssid: !secret esp_ssid
  password: !secret esp_ssidpass
  manual_ip:
    static_ip: 192.x.x.x
    gateway: 192.x.x.x
    subnet: 255.255.255.0
    dns1: 192.x.x.x
    dns2: 1.1.1.1

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "FB Curtain"
    password: !secret esp_fbssidpass

captive_portal:

# Enable logging
logger:

# Enable Home Assistant API

ota:
  password: !secret esp_otapass

web_server:
  port: 80

api:
  password: !secret esp_apipass

The result is: No more disconnections! ;)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.