esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 34 forks source link

Crashes when mqtt messages are received fast #3341

Open dave opened 2 years ago

dave commented 2 years ago

The problem

I have an ESP32 (M5 Stack Atom Lite). The status LED is toggled by MQTT messages. When I send MQTT messages slowly, the system reacts as you would expect - the light toggles. However when many messages are sent quickly, the ESP32 crashes and reboots.

I'm not using Home Assistant - just esphome and an MQTT server. This is my very first iot project, so I have no idea how to start debugging this. I've tried:

Any pointers?

My esphome config file is attached.

Which version of ESPHome has the issue?

Version: 2022.5.1

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

n/a

What platform are you using?

ESP32

Board

M5Stack Atom Lite

Component causing the issue

Not sure

Example YAML snippet

esphome:
  name: atom1

esp32:
  board: m5stack-atom
  framework:
    type: arduino

light:
  - name: Status Light
    platform: fastled_clockless
    chipset: SK6812
    pin: 27
    num_leds: 1
    rgb_order: GRB
    id: status_led

mqtt:
  broker: pi1.lan
  on_message:
    topic: message_topic
    then:
      - light.toggle:
          id: status_led
          transition_length: 0s

logger:

ota:
  password: ""

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:

Anything in the logs that might be useful for us?

No response

Additional information

No response

dave commented 2 years ago

I connected with the screen serial tool, and this was the error:

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400e1c31  PS      : 0x00060230  A0      : 0x800e1c7d  A1      : 0x3ffd2de0
A2      : 0x00000001  A3      : 0x3ffb92a8  A4      : 0x3ffd2ec4  A5      : 0x00000000
A6      : 0x00000000  A7      : 0x3ffd37c8  A8      : 0x00000001  A9      : 0x3ffd37cc
A10     : 0x00000000  A11     : 0x3ffd2ec4  A12     : 0x00000001  A13     : 0x0000ce80
A14     : 0x7300646e  A15     : 0x7300646e  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x4000c349  LEND    : 0x4000c36b  LCOUNT  : 0xffffffff

ELF file SHA256: 0000000000000000

Backtrace: 0x400e1c31:0x3ffd2de0 0x400e1c7a:0x3ffd2e10 0x400e1dd6:0x3ffd2e30 0x400e15ae:0x3ffd2e60 0x400db729:0x3ffd2ea0 0x400d6eed:0x3ffd2f00 0x400d6fa0:0x3ffd2f20 0x400d609a:0x3ffd2f40 0x400e3ddd:0x3ffd2fa0 0x400e2426:0x3ffd3110 0x400d949e:0x3ffd3170 0x400d955a:0x3ffd31e0 0x400d8d36:0x3ffd3220 0x400d8da5:0x3ffd3240 0x400d868a:0x3ffd3290 0x400ecb3d:0x3ffd32d0 0x4016bbb9:0x3ffd3340 0x400ee1de:0x3ffd3370 0x400ec73d:0x3ffd33c0 0x400ec7ce:0x3ffd3430 0x40163565:0x3ffd3450 0x401635e1:0x3ffd3490 0x40163c8e:0x3ffd34b0 0x40089b92:0x3ffd34e0

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
dave commented 2 years ago

I ran it a few more times, and sometimes get this error instead:

E (157679) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (157679) task_wdt:  - async_tcp (CPU 1)
E (157679) task_wdt: Tasks currently running:
E (157679) task_wdt: CPU 0: IDLE0
E (157679) task_wdt: CPU 1: IDLE1
E (157679) task_wdt: Aborting.
abort() was called at PC 0x40155b54 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40088904:0x3ffbf830 0x40088b81:0x3ffbf850 0x40155b54:0x3ffbf870 0x40087019:0x3ffbf890 0x4016ce3f:0x3ffbc160 0x4015745f:0x3ffbc180 0x4008b341:0x3ffbc1a0 0x40089b92:0x3ffbc1c0

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
WhiteSockedDancer commented 2 years ago

Ive got the same Problem. My ESP32 is restarting if it receives 2 messages at once.. I've got this error on all my ESP32s which control relays or leds.

WhiteSockedDancer commented 2 years ago

i think its the same issue than these:

https://github.com/esphome/issues/issues/3078 https://github.com/esphome/issues/issues/2844

Zebble commented 1 year ago

Having a similar issue here as well with a ttgo-ethernet-poe v1.0 board. Reboots when MQTT is used. Not particularly fast, just seems to reboot while in use.

dave commented 1 year ago

I think I've worked out the problem! I was sending an integer from 0-10000 as the value of the MQTT message, and dividing by 10000 to get the floating point value in an esphome lambda, like this:

brightness: !lambda "return atoi(x.c_str()) / 10000.0;"

I've changed it to use send a value 0-1024, and it works fine with no reboots:

brightness: !lambda "return atoi(x.c_str()) / 1024.0;"

Here's my full working config:

substitutions:
  device_name: dimmer1
  friendly_name: Dimmer 1

esphome:
  name: ${device_name}

esp32:
  board: m5stack-atom
  framework:
    type: arduino

output:
  - platform: ledc
    pin: GPIO22
    id: gpio_22
    frequency: 10kHz

light:
  - platform: monochromatic
    output: gpio_22
    name: ${friendly_name} Light
    id: dimmer

binary_sensor:
  - platform: gpio
    pin:
      number: 39
      inverted: true
    name: ${friendly_name} Button
    on_press:
      then:
        - light.toggle:
            id: dimmer
            transition_length: 0s
        - light.control:
            id: dimmer
            brightness: 1.0

mqtt:
  broker: pi1.lan
  on_message:
    topic: dimmer1/value
    then:
      - light.turn_on:
          id: dimmer
          brightness: !lambda "return atoi(x.c_str()) / 1024.0;"
          transition_length: 0s

logger:
  level: NONE

ota:
  password: ""

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "${friendly_name} Fallback Hotspot"
    password: !secret wifi_password

captive_portal:
dave commented 1 year ago

I'll leave this issue open because there's definitely a bug somewhere, but using 0-1024 as the value works fine for me.

WhiteSockedDancer commented 1 year ago

I'll leave this issue open because there's definitely a bug somewhere, but using 0-1024 as the value works fine for me.

Could there be some sort of buffer/memory overflow, because smaller messages work fine for you?

I went to put delays between my messages and tried to minimize commands at the same time.. but now everything is not that responding..

I would love to help with that problem, but i dont know how.