esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
292 stars 36 forks source link

Sprinkler Controller, running a cycle leads to panic and automatic reboot #5740

Open ychieux opened 6 months ago

ychieux commented 6 months ago

The problem

ESP32 crashes and reboots when running a cycle. I ran the same YAML on an ESP8266 based board and get the same crash/reboot while running an irrigation cycle.

[18:46:56]Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
[18:46:56]
[18:46:56]Core  1 register dump:
[18:46:56]PC      : 0x400e2f44  PS      : 0x00060530  A0      : 0x80205d00  A1      : 0x3ffcd9e0  
[18:46:56]A2      : 0x3ffbc8fc  A3      : 0x00000000  A4      : 0x3ffbca98  A5      : 0x000035c0  
[18:46:56]A6      : 0x3ffcd9c4  A7      : 0x00000000  A8      : 0x800e2f36  A9      : 0x3ffcd9c0  
[18:46:56]A10     : 0x3ffbca68  A11     : 0x00000000  A12     : 0x00002710  A13     : 0x00060f23  
[18:46:56]A14     : 0x00060f20  A15     : 0x00000001  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c  
[18:46:56]EXCVADDR: 0x00000028  LBEG    : 0x40084605  LEND    : 0x4008460d  LCOUNT  : 0x00000027  
[18:46:56]
[18:46:56]
[18:46:56]Backtrace:0x400e2f41:0x3ffcd9e00x40205cfd:0x3ffcda00 0x40205d61:0x3ffcda20 0x400ea350:0x3ffcda40 0x400ec942:0x3ffcda70 0x400f9115:0x3ffcda90 
[18:46:56]
[18:46:56]
[18:46:56]
[18:46:56]
[18:46:56]ELF file SHA256: 0000000000000000
[18:46:56]
[18:46:56]Rebooting...
[18:46:56]ets Jun  8 2016 00:22:57
[18:46:56]
[18:46:56]rst:0xc (SW_CPU_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[18:46:56]configsip: 0, SPIWP:0xee
[18:46:56]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[18:46:56]mode:DIO, clock div:2
[18:46:56]load:0x3fff0030,len:1184
[18:46:56]load:0x40078000,len:13132
[18:46:56]load:0x40080400,len:3036
[18:46:56]entry 0x400805e4

Which version of ESPHome has the issue?

2024.4.1

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2024.4.4

What platform are you using?

ESP32

Board

LILYGO T-Relay ESP32 (4ch with ESP32-WROOM-B)

Component causing the issue

Sprinkler Controller

Example YAML snippet

sprinkler:
  - id: lawn_sprinkler_ctrlr
    main_switch: "Start Irrigation Cycle"
    auto_advance_switch: "Irrigation Auto Advance"
    reverse_switch: "Irrigation Reverse"
    multiplier_number: "Irrigation Multiplier"
    repeat_number: "Irrigation Repeat"
    valve_overlap: 5s
    valves:
      - valve_switch: "Flowers Front"
        enable_switch:
          id: enable_switch_valve2
          name: "Enable irrigation for Flowers Front"
          restore_mode: RESTORE_DEFAULT_ON
        run_duration: 10s
        valve_switch_id: irrigation_valve_sw2

      - valve_switch: "Trees"
        enable_switch:
          id: enable_switch_valve3
          name: "Enable irrigation for Trees"
          restore_mode: RESTORE_DEFAULT_ON
        run_duration: 10s
        valve_switch_id: irrigation_valve_sw3

      - valve_switch: "Flowers Back"
        enable_switch:
          id: enable_switch_valve4
          name: "Enable irrigation for Flowers Back"
          restore_mode: RESTORE_DEFAULT_ON
        run_duration: 10s
        valve_switch_id: irrigation_valve_sw4

      # Relay/Valve #1 for Lawn
      - valve_switch: "Lawn"
        enable_switch:
          id: enable_switch_valve1
          name: "Enable irrigation for Lawn"
          restore_mode: RESTORE_DEFAULT_OFF
        run_duration: 10s
        valve_switch_id: irrigation_valve_sw1

button:
  - platform: restart
    id: irrigation_controller_restart_button
    name: "Irrigation Controller Restart Button"

switch:
  - platform: gpio
    pin: 21
    # Garden Irrigation - Valve 1
    id: irrigation_valve_sw1

  - platform: gpio
    pin: 19
    #Garden Irrigation - Valve 2
    id: irrigation_valve_sw2

  - platform: gpio
    pin: 18
    #Garden Irrigation - Valve 3
    id: irrigation_valve_sw3

  - platform: gpio
    pin: 5
    #Garden Irrigation - Valve 4
    id: irrigation_valve_sw4

Anything in the logs that might be useful for us?

[18:46:41][D][switch:012]: 'Start Irrigation Cycle' Turning ON.
[18:46:41][VV][sprinkler:1488]: fsm_transition_ called; state is IDLE
[18:46:41][VV][scheduler:032]: set_timeout(name='Start Irrigation Cyclesm', timeout=5000)
[18:46:41][VV][sprinkler:1651]: Timer 0 started for 5 sec
[18:46:41][D][sprinkler:1427]: CYCLE is starting valve 0 for 10 seconds, cycle 1 of 1
[18:46:41][D][switch:012]: 'irrigation_valve_sw2' Turning ON.
[18:46:41][D][switch:055]: 'irrigation_valve_sw2': Sending state ON
[18:46:41][VV][sprinkler:1531]: fsm_transition_ complete; new state is ACTIVE
[18:46:41][D][switch:055]: 'Start Irrigation Cycle': Sending state ON
[18:46:41][V][json:038]: Attempting to allocate 512 bytes for JSON serialization
[18:46:41][V][json:058]: Size after shrink 80 bytes
[18:46:41][VV][api.service:156]: send_switch_state_response: SwitchStateResponse {
  key: 2268764749
  state: YES
}
[18:46:41][W][component:237]: Component web_server took a long time for an operation (80 ms).
[18:46:41][W][component:238]: Components should block for at most 30 ms.
[18:46:41][D][switch:055]: 'Flowers Front': Sending state ON
[18:46:41][V][json:038]: Attempting to allocate 512 bytes for JSON serialization
[18:46:41][V][json:058]: Size after shrink 72 bytes
[18:46:41][VV][api.service:156]: send_switch_state_response: SwitchStateResponse {
  key: 3963119055
  state: YES
}
[18:46:42][VV][scheduler:226]: Running interval '' with interval=10000 last_execution=16357 (now=26358)
[18:46:46][VV][scheduler:226]: Running timeout 'Start Irrigation Cyclesm' with interval=5000 last_execution=25734 (now=30740)
[18:46:46][VV][sprinkler:1682]: State machine timer expired
[18:46:46][VV][sprinkler:1488]: fsm_transition_ called; state is ACTIVE
[18:46:46][D][sprinkler:1296]: Marking valve 0 complete
[18:46:46][VV][scheduler:032]: set_timeout(name='Start Irrigation Cyclesm', timeout=5000)
[18:46:46][VV][sprinkler:1651]: Timer 0 started for 5 sec
[18:46:46][D][sprinkler:1427]: CYCLE is starting valve 1 for 10 seconds, cycle 1 of 1
[18:46:46][D][switch:012]: 'irrigation_valve_sw3' Turning ON.
[18:46:46][D][switch:055]: 'irrigation_valve_sw3': Sending state ON
[18:46:46][VV][sprinkler:1531]: fsm_transition_ complete; new state is ACTIVE
[18:46:46][W][component:237]: Component sprinkler took a long time for an operation (63 ms).
[18:46:46][W][component:238]: Components should block for at most 30 ms.
[18:46:46][D][switch:055]: 'Trees': Sending state ON
[18:46:46][V][json:038]: Attempting to allocate 512 bytes for JSON serialization
[18:46:46][E][json:043]: Could not allocate memory for JSON document! Requested 512 bytes, largest free heap block: 1396 bytes
[18:46:46][VV][api.service:156]: send_switch_state_response: SwitchStateResponse {
  key: 3882419280
  state: YES
}
[18:46:48][VV][scheduler:226]: Running interval '' with interval=60000 last_execution=4294939677 (now=32382)
[18:46:51][VV][scheduler:226]: Running timeout 'Start Irrigation Cyclesm' with interval=5000 last_execution=30763 (now=35765)
[18:46:51][VV][sprinkler:1682]: State machine timer expired
[18:46:51][VV][sprinkler:1488]: fsm_transition_ called; state is ACTIVE
[18:46:51][D][sprinkler:1296]: Marking valve 1 complete
[18:46:51][VV][scheduler:032]: set_timeout(name='Start Irrigation Cyclesm', timeout=5000)
[18:46:51][VV][sprinkler:1651]: Timer 0 started for 5 sec
[18:46:51][VV][sprinkler:1531]: fsm_transition_ complete; new state is ACTIVE
[18:46:51][D][switch:016]: 'irrigation_valve_sw2' Turning OFF.
[18:46:51][D][switch:055]: 'irrigation_valve_sw2': Sending state OFF
[18:46:51][D][switch:055]: 'Flowers Front': Sending state OFF
[18:46:51][V][json:038]: Attempting to allocate 512 bytes for JSON serialization
[18:46:51][V][json:058]: Size after shrink 72 bytes
[18:46:51][VV][api.service:156]: send_switch_state_response: SwitchStateResponse {
  key: 3963119055
  state: NO
}
[18:46:52][VV][scheduler:226]: Running interval '' with interval=10000 last_execution=26357 (now=36358)
[18:46:56][VV][scheduler:226]: Running timeout 'Start Irrigation Cyclesm' with interval=5000 last_execution=35784 (now=40784)
[18:46:56][VV][sprinkler:1682]: State machine timer expired
[18:46:56][VV][sprinkler:1488]: fsm_transition_ called; state is ACTIVE
[18:46:56][D][sprinkler:1296]: Marking valve 2 complete
[18:46:56][VV][scheduler:032]: set_timeout(name='Start Irrigation Cyclesm', timeout=5000)
[18:46:56][VV][sprinkler:1651]: Timer 0 started for 5 sec
[18:46:56][D][sprinkler:1427]: CYCLE is starting valve 3 for 10 seconds, cycle 1 of 1
[18:46:56][D][switch:012]: 'irrigation_valve_sw1' Turning ON.
[18:46:56][D][switch:055]: 'irrigation_valve_sw1': Sending state ON
[18:46:56][VV][sprinkler:1531]: fsm_transition_ complete; new state is ACTIVE
[18:46:56][W][component:237]: Component sprinkler took a long time for an operation (62 ms).
[18:46:56][W][component:238]: Components should block for at most 30 ms.
[18:46:56][D][switch:055]: 'Lawn': Sending state ON
[18:46:56][V][json:038]: Attempting to allocate 512 bytes for JSON serialization
[18:46:56][V][json:058]: Size after shrink 60 bytes
[18:46:56][VV][api.service:156]: send_switch_state_response: SwitchStateResponse {
  key: 557079219
  state: YES
}
[18:46:56][D][switch:016]: 'irrigation_valve_sw3' Turning OFF.
[18:46:56][D][switch:055]: 'irrigation_valve_sw3': Sending state OFF
[18:46:56]Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
[18:46:56]
[18:46:56]Core  1 register dump:
[18:46:56]PC      : 0x400e2f44  PS      : 0x00060530  A0      : 0x80205d00  A1      : 0x3ffcd9e0  
[18:46:56]A2      : 0x3ffbc8fc  A3      : 0x00000000  A4      : 0x3ffbca98  A5      : 0x000035c0  
[18:46:56]A6      : 0x3ffcd9c4  A7      : 0x00000000  A8      : 0x800e2f36  A9      : 0x3ffcd9c0  
[18:46:56]A10     : 0x3ffbca68  A11     : 0x00000000  A12     : 0x00002710  A13     : 0x00060f23  
[18:46:56]A14     : 0x00060f20  A15     : 0x00000001  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c  
[18:46:56]EXCVADDR: 0x00000028  LBEG    : 0x40084605  LEND    : 0x4008460d  LCOUNT  : 0x00000027  
[18:46:56]
[18:46:56]
[18:46:56]Backtrace:0x400e2f41:0x3ffcd9e00x40205cfd:0x3ffcda00 0x40205d61:0x3ffcda20 0x400ea350:0x3ffcda40 0x400ec942:0x3ffcda70 0x400f9115:0x3ffcda90 
[18:46:56]
[18:46:56]
[18:46:56]
[18:46:56]
[18:46:56]ELF file SHA256: 0000000000000000
[18:46:56]
[18:46:56]Rebooting...
[18:46:56]ets Jun  8 2016 00:22:57
[18:46:56]
[18:46:56]rst:0xc (SW_CPU_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
[18:46:56]configsip: 0, SPIWP:0xee
[18:46:56]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[18:46:56]mode:DIO, clock div:2
[18:46:56]load:0x3fff0030,len:1184
[18:46:56]load:0x40078000,len:13132
[18:46:56]load:0x40080400,len:3036
[18:46:56]entry 0x400805e4

Additional information

No response

ychieux commented 6 months ago

I believe, I found the root cause... the valves' run duration was set to 10sec for testing purposes while the overlap was set to 5s i.e. exactly half the valve run duration. This led to a situation where timers failed when transitioning from valve no 2 to 3.

hetii commented 3 months ago

I'm not sure If I had the same root cause, but my board crash when I do monkey test and switch manually my zones fast on web server ui or by pressing the next valve button.

Sometimes controller do not switch off previous zone and enable the next one, then attempting to switch to the next zone, crash the system:

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x400e0d90  PS      : 0x00060f30  A0      : 0x8018c404  A1      : 0x3ffb2770  
A2      : 0x3ffb46e4  A3      : 0x00000000  A4      : 0x3ffb4880  A5      : 0x0000a540  
A6      : 0x3ffb2754  A7      : 0x0000000a  A8      : 0x800e0d82  A9      : 0x3ffb2750  
A10     : 0x3ffb4850  A11     : 0x00000000  A12     : 0x000493e0  A13     : 0x00060323  
A14     : 0x00060320  A15     : 0x00000001  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000028  LBEG    : 0x400849ad  LEND    : 0x400849b5  LCOUNT  : 0x00000027  

Backtrace:0x400e0d8d:0x3ffb27700x4018c401:0x3ffb2790 0x4018c49d:0x3ffb27b0 0x400ebc28:0x3ffb27d0 0x400ef0d6:0x3ffb2800 0x401038e5:0x3ffb2820 

My initial sprinkler configuration is like:

sprinkler:
  - id: $devicename
    valve_open_delay: 1s
    next_prev_ignore_disabled: true
    pump_switch_off_during_valve_open_delay: false
    main_switch:
      id: main_switch
      name: "Start/Stop/Resume"
      icon: "mdi:play-pause"
      web_server_sorting_weight: 1
    standby_switch:
      id: standby_switch
      name: "Standby"
      icon: "mdi:play-pause"
      web_server_sorting_weight: 1
    auto_advance_switch:
      name: "Auto Advance"
      icon: "mdi:brightness-auto"
      web_server_sorting_weight: 2
    repeat_number:
      name: "Repeat Cycles"
      mode: box
      icon: "mdi:repeat"
      web_server_sorting_weight: 3
      initial_value: 0
      min_value: 0
      max_value: 10
      step: 1
      restore_value: true
      disabled_by_default: false
      entity_category: config
    multiplier_number:
      name: "Multiplier number"
      mode: box
      icon: "mdi:multiplication"
      web_server_sorting_weight: 5
      initial_value: 1
      min_value: 0.1
      max_value: 10
      step: 0.1
      unit_of_measurement: $uom
      restore_value: true
      disabled_by_default: false
      entity_category: config

    valves:
....

I don't use valve_overlap settings.

Moreover when I set valve_open_delay to 4s and play with switches it's even faster when It's crash. Also when I notice in web ui that two zones are on and switch one of them off and press next button then esp not crash but when run_duration time for that zone elapsed then it crash.

For me it looks like controller keeps some old reference about previous zone and try to switch it off when is already off and then crash on that.

So definitely something wrong with valve_open_delay and switches state.

BTW: I also notice that controller have issues when activate more then one zone at a time eg. Activate zone 1 switch on web ui, then before it change his state to on, activate zone 2, wait a bit and controller will activate both zone at the same time and this should not be allowed ;(

Edit: After discord support I put below more details of my crash for other viewers that can find this issue: Dump decoded by: esp-stacktrace-decoder obraz

Crash in area of sprinkler.cpp#L411-L413, probably due sprinkler.cpp#L355-L360

tlanglois1111 commented 1 month ago

I still see the problem in ESPHome 8. I run sprinkler zones separately based on moisture needs so auto advance is turned off. I get the same panic issue. It seems to happen when transitioning from one zone to another but I can't pinpoint the exact scenario to recreate it, because it doesn't always occur.