esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

Decrease the time required for resetting boot loop counter #5196

Open MnM001 opened 9 months ago

MnM001 commented 9 months ago

The problem

Hi,

I have a few lights working fine with ESPHome. These lights are turned OFF by switches around the house (so they are not always on).

Because of this, the OTA safe mode check (“Last Boot was an unhandled reset, will proceed to safe mode in 8 restarts”) kicks in every now and then, and the lights boot into safe mode.

The issue I have is that some of the lights, when turned on, stay on only for a short period of time. Currently it takes about 4 minutes before “Boot seems successful, resetting boot loop counter.” is logged. Some of my lights are only on for 2 or 3 minutes before they are turned off, and some for even shorter periods (only 30 seconds or so).

I am trying to sort this out because sometimes visitors to the house press the button and no light turns on at all (because of the above: the light has booted into safe mode), but I'm not sure what I need to adjust or how.

Ideally, I would like to be able to lower the time required before “Boot seems successful, resetting boot loop counter.” is logged. If someone knows how to do that, please let me know. That way I can set, for each light, the amount of time required for a successful boot so the boot loop counter gets reset.

I would rather not go down the route of increasing the number of restarts required before entering safe mode, as eventually I would end up with the same problem.

Thanks

Which version of ESPHome has the issue?

2023.11.6

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2023.11.3

What platform are you using?

ESP32

Board

any board

Component causing the issue

I believe it's OTA?

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

scottcopus commented 8 months ago

I believe I'm having the same issue with my switched smart light bulbs in my bathrooms. I'm guessing they're sometimes switched on for less than a minute, and that this happens often enough to push them past the current boot loop counter limit. What's a good way to tell if this is actually happening? Are longer-term logs kept for ESPHome devices within HA?

ssieb commented 8 months ago

You could disable safe mode.
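For the 2023.x releases reported in this issue, safe mode was an option of the ota: component, so disabling it would look roughly like the fragment below. This is a minimal sketch assuming the pre-2024 ota: syntax; later releases move safe mode into its own safe_mode: component (as the end of this thread shows), so check that component's documentation for the equivalent setting.

```yaml
# Sketch for ESPHome 2023.x, where safe mode is configured on ota:.
ota:
  password: !secret ota_password
  # Turn safe mode off entirely. A device that really is boot looping
  # will then never fall back to safe mode for an OTA rescue.
  safe_mode: false
```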

scottcopus commented 8 months ago

@ssieb Yes, I plan to disable safe mode... eventually. After searching various component docs, that seems to be the only way to accomplish this. Unfortunately I can't do that at the moment because it currently fails to recompile... likely because my ESPHome version is a bit out of date and the vendor's external repo package likely depends on the newer version. FYI, I recently upgraded my server's OS and I want to give that a few days of uptime before changing anything else... like updating HA/ESPHome/etc.

But back to the OP's (@MnM001) suggestion: there must be some elapsed time before ESPHome marks a boot as successful. Could that uptime be made configurable so it can be overridden (reduced) in cases like these? It would probably need to come with an advisory, though. ;)

ssieb commented 8 months ago

This really appears to be a mistake of some sort. The reboot_timeout value is also used as the delay before the boot loop counter is reset. You could make that short, but then you're basically disabling safe mode as well. A workaround would be to call clean_rtc() on the ota component from a lambda. Calling it in on_boot after a delay might work.
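The trade-off described above can be seen in a 2023.x ota: fragment. This is a sketch assuming the pre-2024 option names safe_mode, num_attempts and reboot_timeout: shortening reboot_timeout makes the counter reset sooner, but it also shortens how long a device that did enter safe mode waits for an OTA upload before rebooting.

```yaml
# Sketch for ESPHome 2023.x: reboot_timeout does double duty here.
ota:
  password: !secret ota_password
  safe_mode: true
  num_attempts: 10      # unhandled resets tolerated before entering safe mode
  # Used both as (1) the uptime after which the boot loop counter is reset
  # and (2) how long the device stays in safe mode before rebooting.
  reboot_timeout: 1min
```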

MnM001 commented 8 months ago

@ssieb - could you please give an example of how you would use clean_rtc() in a lambda with a delay in on_boot? I have no idea how to do that.

ssieb commented 8 months ago

Use at your own risk:

on_boot:
  - delay: 30s
  - lambda: id(my_ota).clean_rtc();

MnM001 commented 8 months ago

@ssieb - thanks. I tried adding the suggested snippet to my on_boot:

  on_boot:
    priority: -100.0
    then:
      - light.control:
          id: ${device_id}
          state: !lambda return id(last_light_state);
      - delay: 30s
      - lambda: id(my_ota).clean_rtc();

However, I get this error when compiling:

Couldn't find ID 'my_ota'. Please check you have defined an ID with that name in your configuration.

My globals: section only has:

globals:
  - id: last_light_state
    type: boolean
    restore_value: yes
    initial_value: "false"

ssieb commented 8 months ago

I was assuming you understood more about how esphome works. You need to put an id: on your ota: component and use that there.
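Put together, the 2023.x configuration looks roughly like this. It is a sketch with a hypothetical device name: on_boot: goes under esphome:, and the id given to ota: is the one the lambda refers to. The 30s delay is the value used in this thread, not a recommendation.

```yaml
esphome:
  name: test-light              # hypothetical device name
  on_boot:
    priority: -100.0            # run late in the boot sequence
    then:
      - delay: 30s
      # Clear the boot loop counter once the device has been up for 30s.
      - lambda: id(my_ota).clean_rtc();

ota:
  id: my_ota                    # referenced by id(my_ota) above
  password: !secret ota_password
  safe_mode: true
```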

MnM001 commented 8 months ago

@ssieb I am still learning :)

I added the id: to the ota component and it now passes validation. I'll upload it to a test light and see.

Thanks for your help 👍

MnM001 commented 8 months ago

@ssieb - it didn't work: it still took 5 minutes to reset the boot loop counter:

[11:43:50][I][app:102]: ESPHome version 2023.12.5 compiled on Jan  4 2024, 11:43:07
[11:48:45][I][ota:117]: Boot seems successful, resetting boot loop counter.

The config I used is:

ota:
  id: my_ota
  safe_mode: true
  password: !secret ota_password

esphome:
  on_boot:
    priority: -100.0
    then:
      - light.control:
          id: ${device_id}
          state: !lambda return id(last_light_state);
      - delay: 30s
      - lambda: id(my_ota).clean_rtc();

Edit: or maybe I am misunderstanding what is happening. Does clean_rtc() reset the boot loop counter to zero every time the light has been on for 30 seconds, so the counter never gets to 8 anymore and never causes the safe mode reboot?

ssieb commented 8 months ago

Yes, your edit is correct.

MnM001 commented 8 months ago

@ssieb - all working great.

I tested it: the test light never rebooted into safe mode when I turned it off after 30 seconds (I repeated the on/off-after-30-seconds test 13 times to be sure).

If I turned the test light off before 30 seconds, it did reboot into safe mode on the 9th attempt.

Thank you for helping with this - I will deploy the fix to my house lights now.

wimpie007 commented 7 months ago

@MnM001 I have the same problem as you. Can I just use your code above?

MnM001 commented 7 months ago

@wimpie007 - sure you can try it :) Let me know if it works for you. No issue with my lights so far.

Unfocused commented 7 months ago

Just to be clear....

This is exactly what the reboot_timeout option for the OTA component is for. Is the issue here, therefore, that reboot_timeout also doubles as the timeout for how long to stay in safe mode?

(It does seem like those two things should be configured by two separate config options, and it would be an easy change)

scottcopus commented 7 months ago

@Unfocused I agree that reboot_timeout (or really anything) should probably be split into two or more config values if its value is being used for two (or more) different purposes... timing purposes in this case. It's probably better to expose it as separate configs?

wimpie007 commented 7 months ago

@MnM001 works for me also!👍

ssieb commented 7 months ago

Yes, that's the issue, and it would be an easy change if someone wants to do it.
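For reference, the current ESPHome documentation for the standalone safe_mode: component lists a separate boot_is_good_after option alongside reboot_timeout, which is the split discussed above. Treat the fragment below as a sketch: it assumes a release that has that option, so confirm the option names in the safe_mode documentation for the version you run before relying on them.

```yaml
# Sketch based on the current safe_mode: docs; verify for your release.
safe_mode:
  boot_is_good_after: 30s   # uptime after which the boot is considered good
  num_attempts: 10          # unhandled resets tolerated before safe mode
  reboot_timeout: 5min      # time spent in safe mode before rebooting
```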

MnM001 commented 7 months ago

@ssieb Question - is it possible to display the value of the OTA component?

I have some lights with a very short delay (5s), so by the time ESPHome connects, the log for the normal boot is long gone and I can't see whether anything was written. Is there a way to add it so it is displayed later on in the logs?

Thanks

ssieb commented 7 months ago

What value do you mean?

MnM001 commented 7 months ago

I assume that when we call clean_rtc() we are writing a value somewhere. I was thinking it would help to be able to display this value later on in the logs.

ssieb commented 7 months ago

clean_rtc() writes a 0. There's no way to get the current value. Any further questions need to go to Discord.

wimpie007 commented 3 weeks ago

I tried to update the light firmware to 2024.7.3, but it fails to compile on lambda: id(my_ota).clean_rtc();

Did something change?

Error message: config/esphome/ledvance06.yaml:11:15: error: 'class esphome::ESPHomeOTAComponent' has no member named 'clean_rtc'

ssieb commented 3 weeks ago

It's on the safe_mode component now. https://esphome.io/components/safe_mode Give it an id: and you can do the same thing.

wimpie007 commented 3 weeks ago

Hmm, safe_mode doesn't support on_boot... I must be misunderstanding what I should do :(

safe_mode:
  id: my_safe_mode
  on_boot:
    priority: -100.0
    then:
      - delay: 30s
      - lambda: id(my_ota).clean_rtc();

ssieb commented 3 weeks ago

You still use the same on_boot:, but the lambda becomes id(my_safe_mode).clean_rtc();
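For 2024.7.x, the same workaround moves to the safe_mode: component. This is a sketch assuming the list-style ota: syntax with platform: esphome used by the 2024 releases, and a hypothetical device name; on_boot: still belongs under esphome:, not under safe_mode:.

```yaml
esphome:
  name: test-light              # hypothetical device name
  on_boot:
    priority: -100.0
    then:
      - delay: 30s
      # clean_rtc() is now called on the safe_mode component.
      - lambda: id(my_safe_mode).clean_rtc();

safe_mode:
  id: my_safe_mode              # referenced by id(my_safe_mode) above

ota:
  - platform: esphome
    password: !secret ota_password
```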

wimpie007 commented 2 weeks ago

Got it! Thanks, it works now!