Open lunde42 opened 1 year ago
I agree that there appears to be a race condition around the millis overflow, but am not able to answer your question about which mutex type is best/safe here.
I wonder if this is related to #5375 as well? Devices appear to sometimes crash right after "Incrementing scheduler major".
The problem
I am seeing random freezes on my ESP32 running ESPHome.
Symptoms: The problem occurs occasionally and leaves the device unresponsive to MQTT requests, although it still answers pings. After ~2 hours, the device resets itself, reboots and resumes operation. As visible in the log below, the scheduler issues the "Incrementing scheduler major" message, which should not happen, as the uptime is far below the roughly 50 days (about 49.7) it takes a 32-bit millisecond counter to overflow. I also found this issue, which may be related: https://github.com/esphome/issues/issues/2632
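For reference, the "50 days" figure follows from the width of the counter. A minimal sketch of the arithmetic, assuming `millis()` returns an unsigned 32-bit millisecond count (the constant and function names are made up for illustration):

```cpp
#include <cstdint>

// A 32-bit millisecond counter wraps after 2^32 ms.
constexpr uint64_t kWrapMs = uint64_t{1} << 32;          // 4294967296 ms
constexpr uint64_t kMsPerHour = 60ULL * 60ULL * 1000ULL; // 3600000 ms
constexpr uint64_t kMsPerDay = 24ULL * kMsPerHour;       // 86400000 ms

// Whole days until the counter wraps.
constexpr uint64_t wrap_days() { return kWrapMs / kMsPerDay; }

// Whole hours left over after those full days.
constexpr uint64_t wrap_remainder_hours() {
  return (kWrapMs % kMsPerDay) / kMsPerHour;
}
```

So a genuine overflow should only occur about once every 49 days and 17 hours of uptime.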
Possible cause: Looking at the scheduler, I see a possible race condition in Scheduler::millis_()
Proposed solution: The code in millis_() should be guarded by a mutex. I would suggest adding an extra lock to the class only for this purpose. As millis_() is never called with the existing lock_ held, there should be no risk of deadlock.
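To make the suspected race concrete, here is a minimal, portable sketch. It is not ESPHome's actual code: `RolloverTracker`, `update_unlocked`, `millis_locked` and the replay helpers are hypothetical names, and `std::mutex` stands in for whatever lock type turns out to be appropriate on the ESP32. The sketch assumes the rollover check compares a freshly sampled timestamp against the last stored one. It replays the problematic interleaving deterministically rather than with real threads.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the scheduler's rollover tracking.
struct RolloverTracker {
  uint32_t last_millis = 0;
  uint16_t major = 0;
  std::mutex mtx;  // assumption: extra lock used only for this purpose

  // Unguarded update: `now` was sampled earlier by the caller, so another
  // context may have stored a newer value in between.
  void update_unlocked(uint32_t now) {
    if (now < last_millis) major++;  // misreads a stale sample as overflow
    last_millis = now;
  }

  // Guarded variant: the clock is sampled and compared under the same lock,
  // so samples are processed in the order they were taken.
  template <typename Clock>
  uint32_t millis_locked(Clock clock) {
    std::lock_guard<std::mutex> guard(mtx);
    const uint32_t now = clock();
    if (now < last_millis) major++;
    last_millis = now;
    return now;
  }
};

// Replays the bad interleaving: A samples first but stores last.
uint16_t replay_race() {
  RolloverTracker t;
  const uint32_t now_a = 100;  // context A samples the clock, then is preempted
  const uint32_t now_b = 105;  // context B samples slightly later
  t.update_unlocked(now_b);    // B finishes first: last_millis = 105
  t.update_unlocked(now_a);    // A resumes: 100 < 105 looks like an overflow
  return t.major;
}

// Same two clock readings, but each is sampled inside the lock.
uint16_t replay_guarded() {
  RolloverTracker t;
  uint32_t fake_clock = 100;
  auto clock = [&fake_clock]() { return fake_clock; };
  t.millis_locked(clock);
  fake_clock = 105;
  t.millis_locked(clock);
  return t.major;
}
```

In the replay, the unguarded path increments `major` once with no real overflow, while the guarded path leaves it at zero. Whether a blocking lock like this is acceptable in every context that calls millis_() is a separate question that the sketch does not settle.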
Question: Is a standard mutex safe for use in this situation? The millis_() function may be running in interrupt context. Should I use the FreeRTOS xSemaphoreTakeFromISR() function instead? If yes, what happens if xSemaphoreTakeFromISR() is called in a non-ISR context? Is this allowed, or is this another case that needs to be handled?
Looking forward to your comments.
T.
Which version of ESPHome has the issue?
2023.4.0
What type of installation are you using?
Docker
Which version of Home Assistant has the issue?
No response
What platform are you using?
ESP32
Board
Lolin32-lite
Component causing the issue
scheduler
Example YAML snippet
Anything in the logs that might be useful for us?
Additional information
No response