Closed rojer closed 8 months ago
indeed, looks like it's been fixed on the 2.1.3 branch. but then it needs to be backported to 2.1.2, IDF 4.4 is still broken as of today.
Thanks for the fix!
I just ran into what is almost certainly this bug on IDF 5.0.1 by total random chance. I tracked it here after a day of work trying to figure out what was up.
By mistake, our access point was powered up and the ESP32 could associate with it, but the Ethernet cable from the AP to the router was not plugged in (also by mistake). We had two netifs (wifi and ethernet) on our device. So the ESP was never able to reach a DHCP server, which triggered this bug and caused an out-of-memory crash within 15-20 seconds of boot. Nasty.
Separately: not 100% sure it's the same, but I was also recently getting some really odd out-of-memory crashes in the same areas (with DHCP and sys_ timers allocations) when a device was on the edge of wifi range.
I would just +1 this and ask: is there an ETA for when a stable IDF version might pick this up? It seems like both 4.4 and 5.0.1 are broken. If not, do you think it's reasonably safe to cherry-pick those two commits into a fork and use that in production?
Thanks!
@david-cermak The IDF v4.4 and v4.3 branches also need the fix. BTW, I can't find d5e56d0 in the esp-lwip 2.1.2-esp branch.
This bug is fixed in commit https://github.com/espressif/esp-lwip/commit/8dad8d3ee66840deee4acfc1601de4e396c594be in esp-lwip 2.1.3 and in commit https://github.com/espressif/esp-lwip/commit/8290c3b8f2adaf82aa45ec992b87f16205f2689b in esp-lwip 2.1.2. This issue will be closed, and if there is any other issue, a new issue can be opened or this one can be reopened.
I'm pretty sure this is supposed to be outside the `NETIF_FOREACH` loop. It is even indented as such, yet it is inside the loop, and may, depending on when `tmr_restart` gets set, cause every new iteration to schedule multiple additional timers. This blows up pretty spectacularly if a DHCP server is slow or down. I am looking at a core dump of a device that ran out of memory with 3000+ `dhcp_fine_timeout_cb` timers :)

cc @freakyxue
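
To make the failure mode described above concrete, here is a toy model of it. This is only a sketch and not the actual esp-lwip code: `toy_netif`, `toy_sys_timeout`, `fine_tmr_buggy`, `fine_tmr_fixed` and `simulate` are all hypothetical names made up for illustration, and the real timer handling is more involved. It only shows why re-arming a one-shot timer inside the per-netif loop can multiply the number of pending timers, while re-arming once after the loop keeps it constant.

```c
#include <stddef.h>

/* Hypothetical stand-in for a network interface; NOT the real lwIP struct. */
struct toy_netif {
    int dhcp_active;            /* 1 while this interface still needs DHCP timing */
    struct toy_netif *next;
};

/* Number of one-shot timers currently queued in this toy model. */
static unsigned pending_timers;

/* Hypothetical stand-in for scheduling one more one-shot timer. */
static void toy_sys_timeout(void) { pending_timers++; }

/* Buggy shape: the re-arm sits inside the loop, so once tmr_restart is set,
 * every remaining iteration schedules another timer. */
static void fine_tmr_buggy(struct toy_netif *list) {
    int tmr_restart = 0;
    for (struct toy_netif *n = list; n != NULL; n = n->next) {
        if (n->dhcp_active) {
            tmr_restart = 1;
        }
        if (tmr_restart) {
            toy_sys_timeout();
        }
    }
}

/* Fixed shape: the loop only records whether a restart is needed,
 * and the timer is re-armed at most once, after the loop. */
static void fine_tmr_fixed(struct toy_netif *list) {
    int tmr_restart = 0;
    for (struct toy_netif *n = list; n != NULL; n = n->next) {
        if (n->dhcp_active) {
            tmr_restart = 1;
        }
    }
    if (tmr_restart) {
        toy_sys_timeout();
    }
}

/* Start with one pending timer. On each tick, every pending timer fires
 * and runs the callback once. Returns the timers pending after `ticks`. */
static unsigned simulate(void (*cb)(struct toy_netif *),
                         struct toy_netif *list, unsigned ticks) {
    pending_timers = 1;
    for (unsigned t = 0; t < ticks; t++) {
        unsigned firing = pending_timers;
        pending_timers = 0;
        for (unsigned i = 0; i < firing; i++) {
            cb(list);
        }
    }
    return pending_timers;
}
```

In this model, with two interfaces that both still have DHCP active, the buggy variant doubles the pending timers on every tick (4096 after 12 ticks), while the fixed variant stays at one. If only the last interface in the list is active, the buggy variant also stays at one, which matches the "depending on when `tmr_restart` gets set" part of the report.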