espressif / esp-lwip

Fork of lwIP (https://savannah.nongnu.org/projects/lwip/) with ESP-IDF specific patches

Bug with on-demand fast DHCP timer with multiple interfaces (IDFGH-9738) #53

Closed rojer closed 8 months ago

rojer commented 1 year ago

I'm pretty sure this is supposed to be outside the NETIF_FOREACH loop. It is even indented as such, yet it is inside the loop, and may, depending on when tmr_restart gets set, cause every new iteration to schedule multiple additional timers. This blows up pretty spectacularly if a DHCP server is slow or down. I am looking at a core dump of a device that ran out of memory with 3000+ dhcp_fine_timeout_cb timers :)
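To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the esp-lwip source: `mock_netif`, `mock_schedule_timer`, `fine_tmr_buggy`, `fine_tmr_fixed` and `pending_after_ticks` are hypothetical names, and the growth model assumes that every timer that fires runs the loop again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interface; NOT lwIP's struct netif. */
struct mock_netif {
    bool dhcp_pending;  /* true while this interface still waits for a lease */
};

static int timers_scheduled;  /* number of timers armed by the last run */

/* Hypothetical stand-in for arming a one-shot timer. */
static void mock_schedule_timer(void) { timers_scheduled++; }

/* Buggy shape: the restart check sits inside the per-interface loop,
 * so one run can arm a timer on every iteration after the flag is set. */
int fine_tmr_buggy(const struct mock_netif *netifs, size_t n)
{
    bool tmr_restart = false;
    assert(netifs != NULL || n == 0);
    timers_scheduled = 0;
    for (size_t i = 0; i < n; i++) {
        if (netifs[i].dhcp_pending) {
            tmr_restart = true;
        }
        if (tmr_restart) {          /* inside the loop: runs up to n times */
            mock_schedule_timer();
        }
    }
    return timers_scheduled;
}

/* Fixed shape: the restart check runs once, after the loop. */
int fine_tmr_fixed(const struct mock_netif *netifs, size_t n)
{
    bool tmr_restart = false;
    assert(netifs != NULL || n == 0);
    timers_scheduled = 0;
    for (size_t i = 0; i < n; i++) {
        if (netifs[i].dhcp_pending) {
            tmr_restart = true;
        }
    }
    if (tmr_restart) {              /* outside the loop: at most one timer */
        mock_schedule_timer();
    }
    return timers_scheduled;
}

/* Assumed growth model: each pending timer fires once per tick and
 * re-arms `per_run` timers. Starts from a single pending timer. */
unsigned long pending_after_ticks(unsigned long per_run, unsigned ticks)
{
    unsigned long pending = 1;
    for (unsigned t = 0; t < ticks; t++) {
        pending *= per_run;
    }
    return pending;
}
```

With two interfaces that are both still waiting for a lease, `fine_tmr_buggy` arms two timers per run while `fine_tmr_fixed` arms one. Under the assumed model, two timers per run doubles the pending count on every tick, so `pending_after_ticks(2, 12)` is already 4096, whereas one timer per run stays at 1.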

cc @freakyxue

david-cermak commented 1 year ago

The bug has already been fixed in https://github.com/espressif/esp-lwip/commit/86df9f44bbcb7aae03389bb36981105278323b09 and https://github.com/espressif/esp-lwip/commit/d5e56d06658ae11292be1baea56204f7120b6fa7

rojer commented 1 year ago

Indeed, it looks like it's been fixed on the 2.1.3 branch, but it still needs to be backported to 2.1.2; IDF 4.4 is still broken as of today.

binary1230 commented 1 year ago

Thanks for the fix!

I just ran into what is almost certainly this bug on IDF 5.0.1 by pure chance, and tracked it down to this issue after a day of work trying to figure out what was going on.

We had two netifs (Wi-Fi and Ethernet) on our device. By mistake, our access point was powered up, so the ESP32 could associate with it, but the Ethernet cable from the AP to the router was not plugged in. So the ESP was never able to reach a DHCP server, which triggered this bug and an out-of-memory crash within 15-20 seconds of boot. Nasty.

Separately: not 100% sure it's the same, but I was also recently getting some really odd out-of-memory crashes in the same areas (with DHCP and sys_ timers allocations) when a device was on the edge of wifi range.


I'd just like to +1 this: is there an ETA for when a stable IDF version might pick this up? It seems both 4.4 and 5.0.1 are broken. If not, do you think it's reasonably safe to cherry-pick those two commits into a fork and use that in production?

Thanks!

AxelLin commented 1 year ago

> Indeed, it looks like it's been fixed on the 2.1.3 branch, but it still needs to be backported to 2.1.2; IDF 4.4 is still broken as of today.

@david-cermak The IDF v4.4 and v4.3 branches also need the fix. BTW, I can't find d5e56d0 in the esp-lwip 2.1.2-esp branch.

espressif-abhikroy commented 8 months ago

This bug is fixed in commit https://github.com/espressif/esp-lwip/commit/8dad8d3ee66840deee4acfc1601de4e396c594be in esp-lwip 2.1.3 and in commit https://github.com/espressif/esp-lwip/commit/8290c3b8f2adaf82aa45ec992b87f16205f2689b in esp-lwip 2.1.2. This issue will be closed, and if there is any other issue, a new issue can be opened or this one can be reopened.