SuperHouse / esp-open-rtos

Open source FreeRTOS-based ESP8266 software framework
BSD 3-Clause "New" or "Revised" License

ARP stops working after a few minutes when running in softAP mode #405

Open chrishamm opened 7 years ago

chrishamm commented 7 years ago

I've been trying to get esp-open-rtos running on my ESP8266 and it seems to work nicely when running in station mode, but when I change to softAP mode and try to ping the IP address, it stops responding after a few minutes. When I open Wireshark, I can see that it no longer responds to ARP requests.

The symptoms match this report for the Arduino port: https://github.com/esp8266/Arduino/issues/2330

ourairquality commented 7 years ago

Please give the patch in https://github.com/SuperHouse/esp-open-rtos/pull/389 a try. It resolved at least one showstopper that made softAP mode unusable in practice: when multiple devices connected, it would break and stop. We might as well eliminate that as a cause.

chrishamm commented 7 years ago

Thanks for that patch, I cherry-picked your PR and updated the LwIP submodule. However, I'm afraid it breaks the sdk_wifi_set_opmode() function. Whenever I call it, I get an exception on my NodeMCU/ESP12E. Here is the log:

pp_task_hdl : 3fff1dd0, prio:14, stack:512
pm_task_hdl : 3fff2a30, prio:1, stack:176
frc2_timer_task_hdl:3fff5a28, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Jul 5 2017 09:27:45
phy ver: 273, pp ver: 8.3

Duet WiFi Server starting...
SDK version: 0.9.9
Init complete
mode : null
Connections: free free free free free free free free
Switching to STATION_MODE
mode : sta(5c:cf:7f:1a:98:2e)
Fatal exception (28):
epc1=0x40203514 epc2=0x00000000 epc3=0x40202782 excvaddr=0x00000038 depc=0x00000000
excsave1=0x40203511
Registers:
a0 40203511 a1 3fff8290 a2 3fff0e7c a3 00000001
a4 00000020 a5 00000000 a6 00ff0000 a7 ff000000
a8 00000000 a9 00000000 a10 00001a7f a11 0000000a
a12 00000000 a13 00000000 SAR 0000001f

Stack: SP=0x3fff8290
0x3fff8290: 40217754 00000000 3ffe800c 40106e5c
0x3fff82a0: ffffffff 00000000 ffffffff 00000002
0x3fff82b0: 00000002 00000001 a5302078 3fff1088
0x3fff82c0: 40215d98 00000008 00000000 4020bbc4
0x3fff82d0: a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5

Free Heap: 33252
_heap_start 0x3fff1bc0 brk 0x3fff8598 supervisor sp 0x3ffffb00 sp-brk 30056 bytes
arena (total_size) 27096 fordblks (free_size) 3196 uordblocks (used_size) 23900
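For readers following the dump: on the Xtensa core used by the ESP8266, exception cause 28 is LoadProhibited, and an excvaddr as small as 0x38 usually points to a read through a NULL pointer plus a field offset. The following is a minimal host-side sketch, not part of esp-open-rtos, that pulls those fields out of a captured log. The function name `decode_exception`, the partial cause table, and the 0x1000 "near NULL" threshold are assumptions made for illustration.

```python
import re

# Partial table of Xtensa EXCCAUSE codes commonly seen on the ESP8266.
EXCCAUSE_NAMES = {
    0: "IllegalInstruction",
    3: "LoadStoreError",
    6: "IntegerDivideByZero",
    9: "LoadStoreAlignment",
    20: "InstFetchProhibited",
    28: "LoadProhibited",
    29: "StoreProhibited",
}

HEADER_RE = re.compile(r"Fatal exception \((\d+)\):")
FIELD_RE = re.compile(r"\b(epc1|excvaddr)=0x([0-9a-fA-F]+)")


def decode_exception(log_text):
    """Summarise the first 'Fatal exception' header found in log_text.

    Returns None when the log contains no exception header.
    """
    header = HEADER_RE.search(log_text)
    if header is None:
        return None
    cause = int(header.group(1))
    fields = {}
    # Keep only the first occurrence of each field after the header.
    for name, value in FIELD_RE.findall(log_text[header.end():]):
        fields.setdefault(name, int(value, 16))
    excvaddr = fields.get("excvaddr")
    return {
        "cause": cause,
        "name": EXCCAUSE_NAMES.get(cause, "unknown"),
        "epc1": fields.get("epc1"),
        "excvaddr": excvaddr,
        # Heuristic (assumed threshold): a tiny faulting address suggests
        # a NULL base pointer plus a structure field offset.
        "near_null": excvaddr is not None and excvaddr < 0x1000,
    }


sample = (
    "Fatal exception (28): epc1=0x40203514 epc2=0x00000000 "
    "epc3=0x40202782 excvaddr=0x00000038 depc=0x00000000"
)
info = decode_exception(sample)
print(info["name"], hex(info["epc1"]), hex(info["excvaddr"]), info["near_null"])
# prints: LoadProhibited 0x40203514 0x38 True
```

Read this way, the crash is consistent with code in queue.c being handed a NULL handle, although the dump alone does not prove which caller passed it.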

Investigating this didn't show anything useful to me, because addr2line only resolved the following addresses:

0x40203514 /home/christian/duet/esp-open-rtos/FreeRTOS/Source/queue.c:1610
0x40202782 /home/christian/duet/esp-open-rtos/FreeRTOS/Source/tasks.c:4173
0x40203511 /home/christian/duet/esp-open-rtos/FreeRTOS/Source/queue.c:1610
40203511 /home/christian/duet/esp-open-rtos/FreeRTOS/Source/queue.c:1610
40217754 ??:?
40106e5c /home/gus/dev/esp/rtos/newlib/build/xtensa-lx106-elf/newlib/libc/stdio/../../../../../newlib/libc/stdio/nano-vfprintf.c:531
40215d98 ??:?
4020bbc4 /home/christian/duet/esp-open-rtos/core/sysparam.c:73
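Picking candidate addresses out of a dump by hand is error-prone, so the lookup above could be scripted roughly as follows. This is a sketch under stated assumptions: the address ranges are an approximation of the ESP8266 IRAM and memory-mapped flash windows, `ELF_PATH` is a hypothetical placeholder for the linked firmware image, and the tool name assumes the xtensa-lx106-elf toolchain prefix. The script only builds the addr2line command line (using the standard `-e` option) and does not run it.

```python
import re

# Approximate ESP8266 ranges that can hold executable code (assumed bounds).
CODE_RANGES = (
    (0x40100000, 0x40110000),  # instruction RAM
    (0x40200000, 0x40300000),  # memory-mapped SPI flash
)

# Hypothetical path to the linked ELF image; adjust for the actual build.
ELF_PATH = "firmware/app.out"

# Any 8-digit hex word, with or without a 0x prefix.
HEX_WORD_RE = re.compile(r"\b(?:0x)?([0-9a-fA-F]{8})\b")


def code_addresses(dump_text):
    """Return unique hex words from dump_text that fall inside CODE_RANGES.

    Order of first appearance is preserved, so the faulting PC comes first.
    """
    found = []
    for word in HEX_WORD_RE.findall(dump_text):
        value = int(word, 16)
        in_code = any(low <= value < high for low, high in CODE_RANGES)
        if in_code and value not in found:
            found.append(value)
    return found


sample = (
    "Fatal exception (28): epc1=0x40203514 epc2=0x00000000 "
    "epc3=0x40202782 excvaddr=0x00000038 depc=0x00000000 "
    "excsave1=0x40203511\n"
    "0x3fff8290: 40217754 00000000 3ffe800c 40106e5c\n"
)
addrs = code_addresses(sample)
command = ["xtensa-lx106-elf-addr2line", "-e", ELF_PATH]
command += [f"0x{addr:08x}" for addr in addrs]
print(" ".join(command))
```

Words found on the stack are only candidates: some are genuine return addresses, others are stale values, which is one reason several of them resolve to `??:?`.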

I tried the http_get example as well, but even that causes a boot loop - probably due to the same error.

ourairquality commented 7 years ago

That is unexpected; it runs relatively robustly here. Just to rule out that it has already been corrected, would you be prepared to try a fresh tree from https://github.com/ourairquality/esp-open-rtos, which is the one I test?

chrishamm commented 7 years ago

Hmm, I just cloned your entire repository and tried to build the http_get example once more, but the same problem persists:

pp_task_hdl : 3fff1628, prio:14, stack:512
pm_task_hdl : 3fff2288, prio:1, stack:176
frc2_timer_task_hdl:0x3fff5220, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Jul 5 2017 10:47:44
phy ver: 273, pp ver: 8.3

SDK version:0.9.9
mode : sta(5c:cf:7f:1a:98:2e)
Fatal exception (28):
epc1=0x402039ee epc2=0x00000000 epc3=0x4022792e excvaddr=0x00000038 depc=0x00000000
excsave1=0x402039eb
Registers:
a0 402039eb a1 3fff3fb0 a2 3fff0f14 a3 00000001
a4 00000020 a5 0000ff00 a6 00ff0000 a7 ff000000
a8 00000000 a9 00000000 a10 00001a7f a11 0000000a
a12 00000000 a13 00000000 SAR 0000001f

Stack: SP=0x3fff3fb0
0x3fff3fb0: 40216ae0 00000000 3ffe800c 40106d6c
0x3fff3fc0: ffffffff ffffffff ffffffff 00000002
0x3fff3fd0: 00000002 00000001 3f302078 40215734
0x3fff3fe0: 00000008 00000000 3fff1134 4020c7dc
0x3fff3ff0: 00000003 40217f73 3ffe800c 4020233c
0x3fff4000: 006532e0 00007802 40216ba8 3fff5870
0x3fff4010: 3fff4044 3fff4040 00000000 00000000
0x3fff4020: 3fff4080 3fff4070 00000004 40203cd0

Free Heap: 40512
_heap_start 0x3fff1418 brk 0x3fff5cc0 supervisor sp 0x3ffffb00 sp-brk 40512 bytes
arena (total_size) 18600 fordblks (free_size) 0 uordblocks (used_size) 18600

Please let me know if you have any more ideas.

ourairquality commented 7 years ago

Sorry, that tree had its lwIP revision out of sync, and I am surprised it even checked out without issues. Could you please try again, and also erase the flash fully, just to rule that out? A fresh clone and build of http_get worked here, so it would be hard for me to narrow it down.

chrishamm commented 7 years ago

Thanks, but that doesn't seem to make a difference - in fact the resulting firmware binary is exactly the same size as the previous one. So I enabled debugging in lwipopts.h, which allowed me to capture this:

pp_task_hdl : 3fff1628, prio:14, stack:512
pm_task_hdl : 3fff2288, prio:1, stack:176
dns_init: initializing
sys_timeout: 0x3fff2338 msecs=1000 handler=cyclic_timer arg=0x4021baf0
sys_timeout: 0x3fff2350 msecs=60000 handler=cyclic_timer arg=0x4021bafc
sys_timeout: 0x3fff2370 msecs=500 handler=cyclic_timer arg=0x4021bb08
sys_timeout: 0x3fff2388 msecs=1000 handler=cyclic_timer arg=0x4021bb14
frc2_timer_task_hdl:0x3fff5230, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Jul 5 2017 12:47:11
phy ver: 273, pp ver: 8.3

SDK version:0.9.9
mode : sta(5c:cf:7f:1a:98:2e)
netif: IP address of interface f set to 0.0.0.0
Fatal exception (28):
epc1=0x402039ee epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000038 depc=0x00000000
bpbll`pp_task_hdl : 3fff1628, prio:14, stack:512
pm_task_hdl : 3fff2288, prio:1, stack:176
dns_init: initializing
sys_timeout: 0x3fff2338 msecs=1000 handler=cyclic_timer arg=0x4021baf0
sys_timeout: 0x3fff2350 msecs=60000 handler=cyclic_timer arg=0x4021bafc
sys_timeout: 0x3fff2370 msecs=500 handler=cyclic_timer arg=0x4021bb08
sys_timeout: 0x3fff2388 msecs=1000 handler=cyclic_timer arg=0x4021bb14
frc2_timer_task_hdl:0x3fff5230, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Jul 5 2017 12:47:11
phy ver: 273, pp ver: 8.3

SDK version:0.9.9
mode : sta(5c:cf:7f:1a:98:2e)
netif: IP address of interface f set to 0.0.0.0
Fatal exception (28):
epc1=0x402039ee epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000038 depc=0x00000000
excsave1=0x402039eb
Registers:
a0 402039eb a1 3fff3fb0 a2 3fff0f14 a3 00000001
a4 00000020 a5 0000ff00 a6 00ff0000 a7 ff000000
a8 00000000 a9 000000a0 a10 3fff14dc a11 0000000a
a12 00000000 a13 00000000 SAR 0000001f

Stack: SP=0x3fff3fb0
0x3fff3fb0: 3fff4010 3fff4000 00000024 00000001
0x3fff3fc0: ffffffff 00000001 3ffe800c 40106d6c
0x3fff3fd0: 00000004 ffffffff ffffffff 4021a86c
0x3fff3fe0: 00000008 00000000 3fff1138 40210850
0x3fff3ff0: 3fff4010 3fff4000 00000004 40105fea
0x3fff4000: 00000003 000000cb 00000066 00000000
0x3fff4010: 3fff4044 3fff4040 40221948 3fff5880
0x3fff4020: 00000000 3fff13f4 3fff5c98 3fff13f8

Free Heap: 40496
_heap_start 0x3fff1418 brk 0x3fff5cd0 supervisor sp 0x3ffffb00 sp-brk 40496 bytes
arena (total_size) 18616 fordblks (free_size) 0 uordblocks (used_size) 18616

So I suspect the network interface isn't properly assigned/initialised, but I'm not sure how this could be resolved.

ourairquality commented 7 years ago

Thank you for trying. Perhaps we had better take this off-list, as narrowing it down could generate a lot of noise here. Could you email me at 'info at ourairquality.org'? I'll email back my build for you to try, and we can iterate from there.