Closed ChocolateFrogsNuts closed 5 years ago
I see no issue on my side
11:04:41.957 -> SDK:2.2.2-dev(38a443e)/Core:2.5.2-115-gc298f001=20502115/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
11:04:41.957 ->
11:04:41.957 -> Hello
11:04:41.957 ->
11:04:41.957 -> sleep enable,type: 2
11:04:41.957 -> scandone
11:04:41.957 -> wifi evt: 2
11:04:42.090 -> scandone
11:04:42.090 -> state: 0 -> 2 (b0)
11:04:42.090 -> state: 2 -> 3 (0)
11:04:42.090 -> state: 3 -> 5 (10)
11:04:42.090 -> add 0
11:04:42.090 -> aid 1
11:04:42.090 -> cnt
11:04:42.123 ->
11:04:42.123 -> connected with xxxxxx, channel 1
11:04:42.123 -> dhcp client start...
11:04:42.123 -> wifi evt: 0
11:04:42.156 -> ip:10.0.1.166,mask:255.255.255.0,gw:10.0.1.254
11:04:42.156 -> wifi evt: 3
11:04:42.951 -> ..........pm open,type:2 0
11:04:52.993 -> ..............................
11:05:23.049 -> ........................................
11:06:03.175 -> ........................................
11:06:43.298 -> ........................................
11:07:23.420 -> ........................................
11:08:03.509 -> ...
Can you please try and update master, run git submodule update --init, re-run ./get.py in tools, and check the power supply?
You can also try disable_extra4k_at_link_time(); (anywhere in your code).
Ran the commands requested:
mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266$ cd tools
mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266/tools$ ./get.py
Platform: x86_64-pc-linux-gnu
Tool python-placeholder.tar.gz already downloaded
Extracting dist/python-placeholder.tar.gz
Tool x86_64-linux-gnu.xtensa-lx106-elf-b40a506.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.xtensa-lx106-elf-b40a506.1563313032.tar.gz
Tool x86_64-linux-gnu.mkspiffs-7fefeac.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.mkspiffs-7fefeac.1563313032.tar.gz
Tool x86_64-linux-gnu.mklittlefs-7f77f2b.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.mklittlefs-7f77f2b.1563313032.tar.gz
mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266/tools$
It's plugged into a USB hub with a good 12v power supply (several amps available). I have now tried two separate externally powered USB hubs with separate power supplies and different USB ports on my laptop, as well as feeding 5v from my bench supply (good for 5A) to the board and not using a hub. The board is drawing around 550-600mA @ 5v with or without USB connected once the wifi connects. I even added a 470uF electrolytic capacitor at the board for good measure. All those tests crashed. I also left the wifi-disabled build running for about 6 hours on the same esp8266 as a test - no crashes.
I added disable_extra4k_at_link_time(); as the first line of setup() (and included coredecls.h)... it has definitely changed the output. Debug and stack decode follow....
Legacy Stack Debug output:
SDK:2.2.2-dev(38a443e)/Core:2.5.2-98-gd6973cd6=20502098/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
Hello
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
UNUSED STACK Block 28 bytes @ 0x3ffffaf4
UNUSED STACK Block 28 bytes @ 0x3ffffc28
UNUSED STACK Block 60 bytes @ 0x3ffffc48
Largest mis-allocated stack block:15 words (60 bytes)
wifi evt: 2
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
UNUSED STACK Block 28 bytes @ 0x3ffffaf4
UNUSED STACK Block 24 bytes @ 0x3ffffc28
scandone
state: 0 -> 2 (b0)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 6
cnt
connected with apname, channel 9
dhcp client start...
wifi evt: 0
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
ip:192.168.10.60,mask:255.255.255.0,gw:192.168.10.1
wifi evt: 3
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
pm open,type:2 0
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
Fatal exception 0(IllegalInstructionCause):
epc1=0x4022bc74, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Exception (0):
epc1=0x4022bc74 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000
>>>stack>>>
ctx: sys
sp: 3ffffb40 end: 3fffffb0 offset: 01a0
3ffffce0: 400005e1 402030f8 00000000 00000030
3ffffcf0: 4022bc74 00000033 00000010 ffffffff
3ffffd00: 40104cf1 04000102 00000000 00000001
3ffffd10: fbf8ffff 04000002 3feffe00 00000100
3ffffd20: 0000001a 00000018 04000102 40104cd0
3ffffd30: 3fffc100 00000000 00000000 00000000
3ffffd40: 00000009 00000009 0000004e 00000000
3ffffd50: 0000ffff 00000008 00000016 ffffff60
3ffffd60: 401038c7 00040000 00000000 00040000
3ffffd70: 00000000 401038c4 00040000 00000030
3ffffd80: 3ffeccc0 40102823 3ffef660 00000000
3ffffd90: 402030f8 3fff03dc 000000e0 40100b8b
3ffffda0: 3ffea0c8 2c9f0300 4000050c 3fffc278
3ffffdb0: 40102674 3fffc200 00000022 401006a0
3ffffdc0: 401008ae 00000030 00000010 ffffffff
3ffffdd0: 40100977 3ffefc14 00000004 000000a5
3ffffde0: 40104ceb 40104ce8 00000000 f7ffffff
3ffffdf0: 400005e1 3fffc6fc 00000001 3ffefbf8
3ffffe00: 40213080 00000030 00000010 00000030
3ffffe10: 4021306c 3fff040c a5a5a5a5 00000004
3ffffe20: 3fff04ec 0000002f 00000000 00000135
3ffffe30: 00000035 3fffc6fc 00000001 3fffff60
3ffffe40: 000000c9 00000000 000000cc 00000000
3ffffe50: 00000001 00004208 3ffee2d8 00000000
3ffffe60: 3ffe93d8 008e0fd2 3ffefd2c 016b6c18
3ffffe70: 3ffe93e4 2c9f0300 4000050c 3fffc278
3ffffe80: 40102674 3fffc200 00000022 3ffefd2c
3ffffe90: 40000f68 00000030 00000011 ffffffff
3ffffea0: 00000020 00000000 3ffef660 00000001
3ffffeb0: 000000c9 40203088 00000020 40100b30
3ffffec0: 000000e8 00000014 3ffef660 000000cc
3ffffed0: 00000000 40202fdd 000000e8 40100b30
3ffffee0: 00000000 000000dc 3fffff60 40202fdd
3ffffef0: 4021867d 000000c9 3fffff60 4021306c
3fffff00: 00004208 00c28001 92040e00 3ffed768
3fffff10: 00000000 3fff03e4 0000001c 4020dd48
3fffff20: 3ffec5d0 4021ccfc 3ffec5d0 3ffea0a4
3fffff30: 3ffea0a4 000000ef 00000000 3fff0184
3fffff40: 3fffdc80 3fffff64 3fffff60 4020c0f4
3fffff50: 3ffea098 3fff00ec 3fff03e4 4022aedc
3fffff60: 402264ca 3ffec5d0 00000000 3fffdcb0
3fffff70: 40225e17 00000000 3fff03e4 4022cb9b
3fffff80: 40000f49 3fffdab0 3fffdab0 40000f49
3fffff90: 40000e19 40001878 00000002 00000000
3fffffa0: 3fffff10 aa55aa55 0000000a 40104ae9
<<<stack<<<
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
vd6973cd6
~ld
Legacy Stack Trace (yes it's complete - nothing decoded after the ethernet_input line) :
PC: 0x4022bc74
EXCVADDR: 0x00000000
Decoding stack results
0x402030f8: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x402030f8: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x40100b8b: umm_calloc(size_t, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1716
0x401006a0: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x401008ae: check_poison_block(umm_block*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 819
0x40100977: check_poison_all_blocks() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 892
0x40213080: mem_malloc at core/mem.c line 221
0x4021306c: mem_malloc at core/mem.c line 210
0x40203088: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100b30: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202fdd: malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 95
0x40100b30: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202fdd: malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 95
0x4021306c: mem_malloc at core/mem.c line 210
0x4020dd48: pbuf_alloc_LWIP2 at core/pbuf.c line 284
0x4020c0f4: esp2glue_alloc_for_recv at glue-lwip/lwip-git.c line 428
0x4022aedc: ethernet_input at glue-esp/lwip-esp.c line 352
As the decode didn't look helpful with legacy stack, I also reverted to my original code above...
Normal Stack debug output
Hello
wifi evt: 2
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt
connected with apname, channel 9
dhcp client start...
wifi evt: 0
ip:192.168.10.60,mask:255.255.255.0,gw:192.168.10.1
wifi evt: 3
..........pm open,type:2 0
..............Fatal exception 0(IllegalInstructionCause):
epc1=0x4022bc6c, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Exception (0):
epc1=0x4022bc6c epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000
>>>stack>>>
ctx: sys
sp: 3fffecb0 end: 3fffffb0 offset: 01a0
3fffee50: 400005e1 00000608 3ffef154 00000030
3fffee60: 402024d0 00000030 00000010 00000001
3fffee70: 40000f49 3ffee598 00000000 3fffd9d0
3fffee80: 00000000 00000000 00000000 fffffffe
3fffee90: ffffffff 3fffc6fc 00000000 3fffdab0
3fffeea0: 00000000 3fffdad0 3ffee598 00000000
3fffeeb0: 3ffeea38 00000000 3ffef69a 402117e5
3fffeec0: 40218675 0000002a 3fffef30 40213064
3fffeed0: 380aa8c0 00000000 9204ffff 3ffed768
3fffeee0: 3ffef6a2 3ffef67c 00000160 401006a8
3fffeef0: 3fffdc80 3ffef0bc 3ffef64c 3ffef15c
3fffef00: 00000608 3ffeea38 3ffef67c 4020c2f8
3fffef10: 3fffdc80 3ffef0bc 3ffef654 4020c113
3fffef20: 4022af02 3ffef0bc 3ffef654 4022af13
3fffef30: 3ffef68c 3ffef67c 00000000 3fffdcb0
3fffef40: 40225e0f 00000000 3ffef654 4022cb93
3fffef50: 40000f49 3fffdab0 3fffdab0 40000f49
3fffef60: 40000e19 40001878 00000002 00000000
3fffef70: 3fffff10 aa55aa55 000000c9 40104af1
3fffef80: 40104af7 00000002 00000000 52ffe941
3fffef90: 4010000d c11211c8 03542210 6800f00d
3fffefa0: 40100dfc 3fffef3c 40100da9 3fffff68
3fffefb0: 3fffffc0 00000000 00000000 feefeffe
3fffefc0: feefeffe feefeffe feefeffe feefeffe
3fffefd0: feefeffe feefeffe feefeffe feefeffe
3fffefe0: feefeffe feefeffe feefeffe feefeffe
3fffeff0: feefeffe feefeffe feefeffe feefeffe
3ffff000: feefeffe feefeffe feefeffe feefeffe
3ffff010: feefeffe feefeffe feefeffe feefeffe
3ffff020: feefeffe feefeffe feefeffe feefeffe
3ffff030: feefeffe feefeffe feefeffe feefeffe
3ffff040: feefeffe feefeffe feefeffe feefeffe
3ffff050: feefeffe feefeffe feefeffe feefeffe
3ffff060: feefeffe feefeffe feefeffe feefeffe
3ffff070: feefeffe feefeffe feefeffe feefeffe
3ffff080: feefeffe feefeffe feefeffe feefeffe
3ffff090: feefeffe feefeffe feefeffe feefeffe
3ffff0a0: feefeffe feefeffe feefeffe feefeffe
3ffff0b0: feefeffe feefeffe feefeffe feefeffe
3ffff0c0: feefeffe feefeffe feefeffe feefeffe
3ffff0d0: feefeffe feefeffe feefeffe feefeffe
3ffff0e0: feefeffe feefeffe feefeffe feefeffe
3ffff0f0: feefeffe feefeffe feefeffe feefeffe
3ffff100: feefeffe feefeffe feefeffe feefeffe
3ffff110: feefeffe feefeffe feefeffe feefeffe
3ffff120: feefeffe feefeffe feefeffe feefeffe
3ffff130: feefeffe feefeffe feefeffe feefeffe
3ffff140: feefeffe feefeffe feefeffe feefeffe
3ffff150: feefeffe feefeffe feefeffe feefeffe
3ffff160: feefeffe feefeffe feefeffe feefeffe
3ffff170: feefeffe feefeffe feefeffe feefeffe
3ffff180: feefeffe feefeffe feefeffe feefeffe
3ffff190: feefeffe feefeffe feefeffe feefeffe
3ffff1a0: feefeffe feefeffe feefeffe feefeffe
3ffff1b0: feefeffe feefeffe feefeffe feefeffe
3ffff1c0: feefeffe feefeffe feefeffe feefeffe
3ffff1d0: feefeffe feefeffe feefeffe feefeffe
3ffff1e0: feefeffe feefeffe feefeffe feefeffe
3ffff1f0: feefeffe feefeffe feefeffe feefeffe
3ffff200: feefeffe feefeffe feefeffe feefeffe
3ffff210: feefeffe feefeffe feefeffe feefeffe
3ffff220: feefeffe feefeffe feefeffe feefeffe
3ffff230: feefeffe feefeffe feefeffe feefeffe
3ffff240: feefeffe feefeffe feefeffe feefeffe
3ffff250: feefeffe feefeffe feefeffe feefeffe
3ffff260: feefeffe feefeffe feefeffe feefeffe
3ffff270: feefeffe feefeffe feefeffe feefeffe
3ffff280: feefeffe feefeffe feefeffe feefeffe
3ffff290: feefeffe feefeffe feefeffe feefeffe
3ffff2a0: feefeffe feefeffe feefeffe feefeffe
3ffff2b0: feefeffe feefeffe feefeffe feefeffe
3ffff2c0: feefeffe feefeffe feefeffe feefeffe
3ffff2d0: feefeffe feefeffe feefeffe feefeffe
3ffff2e0: feefeffe feefeffe feefeffe feefeffe
3ffff2f0: feefeffe feefeffe feefeffe feefeffe
3ffff300: feefeffe feefeffe feefeffe feefeffe
3ffff310: feefeffe feefeffe feefeffe feefeffe
3ffff320: feefeffe feefeffe feefeffe feefeffe
3ffff330: feefeffe feefeffe feefeffe feefeffe
3ffff340: feefeffe feefeffe feefeffe feefeffe
3ffff350: feefeffe feefeffe feefeffe feefeffe
3ffff360: feefeffe feefeffe feefeffe feefeffe
3ffff370: feefeffe feefeffe feefeffe feefeffe
3ffff380: feefeffe feefeffe feefeffe feefeffe
3ffff390: feefeffe feefeffe feefeffe feefeffe
3ffff3a0: feefeffe feefeffe feefeffe feefeffe
3ffff3b0: feefeffe feefeffe feefeffe feefeffe
3ffff3c0: feefeffe feefeffe feefeffe feefeffe
3ffff3d0: feefeffe feefeffe feefeffe feefeffe
3ffff3e0: feefeffe feefeffe feefeffe feefeffe
3ffff3f0: feefeffe feefeffe feefeffe feefeffe
3ffff400: feefeffe feefeffe feefeffe feefeffe
3ffff410: feefeffe feefeffe feefeffe feefeffe
3ffff420: feefeffe feefeffe feefeffe feefeffe
3ffff430: feefeffe feefeffe feefeffe feefeffe
3ffff440: feefeffe feefeffe feefeffe feefeffe
3ffff450: feefeffe feefeffe feefeffe feefeffe
3ffff460: feefeffe feefeffe feefeffe feefeffe
3ffff470: feefeffe feefeffe feefeffe feefeffe
3ffff480: feefeffe feefeffe feefeffe feefeffe
3ffff490: feefeffe feefeffe feefeffe feefeffe
3ffff4a0: feefeffe feefeffe feefeffe feefeffe
3ffff4b0: feefeffe feefeffe feefeffe feefeffe
3ffff4c0: feefeffe feefeffe feefeffe feefeffe
3ffff4d0: feefeffe feefeffe feefeffe feefeffe
3ffff4e0: feefeffe feefeffe feefeffe feefeffe
3ffff4f0: feefeffe feefeffe feefeffe feefeffe
3ffff500: feefeffe feefeffe feefeffe feefeffe
3ffff510: feefeffe feefeffe feefeffe feefeffe
3ffff520: feefeffe feefeffe feefeffe feefeffe
3ffff530: feefeffe feefeffe feefeffe feefeffe
3ffff540: feefeffe feefeffe feefeffe feefeffe
3ffff550: feefeffe feefeffe feefeffe feefeffe
3ffff560: feefeffe feefeffe feefeffe feefeffe
3ffff570: feefeffe feefeffe feefeffe feefeffe
3ffff580: feefeffe feefeffe feefeffe feefeffe
3ffff590: feefeffe feefeffe feefeffe feefeffe
3ffff5a0: feefeffe feefeffe feefeffe feefeffe
3ffff5b0: feefeffe feefeffe feefeffe feefeffe
3ffff5c0: feefeffe feefeffe feefeffe feefeffe
3ffff5d0: feefeffe feefeffe feefeffe feefeffe
3ffff5e0: feefeffe feefeffe feefeffe feefeffe
3ffff5f0: feefeffe feefeffe feefeffe feefeffe
3ffff600: feefeffe feefeffe feefeffe feefeffe
3ffff610: feefeffe feefeffe feefeffe feefeffe
3ffff620: feefeffe feefeffe feefeffe feefeffe
3ffff630: feefeffe feefeffe feefeffe feefeffe
3ffff640: feefeffe feefeffe feefeffe feefeffe
3ffff650: feefeffe feefeffe feefeffe feefeffe
3ffff660: feefeffe feefeffe feefeffe feefeffe
3ffff670: feefeffe feefeffe feefeffe feefeffe
3ffff680: feefeffe feefeffe feefeffe feefeffe
3ffff690: feefeffe feefeffe feefeffe feefeffe
3ffff6a0: feefeffe feefeffe feefeffe feefeffe
3ffff6b0: feefeffe feefeffe feefeffe feefeffe
3ffff6c0: feefeffe feefeffe feefeffe feefeffe
3ffff6d0: feefeffe feefeffe feefeffe feefeffe
3ffff6e0: feefeffe feefeffe feefeffe feefeffe
3ffff6f0: feefeffe feefeffe feefeffe feefeffe
3ffff700: feefeffe feefeffe feefeffe feefeffe
3ffff710: feefeffe feefeffe feefeffe feefeffe
3ffff720: feefeffe feefeffe feefeffe feefeffe
3ffff730: feefeffe feefeffe feefeffe feefeffe
3ffff740: feefeffe feefeffe feefeffe feefeffe
3ffff750: feefeffe feefeffe feefeffe feefeffe
3ffff760: feefeffe feefeffe feefeffe feefeffe
3ffff770: feefeffe feefeffe feefeffe feefeffe
3ffff780: feefeffe feefeffe feefeffe feefeffe
3ffff790: feefeffe feefeffe feefeffe feefeffe
3ffff7a0: feefeffe feefeffe feefeffe feefeffe
3ffff7b0: feefeffe feefeffe feefeffe feefeffe
3ffff7c0: feefeffe feefeffe feefeffe feefeffe
3ffff7d0: feefeffe feefeffe feefeffe feefeffe
3ffff7e0: feefeffe feefeffe feefeffe feefeffe
3ffff7f0: feefeffe feefeffe feefeffe feefeffe
3ffff800: feefeffe feefeffe feefeffe feefeffe
3ffff810: feefeffe feefeffe feefeffe feefeffe
3ffff820: feefeffe feefeffe feefeffe feefeffe
3ffff830: feefeffe feefeffe feefeffe feefeffe
3ffff840: feefeffe feefeffe feefeffe feefeffe
3ffff850: feefeffe feefeffe feefeffe feefeffe
3ffff860: feefeffe feefeffe feefeffe feefeffe
3ffff870: feefeffe feefeffe feefeffe feefeffe
3ffff880: feefeffe feefeffe feefeffe feefeffe
3ffff890: feefeffe feefeffe feefeffe feefeffe
3ffff8a0: feefeffe feefeffe feefeffe feefeffe
3ffff8b0: feefeffe feefeffe feefeffe feefeffe
3ffff8c0: feefeffe feefeffe feefeffe feefeffe
3ffff8d0: feefeffe feefeffe feefeffe feefeffe
3ffff8e0: feefeffe feefeffe feefeffe feefeffe
3ffff8f0: feefeffe feefeffe feefeffe feefeffe
3ffff900: feefeffe feefeffe feefeffe feefeffe
3ffff910: feefeffe feefeffe feefeffe feefeffe
3ffff920: feefeffe feefeffe feefeffe feefeffe
3ffff930: feefeffe feefeffe feefeffe feefeffe
3ffff940: feefeffe feefeffe feefeffe feefeffe
3ffff950: feefeffe feefeffe feefeffe feefeffe
3ffff960: feefeffe feefeffe feefeffe feefeffe
3ffff970: feefeffe feefeffe feefeffe feefeffe
3ffff980: feefeffe feefeffe feefeffe feefeffe
3ffff990: feefeffe feefeffe feefeffe feefeffe
3ffff9a0: feefeffe feefeffe feefeffe feefeffe
3ffff9b0: feefeffe feefeffe feefeffe feefeffe
3ffff9c0: feefeffe feefeffe feefeffe feefeffe
3ffff9d0: feefeffe feefeffe feefeffe feefeffe
3ffff9e0: feefeffe feefeffe feefeffe feefeffe
3ffff9f0: feefeffe feefeffe feefeffe feefeffe
3ffffa00: feefeffe feefeffe feefeffe feefeffe
3ffffa10: feefeffe feefeffe feefeffe feefeffe
3ffffa20: feefeffe feefeffe feefeffe feefeffe
3ffffa30: feefeffe feefeffe feefeffe feefeffe
3ffffa40: feefeffe feefeffe feefeffe feefeffe
3ffffa50: feefeffe feefeffe feefeffe feefeffe
3ffffa60: feefeffe feefeffe feefeffe feefeffe
3ffffa70: feefeffe feefeffe feefeffe feefeffe
3ffffa80: feefeffe feefeffe feefeffe feefeffe
3ffffa90: feefeffe feefeffe feefeffe feefeffe
3ffffaa0: feefeffe feefeffe feefeffe feefeffe
3ffffab0: feefeffe feefeffe feefeffe feefeffe
3ffffac0: feefeffe feefeffe feefeffe feefeffe
3ffffad0: feefeffe feefeffe feefeffe feefeffe
3ffffae0: feefeffe feefeffe feefeffe feefeffe
3ffffaf0: feefeffe feefeffe feefeffe feefeffe
3ffffb00: feefeffe feefeffe feefeffe feefeffe
3ffffb10: feefeffe feefeffe feefeffe feefeffe
3ffffb20: feefeffe feefeffe feefeffe feefeffe
3ffffb30: feefeffe feefeffe feefeffe feefeffe
3ffffb40: feefeffe feefeffe feefeffe feefeffe
3ffffb50: feefeffe feefeffe feefeffe feefeffe
3ffffb60: feefeffe feefeffe feefeffe feefeffe
3ffffb70: feefeffe feefeffe feefeffe feefeffe
3ffffb80: feefeffe feefeffe feefeffe feefeffe
3ffffb90: feefeffe feefeffe feefeffe feefeffe
3ffffba0: feefeffe feefeffe feefeffe feefeffe
3ffffbb0: feefeffe feefeffe feefeffe feefeffe
3ffffbc0: feefeffe feefeffe feefeffe feefeffe
3ffffbd0: feefeffe feefeffe feefeffe feefeffe
3ffffbe0: feefeffe feefeffe feefeffe feefeffe
3ffffbf0: feefeffe feefeffe feefeffe feefeffe
3ffffc00: feefeffe feefeffe feefeffe feefeffe
3ffffc10: feefeffe feefeffe feefeffe feefeffe
3ffffc20: feefeffe feefeffe feefeffe feefeffe
3ffffc30: feefeffe feefeffe feefeffe feefeffe
3ffffc40: feefeffe feefeffe feefeffe feefeffe
3ffffc50: feefeffe feefeffe feefeffe feefeffe
3ffffc60: 0000000a 40104d80 0000000a 00000000
3ffffc70: 40001da0 0000000a feefeffe feefeffe
3ffffc80: 40001db4 feefeffe feefeffe feefeffe
3ffffc90: 00000002 00000000 0000000a 00000000
3ffffca0: 00000002 00000000 0000000a 00000000
3ffffcb0: feefeffe feefeffe feefeffe feefeffe
3ffffcc0: 00000000 a0000000 00000000 0000001c
3ffffcd0: 00002000 80af1999 00002000 00000000
3ffffce0: 3ffffe40 00000000 3ffffe40 4020a15e
3ffffcf0: 0000a000 3ffffde3 3ffee630 00000000
3ffffd00: 00000000 40203084 40205e6d 00000008
3ffffd10: 3ffffe40 00000008 3ffffe40 4020a15e
3ffffd20: 3ffffda0 3ffffddb 3ffffd50 00000000
3ffffd30: fffffffe 00000000 40101857 4020a098
3ffffd40: 3ffffe40 3ffffddb 3ffffda0 40205f6c
3ffffd50: 00000008 40226077 3ffef25c 3ffecf1c
3ffffd60: 3ffe8304 00000000 0000000a 4023b3a0
3ffffd70: 3ffffde3 00000002 00000000 00000008
3ffffd80: 3ffef316 3ffeece4 00000008 3ffe8704
3ffffd90: 00000000 3ffe8703 3ffffe40 4020a56f
3ffffda0: 00000000 ffffffff ffffffff 00000000
3ffffdb0: 00000008 00000008 3f302064 00000000
3ffffdc0: 3ffede50 3ffed740 00000001 00000001
3ffffdd0: 00000000 000035c6 3221d807 32303530
3ffffde0: 00393930 4021de70 3ffed614 00000000
3ffffdf0: 3ffed000 4021d77c 00000000 00000012
3ffffe00: 00000005 00000000 00000020 40101712
3ffffe10: 3ffe8b45 401049eb 3ffec5a8 3ffee630
3ffffe20: 401022b7 3ffec5a8 0000008c 40100d36
3ffffe30: fffffff7 01634a7c 3ffed000 40102494
3ffffe40: 3ffe93d8 00000000 00000000 ffff0208
3ffffe50: fffffff7 01634a7c 4010295a 00000100
3ffffe60: 3ffe93d8 7fffffff 00000000 00000001
3ffffe70: 00000001 00000080 4000050c 3fffc278
3ffffe80: 3ffe93d8 00000030 00000010 01634a7c
3ffffe90: 3ffe93e4 2c9f0300 4000050c 3fffc278
3ffffea0: 4010267c 3fffc200 00000022 002e2e2e
3ffffeb0: 40202671 00000030 00000010 ffffffff
3ffffec0: 40100dc1 40100dbc 40202668 00000000
3ffffed0: 00000000 00000000 00000000 fffffffe
3ffffee0: ffffffff 3fffc6fc 00000001 3ffe850c
3ffffef0: 00000000 3fffdad0 3ffee598 00000030
3fffff00: 3fffdad0 3fffff3c 3fffff48 3ffee598
3fffff10: 3fffdad0 00000075 3ffee47c 402016fc
3fffff20: 3ffe86ef 00000000 3ffe86ee 4020351a
3fffff30: 40105155 0074bbf2 3ffe864a 3ffee598
3fffff40: 40105202 3ffeceac 0074bbf2 00000000
3fffff50: 401053db 0074c830 3ffee5f8 00000000
3fffff60: 3ffede50 3ffee5f8 3ffe850c 3ffee5f8
3fffff70: 3fffdad0 3ffee598 4020253b 3fffefa0
3fffff80: 3ffee5f8 3fffdad0 0000000a 40202b3b
3fffff90: 00000000 00000000 3ffee568 402011c0
3fffffa0: 3fffdad0 00000000 3ffee568 40202688
<<<stack<<<
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
vf78ab66f
~ld
Normal Stack Decoded:
Exception 0: Illegal instruction
PC: 0x4022bc6c
EXCVADDR: 0x00000000
Decoding stack results
0x402024d0: loop_task(ETSEvent*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 144
0x402117e5: etharp_input at core/ipv4/etharp.c line 742
0x40213064: mem_malloc at core/mem.c line 210
0x401006a8: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x4020c2f8: ethernet_input_LWIP2 at netif/ethernet.c line 207
0x4020c113: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 441
0x4022af02: ethernet_input at glue-esp/lwip-esp.c line 363
0x4022af13: ethernet_input at glue-esp/lwip-esp.c line 371
0x4020a15e: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 233
0x40203084: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40205e6d: _printf_i at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 194
0x4020a15e: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 233
0x4020a098: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 180
0x40205f6c: _printf_i at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 244
0x4020a56f: _svfprintf_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 660
0x40100d36: umm_realloc(void*, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1745
0x40202671: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 134
0x40202668: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 132
0x402016fc: HardwareSerial::write(unsigned char const*, unsigned int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/HardwareSerial.h line 158
0x4020351a: uart_write(uart_t*, char const*, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/uart.cpp line 498
0x4020253b: esp_yield() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 90
0x40202b3b: __delay(unsigned long) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring.cpp line 54
0x402011c0: loop() at /home/mnix/Arduino/blank/blank.ino line 53
0x40202688: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 140
Also, any given build crashes consistently (i.e. at the same address with the same trace), although changing the code, building, then changing it back, rebuilding and flashing can give a different result.
I'm beginning to wonder if my build environment has an issue... although that would not explain why I have never had a problem using the Arduino IDE to develop for Arduinos, yet I have this issue with all releases of the esp8266 tools since 2.4.0 (it may go back further, but that was as far as I tested).
If I hadn't already tested with several WEMOS D1 boards, which all crashed identically for a given build, I would be sure it was dodgy hardware.
oops, can't read the display on my bench supply :-) Actual power consumption of the board is 55-60mA @ 5v with wifi on and usb disconnected.
Please enable all debug options, so we may get more info.
I can't see an "ALL" option, but the above were done with what appears to be the "everything enabled" option - the second-last in the list, with all the other items in it. SSL+TLS_MEM+HTTP_CLIENT+HTTP_SERVER+CORE+WIFI+HTTP_UPDATE+UPDATER+OTA+OOM
Yes, this entry.
Just to be clear, I had that option selected when I captured the output I posted above - so there's not much point in posting the same again....
If only we had access to the SDK source I could probably have a fix sorted in a couple of hours :-/
Can you post an archive with sketch source, selected options and the elf file ?
Ok, that should be what you need - complete preferences.txt from ~/.arduino15, the sketch and the binary file that was generated, plus a matching set of debug and stack decodes.
Interestingly this particular build is crashing very early in the run - much earlier than before, even though the sketch has not been modified since my last set of logs. As soon as the sketch started to run, there were two crashes and re-starts, with different exceptions but the same symptoms - extremely large stack with a large un-modified block in the middle (still set to feefeffe)
Hopefully you can work something out from that..
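The symptom described above (a deep stack with a large block in the middle still set to feefeffe) can be checked mechanically. The following is a minimal host-side sketch, not part of the ESP8266 core: it assumes the dump has already been parsed into 32-bit words, and the names PaintRun, longest_paint_run and stack_bytes_in_use are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill pattern seen in the untouched part of the stack dumps in this thread.
constexpr std::uint32_t kStackPaint = 0xfeefeffe;

// Hypothetical helper type: position and length of a run of painted words.
struct PaintRun {
    std::size_t first_word;  // index of the first painted word in the run
    std::size_t words;       // length of the run in 32-bit words
};

// Return the longest run of consecutive words that still hold the paint
// pattern. A long run sitting between live frames suggests the stack pointer
// was moved far down without the skipped memory ever being written.
PaintRun longest_paint_run(const std::vector<std::uint32_t>& dump) {
    PaintRun best{0, 0};
    std::size_t run_start = 0;
    std::size_t run_len = 0;
    for (std::size_t i = 0; i < dump.size(); ++i) {
        if (dump[i] == kStackPaint) {
            if (run_len == 0) run_start = i;
            ++run_len;
            if (run_len > best.words) best = PaintRun{run_start, run_len};
        } else {
            run_len = 0;
        }
    }
    return best;
}

// Bytes between the reported stack pointer and the end of the stack, taken
// from a line such as "sp: 3fffecb0 end: 3fffffb0". The stack grows down,
// so `end` is the higher address.
std::uint32_t stack_bytes_in_use(std::uint32_t sp, std::uint32_t end) {
    assert(end >= sp);
    return end - sp;
}
```

With the values printed in the dumps above, stack_bytes_in_use(0x3fffecb0, 0x3fffffb0) gives 4864 bytes for the normal-stack build, against 1136 bytes for sp: 3ffffb40 in the legacy-stack build. Both figures include whatever the exception handler itself pushed before printing.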
oops, just realised the bin file in that archive isn't the elf file you wanted... correct file attached... blank-elf.zip
Apologies for calling the sketch "blank"; it kind of grew from the example :(
@ChocolateFrogsNuts wrote: If only we had access to the SDK source I could probably have a fix sorted in a couple of hours :-/
I read through this issue looking for sources of SDK 2.2.1; additional googling for sources led to: https://github.com/espressif/ESP8266_NONOS_SDK/tree/v2.2.1 (just switch from branch master to tag v2.2.1 in https://github.com/espressif/ESP8266_NONOS_SDK)
In the Arduino IDE, the version of the "board esp8266 package" seems to be equal to the ESP8266 SDK version.
@spblinux at a quick glance it appears to offer no more than the tools/sdk directory in the Arduino library... but it is late here, so I will have a closer look in the morning.
@ChocolateFrogsNuts The SDK example code and the sources of libdriver.a are stripped out of the Arduino IDE version of the SDK. But the source code of the functions referenced in ld/eagle.rom.addr.v6.ld is missing in both variants of the SDK. (In my case I needed the source code of a UART-controlled ESP8285 "wifi shield", which is in examples/at.)
@spblinux we don't use libdriver.a. It is a leftover that shouldn't be in this repository.
About ld/eagle.rom.addr.v6.ld, this is 1) closed source 2) in ROM (on the silicon, not in a .a)
@ChocolateFrogsNuts
I'm sorry I still can't reproduce with your binary
SDK:2.2.2-dev(38a443e)/Core:2.5.2-99-gf78ab66f=20502099/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
...
22:37:46.757 -> ..scandone
22:37:48.485 -> no Newbicup found, reconnect after 1s
22:37:48.518 -> wifi evt: 1
22:37:48.518 -> STA disconnect: 201
22:37:48.584 -> reconnect
22:37:48.784 -> ..wifi evt: 2
22:37:50.778 -> .scandone
22:37:52.406 -> state: 0 -> 2 (b0)
22:37:52.406 -> .state: 2 -> 3 (0)
22:37:52.406 -> state: 3 -> 5 (10)
22:37:52.406 -> add 0
22:37:52.406 -> aid 1
22:37:52.406 -> cnt
22:37:52.439 ->
22:37:52.439 -> connected with Newbicup, channel 11
22:37:52.538 -> dhcp client start...
22:37:52.538 -> wifi evt: 0
22:37:53.402 -> ..ip:192.168.43.154,mask:255.255.255.0,gw:192.168.43.103
22:37:55.363 -> wifi evt: 3
22:37:55.429 -> .......pm open,type:2 0
22:38:02.442 -> .........................
22:38:27.646 -> ........................................
22:39:07.910 -> ........................................
22:39:48.188 -> ........................................
22:40:28.492 -> ........................................
22:41:08.789 -> ........................................
22:41:49.086 -> ........................................
22:42:29.389 -> ........................................
22:43:09.693 -> ........................................
22:43:49.975 -> ................................
That could be a worse version of the #2330 issue (per your comment)
//wifi_set_sleep_type(NONE_SLEEP_T); // this doesn't seem to crash as much
wifi_set_sleep_type(MODEM_SLEEP_T); // this is the default and crashes better
Can you try with your mobile phone as AP ? (that's what I did)
Additional info: I just tried it on a different wifi network, one with zero traffic (an OKI printer as the AP), and it is not crashing! This explains why you can't get the issue to happen for you: it's triggered by certain data packets on the wifi. Further testing connected to a Samsung Galaxy Tab S4 tablet with hotspot turned on reveals it is any wifi with internet access. If I have hotspot turned on but mobile data off and no connection to my normal wifi, there are no crashes at all. The second the hotspot device gets an internet connection, via mobile data or by sharing my normal wifi (the Tab S4 can share another wifi network via hotspot, there's something I would never have known!), the ESP code starts crashing, and when I disable that internet connection the ESP stops crashing. Also, if the hotspot has mobile data enabled but does not have phone signal, there are no crashes.
When it crashes on my normal wifi network, I don't see any traffic from the ESP chip other than the usual DHCP after the stack dump has finished - my AP is not the gateway so I can get that with a 'tcpdump -s 1500 -X ether host xx:xx:xx:xx:xx:xx' on the gateway.
For some reason it still crashes when I drop the internet connection from my gateway though - I can only assume the ESP code still thinks it has an internet connection, so still does whatever it is that is the problem.
Yay! this is progress - I can now control when it crashes, which hopefully narrows down why a little.
Hmm, looks like it could in fact be related to #2330 - Connected to the Tab S4 hotspot with no internet connection, it did not crash... until...
Using an app called Network Utilities, I can ping the ESP without any issues, however if I use the "IP Discovery" tool in that app, I get an instant crash every time - not only that it seems to cause a whole series of crashes (probably from multiple ARP requests).
I will go looking in the lwip arp code later today... I have something else I have to tend to this morning :-)
And now this issue became even more interesting :) Do you have a link to that specific app?
Ok, further testing and I am increasingly convinced the stack is being used up by the binary blob... I can't be certain it isn't a bad or missing piece of initialisation by the Arduino code... yet... but every time I decode a stack trace and locate the decoded calls in the trace I see the same thing:
So I tried several of the LWIP options - all crashed in the same way, including the "v1.4 Compile from source" option.
With that option selected (easiest way to re-compile lwip), I then added one line of code to tools/sdk/lwip/src/netif/etharp.c Line 1317 - straight after the local variables in ethernet_input()
if (&p < (0x3FFFFFB0 - 4000)) goto free_and_return;
That one line, which has no effect unless the stack is already full, seems to make it almost immune to crashes no matter how many ARP requests I send it! Unfortunately it is also almost completely deaf to IP traffic, but at least we know why ARP is behaving badly - it looks like every time network traffic is received, a "random" amount of stack is allocated! I suppose it's still possible there is memory corruption somewhere else as the root cause, but that just doesn't sound right - I can't think of any sane code that would allocate stack space based on a size read from somewhere, it's usually small variables of fixed size.
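The guard described above boils down to comparing the address of a local variable against a floor below the top of the stack. A minimal host-runnable sketch of that comparison, assuming a stack that grows towards lower addresses; the helper name `stack_depth_exceeds` and the constant names are hypothetical, while 0x3FFFFFB0 and 4000 are the values from the one-line patch:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `sp` lies more than `budget` bytes below
// `stack_top`. Assumes the stack grows towards lower addresses.
bool stack_depth_exceeds(std::uintptr_t sp, std::uintptr_t stack_top,
                         std::size_t budget) {
    return sp < stack_top - budget;
}

// Values taken from the one-line patch quoted above.
constexpr std::uintptr_t kSysStackTop = 0x3FFFFFB0u;
constexpr std::size_t kStackBudget = 4000;
```

In the patch the current stack position is approximated by the address of the local `p`; in this sketch that would be passed as `reinterpret_cast<std::uintptr_t>(&p)`, which also avoids comparing a pointer with a plain integer.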
The app I was using on the tablet is https://play.google.com/store/apps/details?id=com.myprog.netutils
If stack is a concern, please do your tests with this additional call:
disable_extra4k_at_link_time(); // anywhere in the sketch
(this will allocate the stack in user heap space - which is the SDK default, not within the system stack space, which is a hack not compatible for example when WPS is used - for the record, this case is automatically handled)
I've had disable_extra4k_at_link_time() back in while I try to work out the details of the stack. Seeing a lot of Illegal Instruction exceptions, and the PC is at odd locations (ie not aligned with instructions). The Esp Exception Decoder has been of limited help - it only decodes parts of the stack as it relies on gdb. I've been working on something that reads the objdump of the elf and produces a more complete stack trace (in one case 10 functions are on the stack, but gdb can only name 3 of them).
I still don't like the look of the stacks I am seeing.... for example:
Exception 0: Illegal instruction
PC: 0x40228a98
EXCVADDR: 0x00000000
Decoding stack results
0x40207669: _vsnprintf_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/vsnprintf.c line 73
0x40100870: check_poison_block(umm_block*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 846
0x40202f9c: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x40100cc1: umm_calloc(size_t, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1716
0x4010067c: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x40100ea4: free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1764
0x40222f7e: pbuf_free at core/pbuf.c line 752
0x40202f28: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100c56: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202f28: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100274: pvPortMalloc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 68
0x4022311d: pbuf_alloc at core/pbuf.c line 388
0x40222096: ethernet_input at netif/etharp.c line 1412
0x401000b8: app_entry() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 263
There is no way umm_malloc should call malloc_loc then back to pbuf_free. This tells me the stack pointer has been messed with and these are old calls... but figuring out just where this is happening is doing my head in :-( Note that the PC was off in user_uart_wait_tx_fifo_empty() so this still looks like a corrupted return address on the stack, most likely it is being sent off into some random function, and when it executes the return from that, the different sized stack frame loads a "random" stack pointer and PC.
This could be anywhere... next job is to get more debugging output from the arduino libraries - there still seems to be stuff that is not enabled
Note that I am leaning towards a block of heap being free'd but the pointer being left in play somewhere as the root cause... something has somehow caused check_poison_block to be called on a block that has already been free'd at some point...
One more thing you can try is enabling gdb. A crash like this goes into a gdb break and you can try the stack unwinder or look at system memory. I've got a pr that does a core dump that gdb plus a custom app can load, like loading a core file on Unix, but the stack unwinding is more foolproof on a live system.
Hmm, today I completed my own stack decoder that does a better job than the ESP Exception Decoder tool, which simply prints out the line of code associated with every user code address found in the stack, if gdb can provide it. The decoder I wrote actually attempts to walk the stack frames (which are not always the same size) in both directions to establish the integrity of the stack. It uses objdump on the elf file and gets not only the function addresses but the size of the stack frame that function creates. This allows it to walk all the firmware calls on the stack. In addition it loads the linker file with the ROM function addresses (rom_8266.ld) allowing me to identify those functions - but not the size of their stack frames. This all gives me a really good solid stack dumper that knows for sure if the stack is valid.
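The decoder described above is a Perl script that is not shown in this thread. The following is only an illustration of the frame-walking idea in C++, not that script. It walks in one direction only and assumes every function saves its return address in the top word of its own frame (offset frame_size - 4); that layout, and the names `FunctionInfo`, `WalkResult` and `walk_stack`, are assumptions made for the sketch. The function table would come from parsing objdump output, which is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One entry per function, as it might be extracted from objdump output.
struct FunctionInfo {
    std::string name;
    std::uint32_t start;       // first code address
    std::uint32_t end;         // one past the last code address
    std::uint32_t frame_size;  // bytes of stack reserved by the prologue
};

struct WalkResult {
    std::vector<std::string> frames;  // innermost function first
    bool intact;                      // true if the walk ended exactly at the stack base
};

const FunctionInfo* find_function(const std::vector<FunctionInfo>& table,
                                  std::uint32_t pc) {
    for (const FunctionInfo& f : table) {
        if (pc >= f.start && pc < f.end) return &f;
    }
    return nullptr;
}

// words[0] is the word at the stack pointer; the last word is the stack base.
WalkResult walk_stack(const std::vector<FunctionInfo>& table,
                      const std::vector<std::uint32_t>& words,
                      std::uint32_t pc) {
    WalkResult result;
    result.intact = false;
    std::size_t offset = 0;  // index of the current frame's first word
    while (true) {
        const FunctionInfo* f = find_function(table, pc);
        // Unknown address or unusable frame size: the stack cannot be trusted.
        if (f == nullptr || f->frame_size < 4 || f->frame_size % 4 != 0) break;
        std::size_t next = offset + f->frame_size / 4;
        if (next > words.size()) break;  // frame runs past the dump
        result.frames.push_back(f->name);
        if (next == words.size()) {      // frames add up to the stack base
            result.intact = true;
            break;
        }
        pc = words[next - 1];  // assumed location of the saved return address
        offset = next;
    }
    return result;
}
```

A walk that stops early with `intact == false` is the "stack dump is trash" case: a return address does not land inside any known function, or the frame sizes do not add up to the end of the dump.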
So what have I worked out about the stack on these crashes? Well firstly even with enable_extra4k_at_link_time() the dump is of the "sys" stack, not the "cont" stack. (and yes I did try a deliberate exception in my code to make sure the "cont" stack is dumped when appropriate). Secondly, the stack dumps I am getting are trash :-( they are almost entirely the left over data that would normally be beyond the stack pointer, and not dumped at all - except it is dumped because the stack pointer is way off. We can't trust anything on the stack to tell us where it was. I have also tried dumping the heap with umm_info() - so far it appears to be intact.
At this point, trying to get gdb to do anything with a stack or core dump is going to be fairly useless I think :( I need a way to step through the live code and see what is happening, or I need to get more debug output from all the code in the form of text out the serial port.
It's not going to be easy to tell exactly where the thing is when it initially goes wrong.
Hmm, that's bug hunting for pros :)
I would love to see good stack decoding utils for this platform.
@earlephilhower one of the things I found today was espressif's gdbstub which can be compiled in with your code and apparently allows gdb to step through the live code on chip. I haven't ruled out trying it out yet.....
I have at least managed to get lwip to start dumping massive amounts of debug info to Serial, but wouldn't you know it, now that I might be able to see what happens, hitting it with ARP requests from my tablet no longer crashes it. $#@%*@^#$$ computers.....
Fortunately it still crashes on my normal network... and so far there has been a gap of around 1 second between the last network traffic and the exception, and one exception happened before any network traffic was logged.....
Time to find more debugging options to enable :-/
Oh, and @TD-er don't get your hopes up - my stack decoder is a fairly rough perl script for the command line - nothing like the exception decoder tool for the IDE, although I may do a conversion/upgrade/shove the two together some day.
@ChocolateFrogsNuts don't use the Espressif version, it doesn't support sharing the UART and has a custom format that's incompatible with the plain-GNU toolchain we've been using.
How to use the included one is in here: https://github.com/esp8266/Arduino/blob/master/doc/gdb.rst
It has full GDB support, including single stepping/breakpoint/Ctrl-C interrupt/etc. on the live system. We have no ELF or source for the blob or ROMs, though, so don't expect to be able to get anything other than assembly inside non-open source code.
Well that's the nice thing with open source. No matter how ugly the hack is to make something work, there is always someone out there to polish it, if it is based on a good idea.
@ChocolateFrogsNuts please continue pursuit, you may be on to a stability issue. Also, in case you haven't realized it already, @earlephilhower has done a lot of work related to what you're looking at, so please discuss with him.
thanks @earlephilhower I'll keep it in mind if the current line of attack fails...
Currently I'm doing my own build of lwip2. I've got debugging output from the glue layer, it tells me plenty, but it isn't doing anything for about a second before the crash, although there seems to be a packet repeated several times over the previous 10 seconds, so I will find out what that is.
I'm working on getting more output by enabling more debugging from lwip2 itself to see if anything is happening in there. At least it's still crashing consistently on my network.... while there's crash, there's hope :-)
EDIT: got the lwip2 debugging on... I turned everything on... This will take some time to analyse :)
and then I hit the reset button and get a crash before lwip2 or the glue layer emit a single character...
Wondering if the problem is in the WiFi code now... off to enable debugging over there...
@earlephilhower I noticed that uart.cpp installs a UART driver that calls delay(0). I am wondering about how the logic flows when the SDK is calling to print a few characters and delay(0) is called. Would it try to yield back to the SDK?
As a quick test, I was initially thinking these two commands could be used to restore back to a non-delaying UART driver; however, I am not sure it wouldn't have an adverse effect on something else that I am not aware of. I have tried them and they seem to cause no harm, but then I cannot recreate the problem either.
extern "C" void ets_install_uart_printf(void); // restore putc1 handler back to ROM version
extern "C" void uart_buff_switch(uint8_t); // Select UART: 0 => Serial, and 1 => Serial1
ets_install_uart_printf(); // Place after Serial.begin(...)
uart_buff_switch(0); // Select UART0 (Serial)
@earlephilhower just tested as per suggestion - no difference. The yield() type functions all call run_scheduled_functions() and/or run_scheduled_recurrent_functions() which are part of the SDK. Note that it has crashed with the same symptoms with a completely blank sketch - ie no attempt at serial output at all.
So far I have added debug statements to all the ESP8266WiFi* class functions, and enabled all the debugging in lwip2 and the glue. There's generally several seconds of nothing happening before the crash, except for the lwip timers ticking away with nothing to do. None of the timer handlers are running when it crashes, and there is no traffic reported by the glue for several seconds. There's nothing coming from the WiFi classes either. I've set up a custom_crash_callback() to run umm_info() for a heap listing, and display the wifi status when it crashes - nothing unusual there. The integrity checks are enabled for umm_malloc as well. I even tried running some code that pseudo-randomly allocated/freed heap to fragment it, and tested for any corruption of the allocated blocks. No problems there.
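The heap fragmentation test mentioned above is not shown in the thread. A minimal host-runnable sketch of that kind of test follows, assuming plain `malloc`/`free` and a per-block fill byte that is verified before each free; the names `Block`, `block_intact` and `heap_stress` and the default sizes are hypothetical. On the ESP the same loop would exercise umm_malloc through the usual allocator calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <random>
#include <vector>

struct Block {
    unsigned char* ptr;
    std::size_t size;
    unsigned char fill;  // every byte of the block was set to this value
};

// True if the block still holds the pattern written at allocation time.
bool block_intact(const Block& b) {
    for (std::size_t i = 0; i < b.size; ++i) {
        if (b.ptr[i] != b.fill) return false;
    }
    return true;
}

// Pseudo-randomly allocates and frees blocks to fragment the heap.
// Returns the number of blocks whose contents changed while allocated.
std::size_t heap_stress(std::uint32_t seed, std::size_t rounds,
                        std::size_t max_live = 32, std::size_t max_size = 256) {
    if (max_size == 0) return 0;
    std::mt19937 rng(seed);
    std::vector<Block> live;
    std::size_t corrupted = 0;

    for (std::size_t r = 0; r < rounds; ++r) {
        bool want_alloc =
            live.empty() || (live.size() < max_live && (rng() & 1u));
        if (want_alloc) {
            std::size_t size = 1 + rng() % max_size;
            unsigned char* p = static_cast<unsigned char*>(std::malloc(size));
            if (p == nullptr) continue;  // out of memory: skip this round
            unsigned char fill = static_cast<unsigned char>(rng() & 0xFFu);
            std::memset(p, fill, size);
            live.push_back(Block{p, size, fill});
        } else {
            std::size_t idx = rng() % live.size();
            if (!block_intact(live[idx])) ++corrupted;
            std::free(live[idx].ptr);
            live[idx] = live.back();  // order does not matter, so swap-remove
            live.pop_back();
        }
    }
    for (const Block& b : live) {  // verify and release whatever is left
        if (!block_intact(b)) ++corrupted;
        std::free(b.ptr);
    }
    return corrupted;
}
```

A non-zero return value means something wrote into memory it did not own while the block was live, which is the kind of corruption this test was looking for and did not find.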
What else is likely to be running? What else could be running?
I'm starting to think I need to be looking for a struct passed to the blob that has been allocated on the stack when it shouldn't be.
Checked out a few things using ets_timer_* - all trivial stuff incrementing counters etc.
Got something interesting here though:
I think that translates as a block marked as free isn't on the list of free blocks... The promiscuous mode callback I used does nothing more than print the address and size of the data block. Even if the callback does nothing it still crashes the same. The suspect block seems to be one that has been given to the callback, or is next to one given to the callback.
Not sure what is going on there, but in promiscuous mode, the traffic is sent to my callback and never goes to lwip - so that rules out anything the network stack might be doing with data as it never gets called. It could be that the rom code, the blob and umm_malloc are not playing nice with one-another with memory allocation. Should we even be doing anything other than using the mem_malloc which appears to be in the rom code, and which is presumably called by the blob?
ahh yes, missed the lack of a critical section in integrity_check() - well at least I think my brain assumed it was called from within one :-/ thanks @mhightower83 Threw one in and now the "crash" has turned into random wdt resets when in promiscuous mode... I guess we now have a race condition there - I don't think I'll chase that as it only exists when debugging is enabled in umm_malloc() and there are already wiser people working on the issue.
Back to those timers running in the network code... I must have missed something, so I'll start working through disabling one at a time...
Receiving WDT in combination with WiFi has become my specialty... Please have a look at whether or not the node tries to disconnect, or maybe the AP assumes it is disconnecting.. Just make sure you're following this sequence when connecting and reconnecting. Just to be sure you're looking at the issue you think you're looking at.
Ok, so I've disabled features in LWIP2 to the point that all that is left is basic ethernet, ip, tcp, udp with no timer events (even ARP is off). I've short-circuited ethernet_input() in the glue layer to do nothing more than pbuf_free(); return; There is now nothing network related running, and the few packets that arrive to the glue layer are simply dropped - the same as happens in promiscuous mode. Yet still I get no crashes (left it running over 30 minutes) in promiscuous mode, while normal mode connected to the AP produces soft reboots (cause:4) and a message with "wdt reset", and/or seemingly random exceptions on a regular (within a minute) basis.
The only difference is the rom code is maintaining association with the AP... (I doubt it's a power issue, I have had this same chip transmitting a constant stream of traffic over the wifi before, and if anything it crashed much less - certainly wifi_set_sleep_type(NONE_SLEEP_T) is more stable).
@TD-er I checked the Wifi setup - added a forceSleepWake(); delay(100); to match that sequence, but no change.
Somewhere I saw some SDK callbacks related to the low-level wifi.... time to hook some of those.
AHA! I have just discovered the correct usage of ets_intr_lock() / ets_intr_unlock() thanks to @trebisky having disassembled the esp8266 boot rom, and a read of the Xtensa instruction set reference manual!
ets_intr_lock() actually returns the old interrupt level, making it possible to save it. Applied this new knowledge to a few uses of ets_intr_* in umm_malloc and lwIP, but no noticeable improvement on the wdt resets :(
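Why saving the old level matters is easiest to see with nested critical sections: if the inner section simply drops the level to zero on exit, the outer section runs with interrupts enabled. The following is a host-side simulation of the save/restore pattern, not code for the chip. `g_int_level`, `sim_intr_lock`, `sim_intr_restore`, `CriticalSection` and the level value 3 are all stand-ins invented for the sketch.

```cpp
#include <cstdint>

// Simulated interrupt level; stands in for the CPU's interrupt-level state.
static std::uint32_t g_int_level = 0;

// Arbitrary masking level chosen for the simulation.
constexpr std::uint32_t kLockLevel = 3;

// Raises the level and returns the previous one so the caller can save it.
std::uint32_t sim_intr_lock() {
    std::uint32_t old_level = g_int_level;
    g_int_level = kLockLevel;
    return old_level;
}

// Restores a previously saved level instead of unconditionally unmasking.
void sim_intr_restore(std::uint32_t saved_level) {
    g_int_level = saved_level;
}

// RAII guard: saves the old level on entry and restores it on exit,
// so nested guards never re-enable interrupts early.
class CriticalSection {
public:
    CriticalSection() : saved_(sim_intr_lock()) {}
    ~CriticalSection() { sim_intr_restore(saved_); }
    CriticalSection(const CriticalSection&) = delete;
    CriticalSection& operator=(const CriticalSection&) = delete;

private:
    std::uint32_t saved_;
};
```

With this pattern the inner guard restores the level it found, which is still the locked level, and only the outermost guard returns the level to zero.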
The SDK callbacks I saw weren't useful in getting any further debugging info either.
Well now, it's getting interesting.... with sleep mode MODEM I still get regular wdt resets, and sometimes a crash with a stack dump. I just tried sleep mode NONE, with no other changes, and after 20 minutes it was still running... I've turned down the debugging, turned on all the lwIP stuff I had turned off and will let it run...
Maybe that little fix to umm_malloc() with better use of ets_intr_lock() actually did something... 10 minutes and going strong!
Check the doc about wifi none sleep: They are cheating with other modes.
Well it made 105 minutes before it crashed... further testing shows the particular circumstance where it might be a problem is very rare, apart from a few times at initial startup - ie before setup() runs.
@d-a-v actually MODEM sleep means shutting down the RF module - so wifi is effectively off. I can see why it isn't the mode to run a wifi connection in, and I'm quite happy to run NONE for normal operation. I selected that mode to debug because it makes for a considerably faster test cycle... I get a result in 110 seconds, not 110 minutes.
I would love to know what it is about MODEM sleep that makes it crash so well though. Theoretically, the chip should be able to run with the RF off - well actually it does as long as you call WiFi.disconnect() it never crashes - but whatever it is that crashes for MODEM sleep seems to also crash on rare occasions for NONE sleep.
Got 10 minutes before it crashed again (NONE sleep) this morning. Clearly no progress yet.
Well the thing is, you're not the only one to call disconnect. The AP may also disconnect your node and it may happen more often than you would think. For example when the AP tries to detect a better channel (set channel fixed in your AP for testing this bug you're hunting down)
Some AP's also may issue a disconnect after receiving some amount of CRC errors in received packets and it may not even have to be the ones you're sending. The issue I linked here is all about the crashes occurring during disconnect/reconnect.
Some other things to keep in mind:
Yes I'm aware there are many causes of a disconnect... however any disconnect should trigger a wifi event - I made sure the wifi event hook has a debug print on it, and it works for normal connect/disconnect/etc. Nothing there when it crashes.
I have also tried debugging statements in the glue between the esp blob and lwip ensuring there is no network traffic going into lwip for it to mis-handle, and no traffic coming out of lwip to the esp. I've put it back to "normal" now (ie traffic can flow as intended) and hooked the phy_capture callback from the glue to gather stats.
The thing is when it does crash it gives an exception (illegal instruction for an address that isn't aligned with an instruction) dumps the stack and re-starts immediately. There's no time to even notice if it's not responding to arp or anything like that as by the time you realise it just crashed, it's back up and running.. I'm not bothered about the wdt resets - they ONLY occur under abnormal operating conditions, ie MODEM sleep with wifi connected - It's the random exceptions with a trashed stack I'm chasing.
I've left it running with my laptop pinging it... I will know in about 110 minutes if it's making a difference :-/
.......... and the answer is.... no, after 10 minutes an exception 0 and stack dump.
I also tried it with arping but got the same result. The arp reply was generally arriving in 3-5ms, and occasionally it would take 10 or even 100ms, but that was not related to when it crashed with a stack dump.
For WiFi connected devices, those are just as expected. (also the 100ms, which I suspect will even be 110 msec, since 100 msec is about the beacon interval time) I guess you had the arping running continuously? I've seen that when you run it right after the power consumption of the device lowers, the ping request may sometimes take up to 800 msec. But as soon as it replies, the power consumption rises again and the replies keep coming as fast as you've also seen.
I am interested in the WDT reboots, as I think they may somehow be related. (when not crashing fast, you eventually run into WDT reboots, but the cause may be the same) But I totally get it when you try to focus on just one issue/symptom at a time.
arping was started when the esp was, and left running through several crash cycles.
Actually at one stage I was seeing about 3 WDT reboots within a few minutes, then a crash. Then yesterday it wasn't doing any WDT reboots, only crashes - probably just to annoy me because I had added a few lines to get/print the rst_info at boot to see if it gave anything useful about the WDT resets - I am keeping an eye on them just in case they're related - could be they are actually just a trashed stack and random jump to an infinite loop...
I wonder if I can work out a way to cleanly dump the stack and registers at regular intervals, say via an interrupt.... maybe copy it to a pre-allocated block on the heap and print it at the next loop().
I'm going to take a break from the problem for a few days and let my brain recover for a bit...
Well taking a snapshot of up to 2k of stack every 500 us seems to work...
I copy it into a set of pre-allocated buffers in a circular list, then dump the buffers in a custom_crash_callback() so I get the last 5 snapshots and the actual stack dump.
Nothing is leaping out at me yet, but I'm seeing some really weird behaviour - for example at one point it would crash with an illegal instruction near the start of my timer callback, even though it had executed it several thousand times already.
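The snapshot code itself is not posted in this thread. A minimal host-runnable sketch of the circular list of pre-allocated buffers could look like the following; the 2 KB size and the count of 5 come from the description above, while `Snapshot`, `SnapshotRing` and the member names are hypothetical. On the chip, `capture()` would be called from the timer callback with the current stack pointer, and the crash callback would iterate with `at()`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSnapshotBytes = 2048;  // up to 2k of stack per snapshot
constexpr std::size_t kSnapshotCount = 5;     // keep the last 5 snapshots

struct Snapshot {
    std::uint32_t sequence;  // running capture number
    std::size_t length;      // bytes actually copied
    unsigned char data[kSnapshotBytes];
};

// Fixed-size ring: no allocation happens at capture time.
class SnapshotRing {
public:
    // Copies up to kSnapshotBytes from src, overwriting the oldest slot.
    void capture(const void* src, std::size_t len) {
        Snapshot& s = slots_[next_ % kSnapshotCount];
        s.length = len < kSnapshotBytes ? len : kSnapshotBytes;
        std::memcpy(s.data, src, s.length);
        s.sequence = next_++;
    }

    // Number of snapshots currently retained.
    std::size_t size() const {
        return next_ < kSnapshotCount ? next_ : kSnapshotCount;
    }

    // i = 0 is the oldest retained snapshot, i = size() - 1 the newest.
    const Snapshot& at(std::size_t i) const {
        std::size_t first = next_ - size();
        return slots_[(first + i) % kSnapshotCount];
    }

private:
    std::array<Snapshot, kSnapshotCount> slots_{};
    std::uint32_t next_ = 0;
};
```

Keeping the buffers inside one statically sized object means the capture path only does a bounded `memcpy`, which matters when it runs every 500 us from a timer.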
Out of sheer frustration I added a 22uf ceramic cap across the 3.3v line in addition to the 470uf electrolytic on the 5v line. No difference.
Any given build still crashes in the same place/way each time, although the amount of time before it does can vary.
Generally seeing a stack that has main ets_run ets_post .... then a number of calls that would suggest it's dealing with wifi packet TX or RX headers before the stack gets weird. I think it might be a management frame that's giving it a hard time, but I'm not sure how I'm going to isolate it and test yet.
@d-a-v has an onboard tcpdump utility which acts like an onboard wireshark. You could in addition use wireshark to see all packet traffic, then compare against the onboard dump to see which was the first packet after the dump ends. Then you could try to send such a packet to the ESP to see if that makes it crash.
Unfortunately I don't think it's traffic at the ethernet/ip layer that's doing it - I have already tapped into the lwip2 glue layer and even with no traffic at all (either no packets arriving, or I dropped the packets as soon as the glue got them, without passing them on to lwip) it still crashes. Whatever is happening seems to be inside the Espressif SDK with traffic that is not passed on to the ip stack via ethernet_input() - therefore I'm unlikely to be able to generate it with the usual tools either. I may end up having to use the SoftAP in a second ESP to generate the offending traffic. Something I haven't tried yet is using a really dumb 2.4GHz only AP - I have a WiFi to RS232 device that should fit the bill.
I understand, it's my thinking as well. I'll try to explain better. Set up the ESP with the tcpdump tool. Set up wireshark to record all traffic. Once the ESP crashes, compare the wireshark log against the ESP dump
The idea is that the first packet in the wireshark log that is not in the ESP dump is the crash suspect (the sdk tried to receive it but crashed before handoff to lwip)
Say the ESP dump shows packets 1 through 30. Those same packets should also show in wireshark. Then packet 31 in wireshark is the suspect to cause the ESP to crash.
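The comparison described above can be sketched as a small host-side helper. This assumes both captures have already been filtered to the same traffic, are in the same order, and that each packet has been reduced to a comparable identifier such as a hash of its bytes; the function name and the identifier format are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first packet in the wireshark log that the ESP
// dump does not contain at the same position. Returns wireshark_log.size()
// when every wireshark packet is matched by the ESP dump.
std::size_t first_unmatched_packet(const std::vector<std::string>& wireshark_log,
                                   const std::vector<std::string>& esp_dump) {
    for (std::size_t i = 0; i < wireshark_log.size(); ++i) {
        if (i >= esp_dump.size() || wireshark_log[i] != esp_dump[i]) {
            return i;
        }
    }
    return wireshark_log.size();
}
```

With an ESP dump of packets 1 through 30 and a wireshark log that goes on to packet 31, the helper returns the index of packet 31, which is the suspect to replay against the ESP.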
I'm pretty sure this is an SDK problem, but posting here in case I'm wrong, or it's caused by something in the Arduino library - and I discovered/tested it using the Arduino IDE. I have also been looking on esp8266.com for similar issues.
Basic Infos
Platform
Settings in IDE
Problem Description
Exception 0 with a very long stack dump, sometimes more than 5500 bytes of stack used. Dump contains a large area of stack (>3Kbytes) that has been allocated but never used (still set to feefeffe)
I tested with version 2.4.0 right up to git (1-Aug-2019) with the same results. I also tried several D1 mini Pro boards with the same result.
It can be reliably reproduced by flashing the BareMinimum example once the wifi has been configured to connect to an available network with the default sleep mode (MODEM_SLEEP_T). With debugging on it will crash within seconds of "pm open,type:2"
I decoded many stack dumps, and as far as I can tell the "randomly" allocated stack is happening in the SDK code, probably in a network interrupt handler as generally the trace leaves the Arduino library in esp_yield_within_cont() which calls run_scheduled_recurrent_functions(); - before the big chunk of stack is allocated - and re-enters the Arduino library at ethernet_input() - after the big chunk is allocated. There are several stack frames that don't get decoded, presumably because they happen in the SDK.
The sketch below contains some additional code I have been using to try and diagnose the issue, but as mentioned above, the BareMinimum example will trigger the problem reliably once the wifi is configured. #includes are as per Sketch->Include->ESP8266WIFI.
It may be worth noting that only including ESP8266WiFi.h and no other headers in this sketch results in a crash much sooner for me.
MCVE Sketch
Debug Messages