Open ghost opened 4 years ago
Try your test running with Flash set to DIO instead of QIO.
I have several of these boards and they are only stable in DIO mode, in contrast, the Ai Thinker's specification for the ESP-12-F has a photo showing QIO. Check the back of your module's PCB Antenna area for silkscreen markings, mine is marked with ESP-12-F DIO L4
. Ai Thinker's photo shows ESP-12-F QIO L4
. As far as I can see, Ai Thinker specification's text does not indicate which mode to use.
Thanks for your reply.
Unfortunately, I did the test and all configurations failed : DOUT, DIO, QOUT and QIO. My ESP-12F is marked "ESP-12-F L4". No QIO or DIO.
I'll continue to investigate with WireShark. It also fails in mode WIFI_AP_STA.
Then HTTP captured protocol indeed will hopefully help.
We also have now an internal packet capture facility, check the provided Netdump
library (git version, post 2.7.4).
From which manufactor is the flash chip? I have modules with PUYA chips. Some work, some not. The issues i have with the bad ones are weird. It took me a long time until i recognized everything is okay it is "just" the flash chip Edit: The upload speed you use is way too high for many modules. Better use 115200
This is ESP-12-F bought directly to AI-Thinker. I'll try to upload slower and see what happens.
I'm currently adding the packet capture to the code to feed the discussion.
Still not answered which flash chip is used in the module. From whom build or bought is not a info from interest.
This is a XMC - QH32BHIG - 4MB flash chip.
cc @ChocolateFrogsNuts @Tech-TX
Since you are buying bare AI-Thinker modules (not a DEV board), what do you use for a power source?
On new designs, I like to rule out possible power supply issues, before going crazy with a problem. Power issues can create a lot of different and unexpected symptoms that are difficult to associate with power. Part of the issue with the ESP8266 is it can go from sipping 60ma to gulping 300ma in an instant. That is a 400% load increase. A switch-mode power supply usually doesn't have that kind of Transient Load response. All of the IoT designs that I have seen, pass the switch-mode power supply output through an LDO regulator to feed the ESP8266.
I use a classic LDO regulator to power the module. The source can draw easily up to 1000 mA. I've already checked if a voltage drop can cause this issue but did not find one.
Monday, I'll bypass the LDO and use my laboratory power supply to see if it helps. In the meantime, I'll continue my investigations and keep you informed.
Thanks for the help.
My bet is the XMC flash chip as trouble maker.
Still the same issue with power bypassed by a laboratory power supply. Tested on 10 modules (same batch coming from AI-Thinker). No glitches, all good on power supply.
Now, I'll remove the flash chip on some modules and compare. Any suggestion for a good flash ? Thanks.
Never had issues with winbond flash chips
OK, let's go with that. I'll keep you posted after the change.
In the meantime, I noticed that the problem happens every time on a DHCP incoming call after the HNAP calls. It happens also on all the 10 modules. Whatever the flash mode and frequency.
We will know more after you did swap the flash.
I changed the flash chip but it we still have the same issue. after looking at a lot of crash logs, it always crashes at the same line in esp-dhcpserver.c (line 626).
A newer version of Safari was also released but it did not change this behavior.
Fortunately, no crashes with Chrome on MacOS because it does not implement HNAP.
I try now to make HNAP calls from another "more controlled" source than Safari to try some debugging.
I understand that running standard example : ESP8266WiFi/examples/WiFiAccessPoint/WiFiAccessPoint.ino
and accessing the esp using current macOS's Safari is enough to try and reproduce this issue.
Am I right ?
Correct.
Every time with this configuration. Sometimes it takes several minutes to get the HNAP calls, but they will arrive and crash the module.
To restrict traffic to AP and have packet capture, I added in the example
#include <Netdump.h>
NetCapture::Netdump nd;
...
// in setup():
WiFi.mode(WIFI_AP);
nd.printDump(Serial, NetCapture::Packet::PacketDetail::FULL, [](NetCapture::Packet p) { return true; });
Safari on iPad macOS 12.4.8 did not trigger this bug after several minutes. I will try with an apple laptop.
I try and keep you informed.
No issue found on iOS (iPhone & iPad). Probably a different implementation of Safari. I have a Mac Book Pro (2020) / Catalina (Version 10.156) / Safari (Version 14.0 - 15610.1.28.1.9, 15610).
I'm sorry to not be able to reproduce, using a macmini, updated to latest macOS and safari.
That's 580 seconds after esp boot:
request: GET / HTTP/1.1
method: GET url: / search: keepAlive=: 1
headerName: Host
headerValue: 192.168.4.1
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15
headerName: Accept-Language
headerValue: fr-fr
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive
Would there be another way to reproduce the HNAP request than by using safari ?
dhcp server is brought to main repository in PR #6680. Would you be able to test with this PR (that I just updated) ? You need first to use the git version of this core. Or you can install the alpha release v0.0.2 with the arduino IDE, that already includes this PR.
Sometimes, it takes about 15 minutes to get the HNAP calls. I guess there is a cache somewhere. The best solution I found to lower this time is to change the AP SSID name.
Unfortunately, I did not find tools that can emulates the HNAP calls.
I'll try this today and will give you an update.
Usually, when I change the SSID name on the module and cancel the WiFi configuration on my Mac, it occurs in less than a minute...
Same issue with the alpha release v0.0.2.
See logs attached. logs.txt
I'll uninstall Avast Antivirus now because it looks like a security analysis of the access point.
Avast disabled. Sounds promising because no HNAP calls during these last 15 minutes... I'll continue my tests to validate this.
Confirmed : HNAP calls are an Avast security analysis of the access point through Safari.
That should not be a reason to blow up the ESP !
Leaving this opened. Once dhcp server is in the main repository (that's #6680) and its code is revisited, one may check this bug again. In the meantime one / I should try to artificially extract and replay the request based on your above logs @Exocet22 .
Sorry !
I continued the tests : the problem does not occur when the module is in STA mode. It occurs only in AP mode.
dhcp server is indeed enabled only in AP mode
Correct ! It's the reason why it does not crash !
Title is renamed, and milestone pushed. Help is requested reproduce HNAP calls.
Platform
Hardware: ESP8266 (original AI-Thinker ESP12-F) Core Version: 2.7.4 Development Env: Arduino IDE 1.8.13 Operating System: Mac OS (Catalina)
Settings in IDE
Problem Description
Hi everybody.
Exception (3) occurs on the standard example every time when Safari makes sudden HNAP calls(Home Network Administration Protocol) on the server. These HTTP calls made by Safari are a little bit hectic and sometimes we need to wait several minutes before receiving them. Chrome doesn't seem to send these kind a HTTP calls.
Thanks in advance for the help.
Sketch
See standard example : ESP8266WiFi/examples/WiFiAccessPoint/WiFiAccessPoint.ino
Debug Messages
17:26:33.740 -> SDK:2.2.1(cfd48f3)/Core:2.7.3-3-g2843a5ac=20703003/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be 17:26:33.740 -> 17:26:33.740 -> Configuring access point...bcn 0 17:26:33.852 -> del if1 17:26:33.852 -> usl 17:26:33.852 -> add if1 17:26:33.852 -> dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1) 17:26:33.852 -> bcn 100 17:26:33.852 -> AP IP address: 192.168.4.1 17:26:33.852 -> HTTP server started 17:27:59.594 -> add 1 17:27:59.628 -> aid 1 17:27:59.628 -> station: 3c:22:fb:3f:d0:0f join, AID = 1 17:28:41.619 -> New client 17:28:41.619 -> request: GET /this/is/a/test/ HTTP/1.1 17:28:41.619 -> method: GET url: /this/is/a/test/ search: 17:28:41.619 -> headerName: Host 17:28:41.619 -> headerValue: 192.168.4.1 17:28:41.619 -> headerName: Upgrade-Insecure-Requests 17:28:41.619 -> headerValue: 1 17:28:41.619 -> headerName: Accept 17:28:41.619 -> headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8 17:28:41.619 -> headerName: User-Agent 17:28:41.619 -> headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15 17:28:41.654 -> headerName: Accept-Language 17:28:41.654 -> headerValue: fr-fr 17:28:41.654 -> headerName: Accept-Encoding 17:28:41.654 -> headerValue: gzip, deflate 17:28:41.654 -> headerName: Connection 17:28:41.654 -> headerValue: keep-alive 17:28:41.654 -> args: 17:28:41.654 -> args count: 0 17:28:41.654 -> args: 17:28:41.654 -> args count: 0 17:28:41.654 -> Request: /this/is/a/test/ 17:28:41.654 -> Arguments: 17:28:41.654 -> final list of key/value pairs: 17:28:41.654 -> request handler not found 17:28:41.724 -> New client 17:28:41.724 -> request: GET /favicon.ico HTTP/1.1 17:28:41.724 -> method: GET url: /favicon.ico search: 17:28:41.757 -> headerName: Host 17:28:41.757 -> headerValue: 192.168.4.1 17:28:41.757 -> headerName: Connection 17:28:41.757 -> headerValue: keep-alive 17:28:41.757 -> headerName: Accept 17:28:41.757 -> headerValue: / 17:28:41.757 -> headerName: User-Agent 17:28:41.757 -> headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15 17:28:41.757 -> headerName: Accept-Language 17:28:41.757 -> headerValue: fr-fr 17:28:41.757 -> headerName: Referer 17:28:41.795 -> headerValue: http://192.168.4.1/this/is/a/test/ 17:28:41.795 -> headerName: Accept-Encoding 17:28:41.795 -> headerValue: gzip, deflate 17:28:41.795 -> args: 17:28:41.795 -> args count: 0 17:28:41.795 -> args: 17:28:41.795 -> args count: 0 17:28:41.795 -> Request: /favicon.ico 17:28:41.795 -> Arguments: 17:28:41.795 -> final list of key/value pairs: 17:28:41.795 -> request handler not found 17:28:50.713 -> New client 17:28:50.713 -> request: GET / HTTP/1.1 17:28:50.713 -> method: GET url: / search: 17:28:50.713 -> headerName: Host 17:28:50.713 -> headerValue: 192.168.4.1 17:28:50.713 -> headerName: Accept 17:28:50.713 -> headerValue: / 17:28:50.713 -> headerName: Accept-Encoding 17:28:50.713 -> headerValue: deflate, gzip 17:28:50.713 -> args: 17:28:50.713 -> args count: 0 17:28:50.713 -> args: 17:28:50.713 -> args count: 0 17:28:50.713 -> Request: / 17:28:50.713 -> Arguments: 17:28:50.713 -> final list of key/value pairs: 17:28:50.713 -> New client 17:28:50.747 -> request: GET /HNAP1/ HTTP/1.1 17:28:50.747 -> method: GET url: /HNAP1/ search: 17:28:50.747 -> headerName: Host 17:28:50.747 -> headerValue: 192.168.4.1 17:28:50.747 -> headerName: Accept 17:28:50.747 -> headerValue: / 17:28:50.747 -> headerName: Accept-Encoding 17:28:50.747 -> headerValue: deflate, gzip 17:28:50.747 -> headerName: Connection 17:28:50.747 -> headerValue: Close 17:28:50.747 -> args: 17:28:50.747 -> args count: 0 17:28:50.747 -> args: 17:28:50.747 -> args count: 0 17:28:50.747 -> Request: /HNAP1/ 17:28:50.747 -> Arguments: 17:28:50.747 -> final list of key/value pairs: 17:28:50.747 -> request handler not found 17:29:08.745 -> New client 17:29:08.745 -> request: GET / HTTP/1.1 17:29:08.745 -> method: GET url: / search: 17:29:08.745 -> headerName: Host 17:29:08.745 -> headerValue: 192.168.4.1 17:29:08.745 -> headerName: Accept 17:29:08.780 -> headerValue: / 17:29:08.780 -> headerName: Accept-Encoding 17:29:08.780 -> headerValue: deflate, gzip 17:29:08.780 -> args: 17:29:08.780 -> args count: 0 17:29:08.780 -> args: 17:29:08.780 -> args count: 0 17:29:08.780 -> Request: / 17:29:08.780 -> Arguments: 17:29:08.780 -> final list of key/value pairs: 17:29:08.780 -> New client 17:29:08.780 -> request: GET /HNAP1/ HTTP/1.1 17:29:08.780 -> method: GET url: /HNAP1/ search: 17:29:08.780 -> headerName: Host 17:29:08.780 -> headerValue: 192.168.4.1 17:29:08.780 -> headerName: Accept 17:29:08.780 -> headerValue: / 17:29:08.780 -> headerName: Accept-Encoding 17:29:08.813 -> headerValue: deflate, gzip 17:29:08.813 -> headerName: Connection 17:29:08.813 -> headerValue: Close 17:29:08.813 -> args: 17:29:08.813 -> args count: 0 17:29:08.813 -> args: 17:29:08.813 -> args count: 0 17:29:08.813 -> Request: /HNAP1/ 17:29:08.813 -> Arguments: 17:29:08.813 -> final list of key/value pairs: 17:29:08.813 -> request handler not found 17:29:10.932 -> Fatal exception 3(LoadStoreErrorCause): 17:29:10.932 -> epc1=0x4000deea, epc2=0x00000000, epc3=0x00000000, excvaddr=0x4023ef68, depc=0x00000000 17:29:10.932 -> 17:29:10.932 -> --------------- CUT HERE FOR EXCEPTION DECODER --------------- 17:29:12.197 -> --------------- CUT HERE FOR EXCEPTION DECODER --------------- 17:29:12.197 -> 17:29:12.197 -> ets Jan 8 2013,rst cause:2, boot mode:(3,0) 17:29:12.197 -> 17:29:12.231 -> load 0x4010f000, len 3584, room 16 17:29:12.231 -> tail 0 17:29:12.231 -> chksum 0xb0 17:29:12.231 -> csum 0xb0 17:29:12.231 -> v2843a5ac 17:29:12.231 -> ~ld 17:29:13.269 ->
Exception Decoder
Exception 3: LoadStoreError: Processor internal physical address or data error during load or store PC: 0x4000deea EXCVADDR: 0x4023ef68
Decoding stack results 0x4010028c: calloc(size_t, size_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/heap.cpp line 242 0x402300f0: handle_dhcp at glue-lwip/esp-dhcpserver.c line 626 0x402300b0: handle_dhcp at glue-lwip/esp-dhcpserver.c line 679 0x401008cd: check_poison_neighbors(uint16_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/umm_malloc/umm_local.c line 71 0x40100989: umm_malloc_core(size_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/umm_malloc/umm_malloc.cpp line 458 0x40213c40: udp_input at core/udp.c line 404 0x4022f7cc: pbuf_alloc at glue-esp/lwip-esp.c line 669 0x40218a70: ip4_input at core/ipv4/ip4.c line 1461 ...