esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1
16.09k stars 13.32k forks source link

HNAP calls on dhcp-server in AP mode causes an exception #7601

Open ghost opened 4 years ago

ghost commented 4 years ago

Platform

Hardware: ESP8266 (original AI-Thinker ESP12-F) Core Version: 2.7.4 Development Env: Arduino IDE 1.8.13 Operating System: Mac OS (Catalina)

Settings in IDE

Problem Description

Hi everybody.

Exception (3) occurs on the standard example every time when Safari makes sudden HNAP calls(Home Network Administration Protocol) on the server. These HTTP calls made by Safari are a little bit hectic and sometimes we need to wait several minutes before receiving them. Chrome doesn't seem to send these kind a HTTP calls.

Thanks in advance for the help.

Sketch

See standard example : ESP8266WiFi/examples/WiFiAccessPoint/WiFiAccessPoint.ino

Debug Messages

17:26:33.740 -> SDK:2.2.1(cfd48f3)/Core:2.7.3-3-g2843a5ac=20703003/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-30-g92add50/BearSSL:5c771be 17:26:33.740 -> 17:26:33.740 -> Configuring access point...bcn 0 17:26:33.852 -> del if1 17:26:33.852 -> usl 17:26:33.852 -> add if1 17:26:33.852 -> dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1) 17:26:33.852 -> bcn 100 17:26:33.852 -> AP IP address: 192.168.4.1 17:26:33.852 -> HTTP server started 17:27:59.594 -> add 1 17:27:59.628 -> aid 1 17:27:59.628 -> station: 3c:22:fb:3f:d0:0f join, AID = 1 17:28:41.619 -> New client 17:28:41.619 -> request: GET /this/is/a/test/ HTTP/1.1 17:28:41.619 -> method: GET url: /this/is/a/test/ search: 17:28:41.619 -> headerName: Host 17:28:41.619 -> headerValue: 192.168.4.1 17:28:41.619 -> headerName: Upgrade-Insecure-Requests 17:28:41.619 -> headerValue: 1 17:28:41.619 -> headerName: Accept 17:28:41.619 -> headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8 17:28:41.619 -> headerName: User-Agent 17:28:41.619 -> headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15 17:28:41.654 -> headerName: Accept-Language 17:28:41.654 -> headerValue: fr-fr 17:28:41.654 -> headerName: Accept-Encoding 17:28:41.654 -> headerValue: gzip, deflate 17:28:41.654 -> headerName: Connection 17:28:41.654 -> headerValue: keep-alive 17:28:41.654 -> args: 17:28:41.654 -> args count: 0 17:28:41.654 -> args: 17:28:41.654 -> args count: 0 17:28:41.654 -> Request: /this/is/a/test/ 17:28:41.654 -> Arguments: 17:28:41.654 -> final list of key/value pairs: 17:28:41.654 -> request handler not found 17:28:41.724 -> New client 17:28:41.724 -> request: GET /favicon.ico HTTP/1.1 17:28:41.724 -> method: GET url: /favicon.ico search: 17:28:41.757 -> headerName: Host 17:28:41.757 -> headerValue: 192.168.4.1 17:28:41.757 -> headerName: Connection 17:28:41.757 -> headerValue: keep-alive 17:28:41.757 -> headerName: Accept 17:28:41.757 -> headerValue: / 17:28:41.757 -> headerName: User-Agent 17:28:41.757 -> headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Safari/605.1.15 17:28:41.757 -> headerName: Accept-Language 17:28:41.757 -> headerValue: fr-fr 17:28:41.757 -> headerName: Referer 17:28:41.795 -> headerValue: http://192.168.4.1/this/is/a/test/ 17:28:41.795 -> headerName: Accept-Encoding 17:28:41.795 -> headerValue: gzip, deflate 17:28:41.795 -> args: 17:28:41.795 -> args count: 0 17:28:41.795 -> args: 17:28:41.795 -> args count: 0 17:28:41.795 -> Request: /favicon.ico 17:28:41.795 -> Arguments: 17:28:41.795 -> final list of key/value pairs: 17:28:41.795 -> request handler not found 17:28:50.713 -> New client 17:28:50.713 -> request: GET / HTTP/1.1 17:28:50.713 -> method: GET url: / search: 17:28:50.713 -> headerName: Host 17:28:50.713 -> headerValue: 192.168.4.1 17:28:50.713 -> headerName: Accept 17:28:50.713 -> headerValue: / 17:28:50.713 -> headerName: Accept-Encoding 17:28:50.713 -> headerValue: deflate, gzip 17:28:50.713 -> args: 17:28:50.713 -> args count: 0 17:28:50.713 -> args: 17:28:50.713 -> args count: 0 17:28:50.713 -> Request: / 17:28:50.713 -> Arguments: 17:28:50.713 -> final list of key/value pairs: 17:28:50.713 -> New client 17:28:50.747 -> request: GET /HNAP1/ HTTP/1.1 17:28:50.747 -> method: GET url: /HNAP1/ search: 17:28:50.747 -> headerName: Host 17:28:50.747 -> headerValue: 192.168.4.1 17:28:50.747 -> headerName: Accept 17:28:50.747 -> headerValue: / 17:28:50.747 -> headerName: Accept-Encoding 17:28:50.747 -> headerValue: deflate, gzip 17:28:50.747 -> headerName: Connection 17:28:50.747 -> headerValue: Close 17:28:50.747 -> args: 17:28:50.747 -> args count: 0 17:28:50.747 -> args: 17:28:50.747 -> args count: 0 17:28:50.747 -> Request: /HNAP1/ 17:28:50.747 -> Arguments: 17:28:50.747 -> final list of key/value pairs: 17:28:50.747 -> request handler not found 17:29:08.745 -> New client 17:29:08.745 -> request: GET / HTTP/1.1 17:29:08.745 -> method: GET url: / search: 17:29:08.745 -> headerName: Host 17:29:08.745 -> headerValue: 192.168.4.1 17:29:08.745 -> headerName: Accept 17:29:08.780 -> headerValue: / 17:29:08.780 -> headerName: Accept-Encoding 17:29:08.780 -> headerValue: deflate, gzip 17:29:08.780 -> args: 17:29:08.780 -> args count: 0 17:29:08.780 -> args: 17:29:08.780 -> args count: 0 17:29:08.780 -> Request: / 17:29:08.780 -> Arguments: 17:29:08.780 -> final list of key/value pairs: 17:29:08.780 -> New client 17:29:08.780 -> request: GET /HNAP1/ HTTP/1.1 17:29:08.780 -> method: GET url: /HNAP1/ search: 17:29:08.780 -> headerName: Host 17:29:08.780 -> headerValue: 192.168.4.1 17:29:08.780 -> headerName: Accept 17:29:08.780 -> headerValue: / 17:29:08.780 -> headerName: Accept-Encoding 17:29:08.813 -> headerValue: deflate, gzip 17:29:08.813 -> headerName: Connection 17:29:08.813 -> headerValue: Close 17:29:08.813 -> args: 17:29:08.813 -> args count: 0 17:29:08.813 -> args: 17:29:08.813 -> args count: 0 17:29:08.813 -> Request: /HNAP1/ 17:29:08.813 -> Arguments: 17:29:08.813 -> final list of key/value pairs: 17:29:08.813 -> request handler not found 17:29:10.932 -> Fatal exception 3(LoadStoreErrorCause): 17:29:10.932 -> epc1=0x4000deea, epc2=0x00000000, epc3=0x00000000, excvaddr=0x4023ef68, depc=0x00000000 17:29:10.932 -> 17:29:10.932 -> --------------- CUT HERE FOR EXCEPTION DECODER --------------- 17:29:12.197 -> --------------- CUT HERE FOR EXCEPTION DECODER --------------- 17:29:12.197 -> 17:29:12.197 -> ets Jan 8 2013,rst cause:2, boot mode:(3,0) 17:29:12.197 -> 17:29:12.231 -> load 0x4010f000, len 3584, room 16 17:29:12.231 -> tail 0 17:29:12.231 -> chksum 0xb0 17:29:12.231 -> csum 0xb0 17:29:12.231 -> v2843a5ac 17:29:12.231 -> ~ld 17:29:13.269 ->

Exception Decoder

Exception 3: LoadStoreError: Processor internal physical address or data error during load or store PC: 0x4000deea EXCVADDR: 0x4023ef68

Decoding stack results 0x4010028c: calloc(size_t, size_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/heap.cpp line 242 0x402300f0: handle_dhcp at glue-lwip/esp-dhcpserver.c line 626 0x402300b0: handle_dhcp at glue-lwip/esp-dhcpserver.c line 679 0x401008cd: check_poison_neighbors(uint16_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/umm_malloc/umm_local.c line 71 0x40100989: umm_malloc_core(size_t) at /Users/admin/Library/Arduino15/packages/esp8266/hardware/esp8266/2.7.4/cores/esp8266/umm_malloc/umm_malloc.cpp line 458 0x40213c40: udp_input at core/udp.c line 404 0x4022f7cc: pbuf_alloc at glue-esp/lwip-esp.c line 669 0x40218a70: ip4_input at core/ipv4/ip4.c line 1461 ...

mhightower83 commented 4 years ago

Try your test running with Flash set to DIO instead of QIO.

I have several of these boards and they are only stable in DIO mode, in contrast, the Ai Thinker's specification for the ESP-12-F has a photo showing QIO. Check the back of your module's PCB Antenna area for silkscreen markings, mine is marked with ESP-12-F DIO L4. Ai Thinker's photo shows ESP-12-F QIO L4. As far as I can see, Ai Thinker specification's text does not indicate which mode to use.

ghost commented 4 years ago

Thanks for your reply.

Unfortunately, I did the test and all configurations failed : DOUT, DIO, QOUT and QIO. My ESP-12F is marked "ESP-12-F L4". No QIO or DIO.

I'll continue to investigate with WireShark. It also fails in mode WIFI_AP_STA.

d-a-v commented 4 years ago

Then HTTP captured protocol indeed will hopefully help.

We also have now an internal packet capture facility, check the provided Netdump library (git version, post 2.7.4).

Jason2866 commented 4 years ago

From which manufactor is the flash chip? I have modules with PUYA chips. Some work, some not. The issues i have with the bad ones are weird. It took me a long time until i recognized everything is okay it is "just" the flash chip Edit: The upload speed you use is way too high for many modules. Better use 115200

ghost commented 4 years ago

This is ESP-12-F bought directly to AI-Thinker. I'll try to upload slower and see what happens.

I'm currently adding the packet capture to the code to feed the discussion.

ghost commented 4 years ago

Uploading at lower speed did not affect the result. See logs attached.

Logs_HNAP.txt

Thanks.

Jason2866 commented 4 years ago

Still not answered which flash chip is used in the module. From whom build or bought is not a info from interest.

ghost commented 4 years ago

This is a XMC - QH32BHIG - 4MB flash chip.

devyte commented 4 years ago

cc @ChocolateFrogsNuts @Tech-TX

mhightower83 commented 4 years ago

Since you are buying bare AI-Thinker modules (not a DEV board), what do you use for a power source?

On new designs, I like to rule out possible power supply issues, before going crazy with a problem. Power issues can create a lot of different and unexpected symptoms that are difficult to associate with power. Part of the issue with the ESP8266 is it can go from sipping 60ma to gulping 300ma in an instant. That is a 400% load increase. A switch-mode power supply usually doesn't have that kind of Transient Load response. All of the IoT designs that I have seen, pass the switch-mode power supply output through an LDO regulator to feed the ESP8266.

ghost commented 4 years ago

I use a classic LDO regulator to power the module. The source can draw easily up to 1000 mA. I've already checked if a voltage drop can cause this issue but did not find one.

Monday, I'll bypass the LDO and use my laboratory power supply to see if it helps. In the meantime, I'll continue my investigations and keep you informed.

Thanks for the help.

Jason2866 commented 4 years ago

My bet is the XMC flash chip as trouble maker.

ghost commented 4 years ago

Still the same issue with power bypassed by a laboratory power supply. Tested on 10 modules (same batch coming from AI-Thinker). No glitches, all good on power supply.

Now, I'll remove the flash chip on some modules and compare. Any suggestion for a good flash ? Thanks.

Jason2866 commented 4 years ago

Never had issues with winbond flash chips

ghost commented 4 years ago

OK, let's go with that. I'll keep you posted after the change.

In the meantime, I noticed that the problem happens every time on a DHCP incoming call after the HNAP calls. It happens also on all the 10 modules. Whatever the flash mode and frequency.

Jason2866 commented 4 years ago

We will know more after you did swap the flash.

ghost commented 4 years ago

I changed the flash chip but it we still have the same issue. after looking at a lot of crash logs, it always crashes at the same line in esp-dhcpserver.c (line 626).

A newer version of Safari was also released but it did not change this behavior.

Fortunately, no crashes with Chrome on MacOS because it does not implement HNAP.

I try now to make HNAP calls from another "more controlled" source than Safari to try some debugging.

d-a-v commented 4 years ago

I understand that running standard example : ESP8266WiFi/examples/WiFiAccessPoint/WiFiAccessPoint.ino and accessing the esp using current macOS's Safari is enough to try and reproduce this issue. Am I right ?

ghost commented 4 years ago

Correct.

Every time with this configuration. Sometimes it takes several minutes to get the HNAP calls, but they will arrive and crash the module.

d-a-v commented 4 years ago

To restrict traffic to AP and have packet capture, I added in the example

#include <Netdump.h>
NetCapture::Netdump nd;
...
  // in setup():
  WiFi.mode(WIFI_AP);
  nd.printDump(Serial, NetCapture::Packet::PacketDetail::FULL, [](NetCapture::Packet p) { return true; });

Safari on iPad macOS 12.4.8 did not trigger this bug after several minutes. I will try with an apple laptop.

ghost commented 4 years ago

I try and keep you informed.

No issue found on iOS (iPhone & iPad). Probably a different implementation of Safari. I have a Mac Book Pro (2020) / Catalina (Version 10.156) / Safari (Version 14.0 - 15610.1.28.1.9, 15610).

ghost commented 4 years ago

Same crash... See logs attached.

logs.txt

d-a-v commented 4 years ago

I'm sorry to not be able to reproduce, using a macmini, updated to latest macOS and safari.

That's 580 seconds after esp boot:

request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 192.168.4.1
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15
headerName: Accept-Language
headerValue: fr-fr
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive

Would there be another way to reproduce the HNAP request than by using safari ?

d-a-v commented 4 years ago

dhcp server is brought to main repository in PR #6680. Would you be able to test with this PR (that I just updated) ? You need first to use the git version of this core. Or you can install the alpha release v0.0.2 with the arduino IDE, that already includes this PR.

ghost commented 4 years ago

Sometimes, it takes about 15 minutes to get the HNAP calls. I guess there is a cache somewhere. The best solution I found to lower this time is to change the AP SSID name.

Unfortunately, I did not find tools that can emulates the HNAP calls.

I'll try this today and will give you an update.

ghost commented 4 years ago

Usually, when I change the SSID name on the module and cancel the WiFi configuration on my Mac, it occurs in less than a minute...

ghost commented 4 years ago

Same issue with the alpha release v0.0.2.

See logs attached. logs.txt

I'll uninstall Avast Antivirus now because it looks like a security analysis of the access point.

ghost commented 4 years ago

Avast disabled. Sounds promising because no HNAP calls during these last 15 minutes... I'll continue my tests to validate this.

ghost commented 4 years ago

Confirmed : HNAP calls are an Avast security analysis of the access point through Safari.

d-a-v commented 4 years ago

That should not be a reason to blow up the ESP !

Leaving this opened. Once dhcp server is in the main repository (that's #6680) and its code is revisited, one may check this bug again. In the meantime one / I should try to artificially extract and replay the request based on your above logs @Exocet22 .

ghost commented 4 years ago

Sorry !

ghost commented 4 years ago

I continued the tests : the problem does not occur when the module is in STA mode. It occurs only in AP mode.

d-a-v commented 4 years ago

dhcp server is indeed enabled only in AP mode

ghost commented 4 years ago

Correct ! It's the reason why it does not crash !

d-a-v commented 3 years ago

Title is renamed, and milestone pushed. Help is requested reproduce HNAP calls.