espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Fail to get IP from a Arris Router, no GOP_IP event after STA_CONNECTED (IDFGH-2778) #4843

Closed devTW001 closed 4 years ago

devTW001 commented 4 years ago

Environment

Problem Description

We are building a smart garage door opener with the ESP-WROOM-32 module. Everything went well at first. But recently, as the number of devices we have sold has grown, more and more customers have reported that their devices cannot connect to their routers. (All the routers in the reports are provided by Arris, for example the Arris DG2460 and Arris TG3452A/CG.)

We visited some of our customers' homes and did some basic investigation. We found:

  1. The ESP32 fails to get an IP address from the DHCP server of the Arris router.
  2. The ESP32 can connect to the router (the STA_CONNECTED event is triggered), but fails to get an IP address from it (the GOT_IP event never gets triggered).
  3. If we set the router to OPEN (not encrypted), the ESP32 can get an IP and works properly.
  4. If we set the router to any of the encrypted modes (for example WPA2), the ESP32 CAN NOT get an IP from the Arris router.
  5. With most routers from other vendors, the ESP32 works properly.

Expected Behavior

The ESP32 connects to the Arris router and is assigned an IP address.

Actual Behavior

The STA connects to the Arris router but fails to get the GOT_IP event before the timeout.

Steps to reproduce

Just try connecting an ESP32 to an Arris router.

Code to reproduce this issue

The BluFi example in IDF, or the STA connect example in IDF.

Debug Logs

BLUFI_EXAMPLE: BLUFI VERSION 0102
I (899) BLUFI_EXAMPLE: BLUFI init finish
I (225289) BLUFI_EXAMPLE: BLUFI ble connect
I (227449) BLUFI_EXAMPLE: BLUFI get wifi status from AP
I (243919) BLUFI_EXAMPLE: BLUFI ble disconnect
I (259399) BLUFI_EXAMPLE: BLUFI ble connect
I (261439) BLUFI_EXAMPLE: BLUFI get wifi status from AP
I (275149) BLUFI_EXAMPLE: BLUFI Set WIFI opmode 1
I (275209) wifi: clear blacklist
I (275209) BLUFI_EXAMPLE: Recv STA SSID RCMP Surveillance Van
I (275269) wifi: clear blacklist
I (275269) BLUFI_EXAMPLE: Recv STA PASSWORD Antonio1948
I (275329) BLUFI_EXAMPLE: BLUFI requset wifi connect to AP
I (275329) wifi: Start wifi disconnect
I (275329) wifi: connect status 0 -> 0
I (275329) wifi: filter: set rx policy=8
I (275329) wifi: Start wifi connect
I (275339) wifi: connect status 0 -> 0
I (275339) wifi: connect chan=0
I (275339) wifi: nvs=0, ssid=RCMP Surveillance Van, channel=255
I (275349) wifi: ssid=RCMP Surveillance Van match nvs 0, channel=255
I (275349) wifi: first chan=1
I (275359) wifi: connect status 0 -> 1
I (275359) wifi: filter: set rx policy=3
I (275359) wifi: clear scan ap list
I (275369) wifi: start scan: type=0x50f, priority=2, cb=0x400eb8cc, arg=0x0, ss_state=0x1, time=274819360, index=0
.........
I (277069) wifi: scan_done: arg=0x0, status=0, cur_time=276521021, scan_id=128, scan state=0
I (277079) wifi: call scan_done cb, arg=0x0
I (277079) wifi: handoff_cb: status=0
I (277089) wifi: best bss has set.
I (277089) wifi: ap found, mac=88:71:b1:a5:58:a1
I (277089) wifi: bssid=88:71:b1:a5:58:a1, LR=0
I (277099) wifi: new_bss=0x3ffc6c28, cur_bss=0x0, new_chan=<11,0>, cur_chan=1
I (277099) wifi: filter: set rx policy=5
I (277109) wifi: new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1
I (277109) wifi: ht20 freq=2462, chan=11
I (277119) wifi: connect_op: status=0, auth=5, cipher=3
I (277119) wifi: auth mode is not none
I (278129) wifi: connect_bss: auth=1, reconnect=0
I (278129) wifi: state: init -> auth (b0)
I (278129) wifi: start 1s AUTH timer
I (278129) wifi: clear scan ap list
I (278139) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278139) wifi: set max rate: from to
I (278149) wifi: sig_b=1, sig_g=0, sig_n=0, max_b=0, max_g=108, max_n=144
I (278149) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278159) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278159) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278169) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278169) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278179) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278179) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278189) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278189) wifi: recv auth: seq=2, status=0
I (278199) wifi: state: auth -> assoc (0)
I (278199) wifi: restart connect 1s timer for assoc
I (278209) wifi: recv auth: seq=2, status=0
I (278209) wifi: not auth state, ignore
I (278209) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278239) wifi: recv assoc: type=0x10
I (278239) wifi: filter: set rx policy=6
I (278239) wifi: state: assoc -> run (10)
I (278239) wifi: start 10s connect timer for 4 way handshake
I (278259) wifi: wpa_psk start
I (278269) wifi: wpa_psk handle succeed
I (278269) wifi: sta recv dup seq=32832 tid=16, discard
I (278269) wifi: wpa_psk start
I (278269) wifi: wpa_psk handle succeed
I (278289) wifi: connected with RCMP Surveillance Van, aid = 6, channel 11, BW20, bssid = 88:71:b1:a5:58:a1
I (278289) wifi: security type: 3, phy: bgn, rssi: -89
I (278299) wifi: remove all except 88:71:b1:a5:58:a1 from rc list
I (278299) wifi: clear blacklist
I (278309) wifi: filter: set rx policy=7
I (278309) wifi: pm start, type: 1
I (278309) wifi: Send sta connected event
I (278319) wifi: connect status 1 -> 5
I (278319) wifi: obss scan is disabled
I (278319) wifi: start obss scan: obss scan is stopped
I (278399) wifi: AP's beacon interval = 102400 us, DTIM period = 2
I (278399) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278499) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278599) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278699) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278799) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (278909) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (279009) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (279109) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
I (279209) wifi: rsn valid: gcipher=3 ucipher=3 akm=5
......
The log keeps repeating like the above; no GOT_IP event and no IP assigned.
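A pasted monitor log like this one is easier to read once it is split back into individual entries. The following is a minimal Python sketch, not part of the original report. It assumes each entry has the `LEVEL (milliseconds) tag: message` shape visible in the log, and that color codes show up as `[0;32m` with or without the escape byte. The function names are made up for illustration.

```python
import re

# Color codes such as "[0;32m"; the ESC byte is optional because it is
# often lost when a log is pasted into a web form.
ANSI_RE = re.compile(r"\x1b?\[\d+(?:;\d+)*m")

# Start of an entry, e.g. "I (278129) wifi: ". The level letter must be
# preceded by whitespace or the start of the text.
ENTRY_RE = re.compile(r"(?<!\S)([IWEDV]) \((\d+)\) ([\w-]+): ")

STATE_RE = re.compile(r"state: (\w+) -> (\w+)")


def parse_monitor_log(raw):
    """Split a flattened log into (level, ms, tag, message) tuples."""
    text = ANSI_RE.sub("", raw)
    matches = list(ENTRY_RE.finditer(text))
    entries = []
    for cur, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        message = text[cur.end():end].strip()
        entries.append((cur.group(1), int(cur.group(2)), cur.group(3), message))
    return entries


def wifi_state_transitions(entries):
    """Return the (old, new) pairs from 'state: a -> b' messages of the wifi tag."""
    transitions = []
    for _level, _ms, tag, message in entries:
        found = STATE_RE.match(message)
        if tag == "wifi" and found:
            transitions.append((found.group(1), found.group(2)))
    return transitions
```

Applied to the log in this report, the transitions would read init -> auth, auth -> assoc, assoc -> run, which matches the description that association completes and only the IP assignment is missing.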

negativekelvin commented 4 years ago

LWIP_DHCP_DOES_ARP_CHECK ?

devTW001 commented 4 years ago

@negativekelvin This one is enabled by default. We didn't change it.

│ Enabling this option performs a check (via ARP request) if the offered IP address is not already in use by another host on the network.

Do you mean we should disable this?
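To see how a particular build is configured, the option can be looked up in the project's sdkconfig file. This is a minimal sketch, assuming the usual Kconfig output conventions (`CONFIG_<NAME>=y` when enabled, `# CONFIG_<NAME> is not set` when disabled). The helper name `kconfig_bool` is hypothetical.

```python
def kconfig_bool(sdkconfig_text, name):
    """Return True if the option is enabled, False if explicitly unset,
    and None if the option does not appear in the text at all."""
    enabled = f"CONFIG_{name}=y"
    disabled = f"# CONFIG_{name} is not set"
    for line in sdkconfig_text.splitlines():
        line = line.strip()
        if line == enabled:
            return True
        if line == disabled:
            return False
    return None
```

For example, `kconfig_bool(open("sdkconfig").read(), "LWIP_DHCP_DOES_ARP_CHECK")` would report whether the ARP check is active in the build being tested.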

devTW001 commented 4 years ago

More information: everything else on the router works perfectly (smartphones, laptops, and other smart devices). This only happens with the ESP32.

negativekelvin commented 4 years ago

Worth trying although I don't know why it would be correlated to security type. You may also want to try newer branches to see if they have fixed the problem.

HarveyRong-Esp commented 4 years ago

Hi @devTW001. Can you confirm that all models of Arris routers fail to connect under any encryption method? Can you provide the router configuration (encryption method, model, etc.)? If you know how to capture WiFi packets, that would also be very helpful for this problem.

devTW001 commented 4 years ago

@esp-HarveyRong Not sure whether it occurs on all Arris routers; the two models that we tested are the Arris DG2460 and Arris TG3452A. The security type for 2.4 GHz is WPA2-PSK (AES). (Any other type won't work either.) Channel is AUTO. Bandwidth is 20 MHz. MAC filtering mode: Allow-all. At which level do you want me to capture the packets (802.11 or IP)? I only have Wireshark here. It would be great if you could tell me how you want me to capture the packets.

HarveyRong-Esp commented 4 years ago

@devTW001 We need decrypted 802.11 packets. We recommend the MacBook's built-in packet capture tool, which does not require a packet capture card. It can be used to capture 802.11a/b/g/n/ac packets: https://osxdaily.com/2015/04/23/sniff-packet-capture-packet-trace-mac-os-x-wireless-diagnostics/ Maybe the most effective way is for you to ship the router to us, so that we can debug it in the office.

bgintz-rsc commented 4 years ago

We are also experiencing a failure to receive an IP address on Arris TG862G/CT routers supplied by Comcast / Xfinity.

Details:

@devTW001: Can you confirm whether any of your customers experiencing the issue have the Arris TG862G/CT model?

Due to the COVID-19 situation, we are currently unable to go onsite to capture a log and confirm that the behavior is the same (i.e. STA connected but no IP address received); however, the symptoms are the same. The failure occurs during provisioning, and the process hangs only if the STA connects but does not receive an IP address. Failure to associate (i.e. connect) would not cause a hang.
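The distinction drawn here, between a failure to associate and a station that is connected but never receives an IP, can be made explicit by bounding the wait for the IP event. The sketch below is illustrative only and is not taken from any code in this thread. It works on a list of `(timestamp_ms, name)` pairs; the event labels are plain strings chosen for the example, and the 15-second timeout is an arbitrary assumption.

```python
def classify_attempt(events, now_ms, ip_timeout_ms=15000):
    """Classify a connection attempt from an ordered list of
    (timestamp_ms, name) pairs.

    Returns one of: "ok", "not_connected", "waiting_for_ip",
    "connected_no_ip".
    """
    connected_at = None
    for timestamp_ms, name in events:
        if name == "STA_DISCONNECTED":
            connected_at = None          # association was lost
        elif name == "STA_CONNECTED":
            connected_at = timestamp_ms  # (re)start the IP wait
        elif name == "GOT_IP" and connected_at is not None:
            return "ok"
    if connected_at is None:
        return "not_connected"
    if now_ms - connected_at >= ip_timeout_ms:
        return "connected_no_ip"         # the state that would otherwise hang
    return "waiting_for_ip"
```

A provisioning flow could poll such a check and report "connected_no_ip" to the user instead of waiting indefinitely.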

bgintz-rsc commented 4 years ago

@devTW001

Also, have you tried asking customers to request that their ISPs update the Arris devices (in case this is a known issue with the routers)?

bgintz-rsc commented 4 years ago

@devTW001, @negativekelvin, @esp-HarveyRong

Any updates? When our customer reported the issue, the ISP replaced their router.

Scott--R commented 4 years ago

So far we have been able to confirm this issue exists on the following models:

Arris DG2460, Arris DG3450, Arris TG3452A/CG

We have also been able to confirm that it does NOT occur on the new Arris Surfboards. The problem is that the ISPs in North America are not installing the newer Surfboards, they're installing the DG and TG series.

The problem is further compounded by the fact that we have been having a heck of a time getting our hands on any of these. Finally found one on eBay and bought it immediately. Will have it in a week and will finally be able to start in-house testing on it.

If we can't resolve it in a day or two we will pack it up and send it to Espressif for debugging.

Arris tech support is of no help because these models are "no longer supported" - yet the ISPs are installing them daily. It seems the ISPs bought them all up.

bgintz-rsc commented 4 years ago

Thanks for the additional info @Scott--R . We are also debugging and will share what we find.

devTW001 commented 4 years ago

@esp-HarveyRong We tried to capture the packets in the attached file. Would you please take a look?

Arris_fail2.pcapng.zip

HarveyRong-Esp commented 4 years ago

Hi @devTW001, the data packets in the attachment are all broadcast packets, which is not enough to analyze the problem. Can you provide 802.11 packets for the entire connection process between the STA and the AP, including scanning, authentication, association, the four-way handshake, etc.? BTW, if you can ship us the router, it will be very helpful. To ship it to us, please send an email to sales@espressif.com and briefly explain the problem background. (The most important thing is to attach the link to this issue.) Thanks.

Scott--R commented 4 years ago

@esp-HarveyRong the Wireshark capture shared by @devTW001 was captured by me, and that is essentially the problem: nothing else seems to happen. I did grab some screenshots from Wireshark that showed some issues with malformed packets as well as some kind of frame problem. I have attached those screenshots.

I can arrange to ship you the router but it's going to cost over $400. Plus we only have 1 of them, so if I ship it to you we lose all ability to do any testing and we must then rely entirely on you to debug and fix it - and we have no idea how it will be prioritized at Espressif.

In short, if we ship it to you we need some comfort that you will tackle it and fix the issue.

[Screenshots: Screen Shot 2020-04-27 at 7 34 01 PM, Screen Shot 2020-04-27 at 7 35 01 PM]

HarveyRong-Esp commented 4 years ago

Hi @Scott--R, can you test with version b6599abb to see if the issue still occurs?

Scott--R commented 4 years ago

@esp-HarveyRong @bgintz-rsc we have confirmed that the b6599ab version fixes the issue!!!

negativekelvin commented 4 years ago

What was the fix, @esp-HarveyRong ? Are these routers doing something non-compliant or was this a bug? Is the fix in 4.1 and master branches yet?

bgintz-rsc commented 4 years ago

Has this fix been merged into the AFR version of the ESP-IDF repo (https://github.com/espressif/esp-afr-sdk)?

The reason for asking is that the lib submodule (under components) in the above repo points to a detached HEAD at commit d8ed359 instead of v3.3, which includes the update to the wifi lib files. If we build the current master in the amazon-freertos repo, it appears that this fix will not be included.

We are unable to build at the desired commit with the fix as cmake performs a repo refresh which overrides our selected commit.

Thank you.

mahavirj commented 4 years ago

@bgintz-rsc We will prioritize getting this fix into esp-afr-sdk and will keep you posted. (CC @shubhamkulkarni97)

shubhamkulkarni97 commented 4 years ago

Hi @bgintz-rsc,

We have added this fix to esp-afr-sdk. Can you apply the patch attached below, update the esp-idf submodule, and check whether your issue is resolved?

Please run git am update_esp-idf_submodule.patch to apply the patch. Run git submodule update --init --recursive to recursively update submodules.

update_esp-idf_submodule.patch.zip

AxelLin commented 4 years ago

What was the fix, @esp-HarveyRong ? Are these routers doing something non-compliant or was this a bug? Is the fix in 4.1 and master branches yet?

Hi @mahavirj @shubhamkulkarni97, can someone help clarify the above questions?

mahavirj commented 4 years ago

@negativekelvin @AxelLin I can confirm that the fix is present in the release v4.1 branch and onwards. For previous release branches, this is essentially a backport activity. For technical details, I will let @liuzfesp and @esp-HarveyRong comment.

HarveyRong-Esp commented 4 years ago

Hi @AxelLin @negativekelvin @mahavirj, this issue is a bug in the router; we implemented a workaround for compatibility with it. Currently only v3.1 does not have the backport; other versions have been fixed.

bgintz-rsc commented 4 years ago

Hello @shubhamkulkarni97

Apologies. I just now saw this. (Having issues getting GitHub updates on issues ;) I will take a look ASAP.

bgintz-rsc commented 4 years ago

@mahavirj , @shubhamkulkarni97

UPDATE:

I tried to test this fix two ways:

1. Apply the provided patch:

I tried to apply the patch, without success. Our current tree structure is based on AFR release 202002.00, which predates the inclusion of esp-idf as a submodule. The patch fails with the message:

Applying: Update submodule pointer for ESP-IDF
error: vendors/espressif/esp-idf: does not exist in index

Since the patch adjusts the submodule pointer it will not work for us.

2. Copy the modified files:

I also tried to simply copy the files included in commit 0274e09 of the esp submodule into the corresponding folder in our build tree.

The build fails with the following error:

A fatal error occurred: Invalid segment count 18 (max 16). Usually this indicates a linker script problem.

So there are apparently additional dependencies.

We cannot update to the latest version of AFR to test this as there are a number of breaking changes in the networking and BLE code that we will need to resolve conflicts with.

Is there an alternative way you can provide a fix that is compatible with the 202002.00 AFR release?

Thank you in advance.

Best, Bobby

shubhamkulkarni97 commented 4 years ago

@bgintz-rsc ,

I have pushed a branch that includes a commit on top of the 202002.00 AFR release, which should fix your issue.

Can you please try running your application on that branch?

bgintz-rsc commented 4 years ago

Thank you. I will try it @shubhamkulkarni97

bgintz-rsc commented 4 years ago

Hello @shubhamkulkarni97

Sorry for the long delay. We sent the code built from your suggested branch above to our test lab, and it still failed. The Wireshark capture is attached: Arris TG862G.pcap.zip. Here is an excerpt from the report we received describing the failure:

The ESP32 sent a Probe Request in frame #1775. The Arris router responds with a Probe Response in frames #1782 and #1784. After the Probe Response, the Scooper does not send an Authentication frame or an Association Request frame to progress the WiFi association process. Therefore, the ESP32 never attempts to associate to the Arris TG862G WiFi router. The SSID parameter within the Probe Request sent by the ESP32 is empty (Wildcard SSID) which allows any WiFi router to respond to the Probe Request.

Please advise.

shubhamkulkarni97 commented 4 years ago

@bgintz-rsc,

We also have some follow-up questions. Can you ping me at shubham dot kulkarni at espressif dot com?

This will make communication easier.

HarveyRong-Esp commented 4 years ago

Hi @bgintz-rsc,

I looked at the packet capture you provided. I think these are not WiFi packets from the connection phase but from the scanning phase, so no Auth frame is sent after the Probe Response is received. Generally, a Probe Request carries the SSID when connecting to or scanning for a specific AP.
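Whether a captured Probe Request is a wildcard (scanning) probe or one directed at a specific AP can be read from its SSID element. The sketch below is a rough illustration and not something used in this thread. It assumes the input is the frame body of a Probe Request, that is, the sequence of information elements after the 802.11 MAC header, where each element is one ID byte, one length byte, and the data. The SSID element has ID 0, and a zero-length SSID is the wildcard. The function names are hypothetical.

```python
SSID_ELEMENT_ID = 0


def iter_elements(body):
    """Yield (element_id, data) pairs from a sequence of 802.11
    information elements; stop at the first truncated element."""
    offset = 0
    while offset + 2 <= len(body):
        element_id = body[offset]
        length = body[offset + 1]
        data = body[offset + 2:offset + 2 + length]
        if len(data) < length:
            break
        yield element_id, data
        offset += 2 + length


def probe_request_ssid(body):
    """Return the SSID bytes of a Probe Request body, or None if the
    body contains no SSID element."""
    for element_id, data in iter_elements(body):
        if element_id == SSID_ELEMENT_ID:
            return data
    return None


def is_wildcard_probe(body):
    """True when the SSID element is present but empty."""
    return probe_request_ssid(body) == b""
```

A directed probe for the AP in the original log would return `b"RCMP Surveillance Van"` from `probe_request_ssid`, while a frame like #1775 in the lab report would return an empty byte string.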

Can you provide complete WiFi packets and console logs (including startup logs), covering scanning, association, authentication, etc.?

The issue to be clarified is whether the device cannot connect to the router, or whether it connects but cannot obtain an IP.

BTW, I agree with @shubhamkulkarni97's suggestion: can you try to run the work station example on the release/v3.3 branch of ESP-IDF and check whether it can connect to the router?

mahavirj commented 4 years ago

As confirmed earlier (https://github.com/espressif/esp-idf/issues/4843#issuecomment-621987710), closing this issue. @bgintz-rsc Please feel free to reopen if you have any updates or are still observing this issue.

hdmt-hock commented 4 years ago

Hello

Using the esp-idf master branch, I am having a problem connecting the ESP32 over WiFi to an Arris TG1672, a common router from the local service provider.

Using the example code in espressif/esp-idf/examples/protocols/http_server/simple, I can get a local IP address and access /hello when connecting to a Nisuta router, but never with the Arris TG1672.

Based on espressif/esp-idf#4843, the issue should have been resolved in esp-idf release 3.3.

I also tested the WiFi connection with Mongoose OS (mos 2.18), which should address the WPA downgrade issue, with the same result: it works on the Nisuta and not on the Arris.

Can anyone give a hint on how to move forward with the Arris router? (This message was also posted on Gitter, where I was recommended to post here.)

Edit:
Hardware: TTG0 (heltec wifi lora 32, v1, 915 MHz)
Router: Nisuta N300 (works! got v4 IP)
Router: Arris TG1672 (no v4 IP !!!)

hdmt-hock commented 4 years ago

Hm... Strange... Today, the ESP32 works with the Arris router with Mongoose OS (mos 2.18). I managed to connect from my Mac to the ESP32's HTTP server. Well, I will continue to monitor.