esp8266 / Arduino

no more 802.11n connection since 2.6.0 #7965

Open stef-ladefense opened 3 years ago

stef-ladefense commented 3 years ago

Problem Description

The sketch is "Udp NTP Client", included with the core release.

I recently noticed that I have both 802.11n and 802.11g devices on my network. The ones connected in 802.11n are old modules whose code I have not changed and which were compiled with a core version lower than 2.5.0 (2.4.2 for most of them).

I therefore flashed the supplied "Udp NTP Client" sketch and tested it with different core versions, starting from 2.4.2.

Up to and including version 2.5.1, the module shows up as 802.11n on my router. From version 2.6.0 up to the latest 2.7.4, it shows up as 802.11g...

Do you have any idea why?

fsommer1968 commented 3 years ago

Is it really a firmware problem? My ESP8266 modules connect fine with 2.7.4 (screenshot).

NB: 72 MBit/s is only possible with N, not with G.

stef-ladefense commented 3 years ago

That's why I'm asking for advice! I haven't changed anything: it's the same Lolin module (original V3.10), the same code (shipped with the core, I didn't write anything more), and there is no config change on my router. When I install the 2.4.2 core, the router sees me as 802.11n; the same goes for 2.5.1. As soon as I install the 2.6.0 core, the router sees me as 802.11g, and this holds up to the current version 2.7.4.

To me, something changed with the move to 2.6.0.

d-a-v commented 3 years ago

You can try other/older firmware versions from the IDE menu: select the board "generic esp8266" and match its configuration to your board. A new menu entry will then offer several versions of the closed nonos-sdk firmware.

And if possible do it with the current source code. There are three ways for doing so: latest git master / PlatformIO staging version / arduino board manager snapshot release 0.0.1.

stef-ladefense commented 3 years ago

So: core 2.7.4, board configured as generic esp8266, with flash mode DIO.

With the nonos-sdk versions "2.2.1 + 100", "111", "113" and "119" I am connected to the router in 802.11g. With the versions "nonos-sdk 2.2.1 (legacy)" and "nonos-sdk pre-3 (180626 known issues)" I am connected in 802.11n!!! As soon as I configure the board as "LOLIN (WEMOS) D1 R2 & mini" I go back to 802.11g.

Where do I configure the nonos-sdk version for the Lolin? I did not find it.

d-a-v commented 3 years ago

Well that's interesting !

This menu is disabled for non generic boards. There are three possibilities:

I think 1) is the easiest. The newer versions fixed some bugs and we settled to the current menu status a while ago after some discussions.

stef-ladefense commented 3 years ago

Yes, I will do that, thanks.

On the other hand, how can I tell which nonos-sdk version is used when a non-generic board is selected?

d-a-v commented 3 years ago

It's the same one as the default value (= the first) for the generic board.

You can also read its (default when not overwritten by menu selection) value in platform.txt:

build.sdk=NONOSDK22x_190703

(which is nonos-sdk 2.2.1+100 (190703) you can see that in boards.txt).

fsommer1968 commented 3 years ago

Your experience with the different SDK releases is indeed interesting, because I'm using V 119 (including on some Wemos D1 Mini set up as generic 8266 to change the SDK version). Have you tried deleting the WiFi settings during flash? (screenshot)

stef-ladefense commented 3 years ago

Your experience with the different SDK releases is indeed interesting, because I'm using V 119 (including on some Wemos D1 Mini set up as generic 8266 to change the SDK version). Have you tried deleting the WiFi settings during flash?

I tested this with generic, v119, and "erase sketch + WiFi settings"... and it is 802.11g too.

vortigont commented 3 years ago

I can confirm it. 2.6 and later can't use n-mode at all. If I enable "option require_mode 'n'" in OpenWrt (so called "Greenfield" mode when only n-capable clients can connect) esp with core 2.5.1 can connect to the AP, more recent ones - can't connect at all.

"Core":"2_7_4","SDK":"2.2.2-dev(38a443e)" - G-mode, no MCS value

iwinfo wlan0 assoclist
3C:71:BF:29:60:CF  -59 dBm / unknown (SNR -59)  250 ms ago
        RX: 48.0 MBit/s                                   77 Pkts.
        TX: 5.5 MBit/s                                    64 Pkts.
        expected throughput: unknown

"Core":"2_5_1","SDK":"2.2.1(cfd48f3)" - N-mode indicated by "MCS 6"

iwinfo wlan0 assoclist
3C:71:BF:29:60:CF  -58 dBm / unknown (SNR -58)  40 ms ago
        RX: 54.0 MBit/s                                  201 Pkts.
        TX: 58.5 MBit/s, MCS 6, 20MHz                     73 Pkts.
        expected throughput: 4.5 MBit/s

TD-er commented 3 years ago

Could it be that OpenWRT only advertises a limited set of HT MCS modes and the ESP only supports a different subset? In MikroTik APs you can see the advertised modes and I guess you can also enable/disable them although the checkboxes in my MikroTik units are greyed out.

vortigont commented 3 years ago

I've tested 802.11n with default configuration, single channel bandwidth, no limitations or restrictions for specific MCS, etc... same board with the same code (just a simple WiFi connect) is able to connect to the same WiFi AP with an old core and unable with a recent one.

Jason2866 commented 3 years ago

Can not reproduce. Core 2.7.4

00:00:03.970 WIF: Connecting to AP1 Jason_Home_WLAN Channel 12 BSSId 00:A0:57:2A:BD:19 in mode 11n as sonoff-71C254-0596...
00:00:06.268 WIF: Connected

stef-ladefense commented 3 years ago

I have tested with the new Arduino ESP8266 core 3.0.0: I only get an 802.11n connection with nonos-sdk 2.2.1 legacy, not with the others.

Jason2866 commented 3 years ago

Tried with core 3.0.0 connects in mode 11n

00:00:00.065 Project tasmota Tasmota Version 9.4.0.4(lite)-STAGE(2021-05-28T14:25:30)
00:00:00.520 WIF: Connecting to AP1 Jason_Home_WLAN Channel 12 BSSId 00:A0:57:2A:BD:19 in mode 11n as tasmota_D4407C-0124...
00:00:02.751 WIF: Connected

Core 2.7.4 connected to a OpenWRT device

00:00:00 I2C: BME280 found at 0x76
00:00:04 WIF: Connecting to AP1 Jason_Home_WLAN Channel 4 BSSId 88:C3:97:B1:1D:56 in mode 11N as sonoff-17DBAE-7086...
00:00:06 QPC: Reset
00:00:06 WIF: Connected

vortigont commented 3 years ago

Tried with core 3.0.0 connects in mode 11n

Those logs are from the ESP and they lie. It reports mode n but actually connects in mode g. You should check on the access point itself which mode its clients use. In OpenWrt you can check with 'iwinfo wlan0 assoclist': if the ESP client's MAC is missing an MCS value, then it is in mode g.

And yes, I confirm, with Core 3.0.0 it's the same issue - no mode N. My guess is that it is related to SDK 2.2.2, not the Arduino core.
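
The check described above (a station whose TX line carries no MCS value is in mode g) can be scripted on the host side. The following is a minimal C++ sketch, not an OpenWrt tool: `stationUsesHtRate` and `startsWithMac` are hypothetical helpers, and the parser assumes the `iwinfo wlan0 assoclist` layout posted earlier in this thread (a MAC line followed by indented RX/TX lines).

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Returns true if `line` starts with a MAC address (aa:bb:cc:dd:ee:ff).
static bool startsWithMac(const std::string& line) {
    if (line.size() < 17) return false;
    for (int i = 0; i < 17; ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (i % 3 == 2) {
            if (c != ':') return false;      // every third character is a separator
        } else if (!std::isxdigit(c)) {
            return false;                    // the rest must be hex digits
        }
    }
    return true;
}

// Hypothetical helper: scans `iwinfo <iface> assoclist` text and reports
// whether station `mac` has an MCS index on its TX line (an 802.11n HT rate).
// The MAC comparison is case-sensitive.
bool stationUsesHtRate(const std::string& assoclist, const std::string& mac) {
    std::istringstream in(assoclist);
    std::string line;
    bool inStation = false;
    while (std::getline(in, line)) {
        if (startsWithMac(line)) {
            // A new station entry begins; remember whether it is the one we want.
            inStation = (line.compare(0, 17, mac) == 0);
        } else if (inStation && line.find("TX:") != std::string::npos) {
            return line.find("MCS") != std::string::npos;
        }
    }
    return false;  // station not listed, or it has no TX line
}
```

Fed with the two assoclist dumps posted above, this returns false for the core 2.7.4 entry and true for the core 2.5.1 entry.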

TD-er commented 3 years ago

So you're telling me that if I configure the access point to only allow 802.11n clients, it is impossible to connect to an access point using current builds based on SDK 2.2.2?

MAC address of one of my ESPs: FC:F5:C4:8B:71:60
SDK: ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Stated WiFi connection: 802.11g (RSSI -57 dBm)

Mikrotik AP it is connected to: image

image

I will now set it to 'n' mode to see....

Set to connect via "n" mode:

WiFi Connection: 802.11n (RSSI -55 dBm) image

TX rate is > 54 Mbps, so this can't be "g".

As a test, the WiFi mode is set to "2GHz-only-N" (cleared the password field for the screenshot) image

As you can see, the same node is connecting just fine.

stef-ladefense commented 3 years ago

Where do you find nonos-sdk 2.2.2? With the installation of the Arduino ESP8266 core 3.0.0 I have version 2.2.1. image image

d-a-v commented 3 years ago

2.2.2 is not and will probably never be out. 2.2.1+n (git versioning) is noted 2.2.2-dev by espressif's system_get_sdk_version() in their 2.2.x git branch.

vortigont commented 3 years ago

So you're telling me that if I configure the access point to only allow 802.11n clients, it is impossible to connect to an access point using current builds based on SDK 2.2.2?

Yes, that is exactly what I see. You need to make sure that b/g modes are completely disabled. Are you able to get the details of the MCS used by the ESP client? TX bandwidth might be confusing; only the MCS index can indicate which N-mode modulation is in use. I do not have RouterOS devices at hand to test, but I've checked the docs and it seems that "2GHz-only-N" actually means all 2.4 GHz b/g/n modes. Screenshot from 2021-05-30 22-59-24

Please test it carefully if possible. This issue seems very tricky.

TD-er commented 3 years ago

I don't know how to see what MCS's are used. But if I set my ESP to use 'G' only, and the MikroTik to "2GHz-only-N", the ESP cannot connect to the AP, unless it switches back to "n" mode. I have it programmed to go to "n" mode as fallback if it cannot connect in "g" mode after 10 attempts (or xx seconds)

So to me it looks like it is working in "n" mode.

The table you posted seems odd, as it does share the "2GHz-only-N" option along with the "b/g/n" options. image

Here the settings of the Wlan adapter: image image image

vortigont commented 3 years ago

I have it programmed to go to "n" mode as fallback if it cannot connect in "g" mode after 10 attempts (or xx seconds)

So by default it always connects in G mode for you, right? And you have to switch it to N with WiFi.setPhyMode(WIFI_PHY_MODE_11N) to connect with N, right?

The table you posted seems odd, as it does share the "2GHz-only-N" option along with the "b/g/n" options.

It's from the official doc, I guess that GUI and CLI might have some syntax differences. I used CLI with Mikrotik's quite some time ago, do not have it now unfortunately.

TD-er commented 3 years ago

Yep, I made it configurable to what mode should be used to start with and as a fallback it switches to "n" mode after set number/time of failed attempts. image

https://github.com/letscontrolit/ESPEasy/blob/f3ce88eaef3f88d7a525eb017ac9dec718c5578f/src/src/ESPEasyCore/ESPEasyWifi.cpp#L1062-L1070
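
The fallback policy described here (start in a configured PHY mode, switch to "n" after a set number of failed attempts) can be sketched as a small state object. This is an illustration only, not the ESPEasy code linked above: `PhyModeFallback` and its method names are hypothetical, and the sketch counts attempts rather than elapsed time. On the device, a mode change would be applied with a call such as WiFi.setPhyMode(WIFI_PHY_MODE_11N), as mentioned earlier in the thread.

```cpp
#include <cstdint>

enum class PhyMode : std::uint8_t { B, G, N };

// Hypothetical policy object: start in a preferred PHY mode and switch to
// 802.11n after a configurable number of consecutive failed attempts.
class PhyModeFallback {
public:
    PhyModeFallback(PhyMode preferred, unsigned maxFailures)
        : current_(preferred), maxFailures_(maxFailures) {}

    PhyMode current() const { return current_; }

    // Call after each failed connection attempt.
    // Returns true when the mode changed and must be re-applied to the radio.
    bool onConnectFailed() {
        if (current_ == PhyMode::N) return false;      // already on the fallback
        if (++failures_ < maxFailures_) return false;  // keep trying this mode
        current_ = PhyMode::N;
        failures_ = 0;
        return true;
    }

    // Call once connected, so earlier failures do not count later on.
    void onConnected() { failures_ = 0; }

private:
    PhyMode current_;
    unsigned maxFailures_;
    unsigned failures_ = 0;
};
```

With `PhyModeFallback policy(PhyMode::G, 10)`, the first nine failures leave the mode at G and the tenth switches it to N.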

vortigont commented 3 years ago

I've played around with wifi_set_phy_mode(PHY_MODE_11N); not that it changed anything. With SDK 2.2.1 it works as expected; with 2.2.2-git (Arduino core >2.5.1) it makes no difference: MCS and WMM are not available, and it does not connect at all in greenfield mode. Maybe it is AP hardware specific, but the fact is that the same ESP board with the same user code behaves differently depending on the build environment.

@TD-er BTW, I've noticed that your settings screenshots show 2GHz-only-N and WMM disabled at the same time. This is completely wrong: for HT rates in N mode WMM is mandatory, if I remember the WiFi 4 specs correctly. Not sure if it should even work at all in greenfield mode without WMM.

This is an ESP client built with SDK 2.2.1, with the AP in greenfield N mode:

iw wlan0 station dump
Station 5c:cf:7f:02:50:f9 (on wlan0)
        inactive time:  20 ms
        rx bytes:       38408
        rx packets:     1506
        tx bytes:       2060
        tx packets:     15
        tx retries:     5
        tx failed:      0
        rx drop misc:   0
        signal:         -56 [-57, -56] dBm
        signal avg:     -57 [-57, -59] dBm
        tx bitrate:     6.5 MBit/s MCS 0
        rx bitrate:     6.0 MBit/s
        rx duration:    150093 us
        expected throughput:    4.394Mbps
        authorized:     yes
        authenticated:  yes
        associated:     yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no
        DTIM period:    2
        beacon interval:100
        short preamble: yes
        connected time: 680 seconds

Jason2866 commented 3 years ago

It does connect in mode "n". We had issues with routers that only support mode "n": when STA and AP mode are both active, the ESP8266 can only use 11b and 11g. When only STA mode is used, the device connects without an issue to a router supporting only mode "n". The whole discussion, with the solution, is at https://github.com/arendst/Tasmota/discussions/12512#discussioncomment-963099

dalbert2 commented 3 years ago

I have also been having significant WiFi issues in 2021.
With a Ubiquiti 802.11n access point, it connects and disconnects successfully a few times, but at some point, it stops being able to connect and stay connected; it goes from connected to connection_lost almost immediately and once that happens, it will not be able to connect again until it is reset. This is highly reproducible with 2.7.4. Interestingly, with 3.0.x it also springs a huge memory leak, when this happens, losing 2-2.5K with each scan/connection attempt. Here's a trace below showing the connect attempt failure in case it is of any use:

scandone
state: 0 -> 2 (b0)
IDLE->SCAN_COMPLETED
state: 2 -> 3 (0)
CONNECTED
state: 3 -> 5 (10)
CONNECTION_LOST
add 0
aid 6
cnt
CONNECTED
<<< Everything above is normal; when things work properly, the next line will be
connected with mySSID, channel 1
However once things start failing, it looks like what's below >>>>
state: 5 -> 2 (2c0)
CONNECTION_LOST -> SCAN_COMPLETED
rm 0
wifi evt: 1
STA disconnect: 2
state: 2 -> 0 (0)
del if0
usl
mode : null

sblantipodi commented 3 years ago

This is a major problem I am having.

#8292

At some point the ESP stops responding to WiFi until the router is rebooted or the ESP resets itself.

sblantipodi commented 3 years ago

The funny thing is that the latest Asus firmware upgrade prevents my ESP8266 devices from connecting in 802.11g; 802.11n works.

TD-er commented 3 years ago

Maybe the latest firmware now has some default setting which makes the AP effectively "n-only" ? Could be that it is hard to recognize because of its label. For example (just made up now, no idea what Asus is using) calling it "enhanced stability", "gaming mode" or something similar. Side effects of an AP allowing both 'b/g' and 'n' devices is that it may decrease responsiveness. So it would make sense to refer to it as some kind of "gaming mode".

sblantipodi commented 3 years ago

@TD-er thank you for the answer, I appreciate it. The router is set to Legacy mode and it clearly shows it: image

I think they just scrambled everything, since the previous firmware let the ESP connect in 802.11g.

TD-er commented 3 years ago

Sometimes it may help to switch a setting, save it and switch it back and save it.

For example, when a new flag is added, but its setting is not set to the proper value.

Let's assume the possible settings are:

And the value in the stored settings is not set to "1", but it is now "0". The combo box is generated for the web page and its value is "0". The browser will then show the first item of the combo box, even though it isn't the correct value.

sblantipodi commented 3 years ago

Tried it, but unfortunately it does not help. Thank you for all the tips, very much appreciated.

Jason2866 commented 3 years ago

We have feedback on Discord that using https://www.asuswrt-merlin.net/ solves the connection problems.

sblantipodi commented 3 years ago

@Jason2866 Merlin uses the "old" firmware, if I rollback to an old firmware I don't have connection problems :)

Jason2866 commented 3 years ago

What is the benefit of the new trouble making firmware? Asus is known to release bleeding edge router firmware. Anyway you have a starting point to compare. Wireshark should give you insights.

sblantipodi commented 3 years ago

@Jason2866 you are right... in any case I solved the 802.11G problem. a firmware reset solved the problem but I had to manually reconfigure the router.

mgorven commented 2 years ago

This thread is a bit old, but posting in case this is useful for someone.

The cause of this is that the NONOS SDK disabled WMM/WME in https://github.com/espressif/ESP8266_NONOS_SDK/commit/6580ca35e9f8f85d11100c1ad3d2ed27d4b26381. WMM is a mandatory part of 802.11n, so compliant APs (including hostapd) will disable 802.11n and fall back to 802.11g if the client does not support WMM.

I'm using Arduino core 2.7.4 which ships with multiple SDK versions. Version NONOSDK22x_190313 is from before this change and supports 802.11n, whereas version NONOSDK22x_190703 is after this change and does not. I use the older SDK by supplying the -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 PlatformIO build flag.

It looks like WMM/WME was later added back in https://github.com/espressif/ESP8266_NONOS_SDK/commit/823019bc723c2ef7453c78369e499045d0f27ab4, but this version hasn't been picked up by the Arduino core.
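
The SDK builds named in this thread can be collected into a small lookup table. This is a summary sketch under stated assumptions: the `SdkBuild` struct and `findSdkBuild` function are hypothetical, the flags and menu labels are copied from comments in this thread, and the `reported11n` column only records the analysis above and the original poster's tests. Other commenters report different results on their access points.

```cpp
#include <cstring>

struct SdkBuild {
    const char* pioFlag;   // PlatformIO build flag, without the leading -D
    const char* ideLabel;  // label in the Arduino IDE "Espressif FW" menu
    bool reported11n;      // true if this thread reports a real 802.11n link
};

// Values taken from comments in this thread; not an exhaustive list.
static const SdkBuild kSdkBuilds[] = {
    {"PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221",
     "nonos-sdk 2.2.1 (legacy)", true},
    {"PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313",
     "nonos-sdk 2.2.1+61 (190313)", true},   // before WMM/WME was disabled
    {"PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190703",
     "nonos-sdk 2.2.1+100 (190703)", false}, // after WMM/WME was disabled
};

// Returns the table entry for a PlatformIO flag, or nullptr if unknown.
const SdkBuild* findSdkBuild(const char* pioFlag) {
    for (const SdkBuild& build : kSdkBuilds) {
        if (std::strcmp(build.pioFlag, pioFlag) == 0) return &build;
    }
    return nullptr;
}
```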

d-a-v commented 2 years ago

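Thanks for this report @mgorven. Arduino IDE users are currently unable to use this specific version 19-03-13 of the nonos-sdk 2.2.x firmware.

#8363 aims at fixing this.

So in order for 802.11n to work again for some users, it is advised to try with:

  • Arduino IDE generic board (esp8266/esp8285) Tools>Espressif FW>"nonos-sdk 2.2.1+61 (190313)"
  • Arduino IDE generic board (esp8266/esp8285) Tools>Espressif FW>"nonos-sdk 2.2.1 (legacy)"
  • platformIO (any board) -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313
  • platformIO (any board) -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221

Shall we move this thread to a discussion?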

TD-er commented 2 years ago

Could this be also related to this issue? https://github.com/esp8266/Arduino/issues/8299#issuecomment-962261539 If the ESP nodes don't announce to support WMM, is it possible modern Asus AP's (the more high end ones, often also running WiFi mesh) simply reject those ESPs?

mgorven commented 2 years ago

If the ESP nodes don't announce to support WMM, is it possible modern Asus AP's (the more high end ones, often also running WiFi mesh) simply reject those ESPs?

If the AP is configured to only accept 802.11n clients, then yes, they wouldn't be able to connect. They should fail before attempting DHCP however, the 802.11 Association Request should get rejected.
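
The AP-side behaviour described in this thread (no WMM means no 802.11n, so the client either falls back to 802.11g or is rejected when the AP is n-only) can be written down as a tiny decision function. This is a simplified model for illustration, not hostapd's actual logic; `ClientCaps`, `Association` and `associate` are hypothetical names.

```cpp
enum class Association { HtN, LegacyG, Rejected };

struct ClientCaps {
    bool ht;   // client advertises 802.11n (HT) capabilities
    bool wmm;  // client advertises WMM/WME support
};

// Simplified model: 802.11n is only used when the client offers both HT and
// WMM. Otherwise an n-only AP rejects the association request, and a mixed
// b/g/n AP falls back to 802.11g.
Association associate(const ClientCaps& client, bool apRequiresN) {
    if (client.ht && client.wmm) return Association::HtN;
    return apRequiresN ? Association::Rejected : Association::LegacyG;
}
```

Under this model, an ESP8266 built with an SDK that has WMM disabled maps to `{true, false}`: it lands on `LegacyG` with a mixed-mode AP and on `Rejected` with an n-only AP, which matches the observations reported above.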

vortigont commented 2 years ago

If the AP is configured to only accept 802.11n clients, then yes, they wouldn't be able to connect.

That's exactly what happens with OpenWRT's n-only 'greenfield' mode. Thanks for the references, @mgorven. Makes it all clear now! :)

1d4rk commented 2 years ago

So this pointed me in the right direction. Using an Asus wireless router with ESP8266 Arduino core 2.7.4, we can see a DHCPDiscover -> DHCPOffer and no DHCPRequest following an 802.11 Disassoc. I used a Sonoff Basic R2 with Tasmota 10.0.0 firmware (Arduino core 2.7.4), so I changed the WiFi config as described on the Tasmota Commands page https://tasmota.github.io/docs/Commands/:

Wifi:
  • 0 = disable Wi-Fi
  • 1 = enable Wi-Fi (default)
  • 2 = Wi-Fi mode 802.11b
  • 3 = Wi-Fi mode 802.11b/g
  • 4 = Wi-Fi mode 802.11b/g/n

With WiFi mode 3 (802.11g mode) the module is able to connect to the AP (it shows 802.11g) and gets an IP address without any issue.

Another test would be to disable WMM in the router, but I can't because I am using 802.11ax. The alternative is to use the older SDK NONOSDK22x_190313, which supports WMM in 802.11n mode.

TD-er commented 2 years ago

Just an idea... Can these Asus AP's create a second SSID with its own parameters? (as long as it is still the same channel as the first SSID on that AP) Even the cheapest MikroTik ones can do this, so I wonder if those more expensive ones can do it too. That would make it perhaps easier to make a proper work-around.

TD-er commented 1 year ago

Thanks for this report @mgorven Arduino IDE users are currently unable to use this specific version 19-03-13 of the nonos-sdk 2.2.x firmware. #8363 aims at fixing this.

So in order for 802.11n to work again for some users, it is advised to try with:

  • Arduino IDE generic board (esp8266/esp8285) Tools>Espressif FW>"nonos-sdk 2.2.1+61 (190313)"
  • Arduino IDE generic board (esp8266/esp8285) Tools>Espressif FW>"nonos-sdk 2.2.1 (legacy)"
  • platformIO (any board) -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313
  • platformIO (any board) -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221

Shall we move this thread to a discussion?

I finally looked into this a bit more as I was experiencing lots of strange issues reported by users which were really hard to reproduce on my own nodes.

PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 and PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221 both allow connecting to the access point using the actual 802.11n protocol: the connection speed was > 54 Mbps and the QoS field reports WMM in the FritzBox user interface: image

However, using PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 sometimes results in an infinite loop of disconnects, reconnects, disconnects, etc.

Builds with PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313 seem to connect just fine at first (right after flashing), but after a warm reboot they are not able to connect at all. The node keeps getting disconnected, at first with (8) Assoc leave, followed by (202) Auth fail, (4) Assoc expire or (203) Assoc fail. 202 is the most frequent one, followed by 203. N.B. it never seems to get an IP.
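
When chasing a reconnect loop like this one, it can help to tally the disconnect reason codes instead of reading them off a scrolling log. A minimal host-side sketch follows; `reasonName` and `mostFrequentReason` are hypothetical helpers, and only the four codes named above are labelled.

```cpp
#include <map>
#include <string>
#include <vector>

// Labels for the disconnect reason codes mentioned in this thread.
std::string reasonName(int code) {
    switch (code) {
        case 4:   return "Assoc expire";
        case 8:   return "Assoc leave";
        case 202: return "Auth fail";
        case 203: return "Assoc fail";
        default:  return "Unknown (" + std::to_string(code) + ")";
    }
}

// Returns the reason code seen most often, or -1 for an empty log.
// On a tie, the code that reached the top count first wins.
int mostFrequentReason(const std::vector<int>& reasons) {
    std::map<int, unsigned> counts;
    int best = -1;
    unsigned bestCount = 0;
    for (int reason : reasons) {
        const unsigned n = ++counts[reason];
        if (n > bestCount) {
            bestCount = n;
            best = reason;
        }
    }
    return best;
}
```

On the device, the codes would come from the reason field of the station disconnect event; here they are simply passed in as a vector.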

So far, I have been unable to reproduce such behavior when using PIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221

Using these platformIO env parameters:

platform                  = espressif8266@2.6.3
platform_packages         =
    framework-arduinoespressif8266 @ https://github.com/esp8266/Arduino.git#2.7.4
build_flags               = ${esp82xx_2_6_x.build_flags} 
                            -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK221
                            -Wno-deprecated-declarations

So to me the "legacy" SDK looks to be the better one.

TD-er commented 1 year ago

OK, spent a lot of time debugging the last 24h.

Eventually I could get the "legacy" SDK blob to also end up in a reconnect-loop which may only end due to sheer luck and/or specific timings.

However, I think I finally may have found what is happening here. To give some info on the network setup here: I do have a WiFi mesh network using several Fritzbox access points. When connecting in G-only mode, the mesh functionality of the APs do not act "funky" and just allow me to connect even on the weakest signal.

image

When connecting in "N" mode, it seems to depend on the time I tried to connect to an AP, but I eventually get kicked off from the AP if the mesh believes there is another AP which allows for a stronger connection. Thus when only trying to connect to a "bad" one, it may take a while (up to a minute sometimes) before I get kicked off. If I try to connect to all nodes in a sequence (or perform an active scan), I will be kicked off the weaker ones almost immediately. The keyword being "almost", as in it may refuse me to connect, or allow the connection but then kick me off even before I could get an IP address.

One thing that's obvious when trying to connect to a weak access point is that you may miss packets, or the AP may miss packets sent by your station. Thus what will happen every now and then is that the connection attempt to an AP may timeout and then the AP sends you some kind of disconnect acknowledgement (don't have Wireshark logs to know what the exact response is) So while attempting to connect to AP2, you may suddenly receive a disconnect packet from AP1 as it finally registers the timeout. But the ESP still fires a disconnect event. Not sure if that should be considered a bug, but it is a fact this may happen. (doesn't the de-auth project do exactly this?)

Since my WiFi code (which is a terrible mess due to all the work-arounds added in the past 4 years) does act on WiFi events, I ended up processing this disconnect event even though I wasn't yet connected to anything and thus messing up the current connection attempt. This caused my code to conclude it could not connect to this AP and thus moved over to the next one. But the next one would reject my connection attempt as it wasn't the strongest one, etc.
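
One way to avoid acting on such a late disconnect is to compare the BSSID carried by the event with the BSSID of the AP currently being tried, and to ignore events from any other AP. The sketch below is a host-side illustration of that idea, not the ESPEasy implementation; `ConnectAttempt` and `DisconnectEvent` are hypothetical types, and it assumes the disconnect event exposes the BSSID of the AP that sent it.

```cpp
#include <array>
#include <cstdint>

using Bssid = std::array<std::uint8_t, 6>;

// Hypothetical stand-in for the data carried by a station disconnect event.
struct DisconnectEvent {
    Bssid bssid;  // AP the disconnect refers to
    int reason;   // disconnect reason code
};

// Tracks the connection attempt in progress and filters stale disconnects.
class ConnectAttempt {
public:
    void begin(const Bssid& target) {
        target_ = target;
        active_ = true;
    }

    // Returns true if the event ends the current attempt, false if it is
    // stale (no attempt running, or it refers to a different AP).
    bool onDisconnect(const DisconnectEvent& event) {
        if (!active_ || event.bssid != target_) return false;
        active_ = false;
        return true;
    }

    bool active() const { return active_; }

private:
    Bssid target_{};
    bool active_ = false;
};
```

With this filter, a timeout reported by AP1 while the node is already trying AP2 no longer aborts the attempt on AP2.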

In my tests, I also looked a bit into the bss_info struct. On ESP32 this struct is made available right after a scan, but on the ESP8266 it isn't unless you call the SDK scan yourself with your own callback function. So my question is, can we have a get function to access this bss_info struct too in the ESP8266WiFiScanClass ? (@d-a-v ??) The current implementation of the _onComplete callback function doesn't give access to the bss_info. This would then allow me to see if an access point allows connecting in "G" mode as some kind of fallback mode to know if I could still connect to an AP when wrongfully being kicked off an AP as it assumes another one is better suited.

TD-er commented 1 year ago

Another 24 hours of pure frustration here.

It seemed like every attempt to dive more into this issue made things worse. It was at some point nearly impossible to get my ESP connected anymore. Took between 10 and 60 minutes to get the ESP connected to my meshed (or should I say "messed up") network. At some point I stashed all my work, loaded a known to work image and that one also was unable to connect again in reasonable amount of time.

Right after this I popped my experimental code, flashed it again and (luckily?) the ESP would not connect. Then I rebooted all access points and voila, the ESP connected immediately.

So my theory is that the ESP may not officially disconnect from the access point and thus the AP may refuse the node to connect as it was still assuming it was connected. This may take some timeout before the AP believes it is no longer connected and a new connection is allowed. However, I got the impression that retrying and getting rejected may actually reset this timeout timer? Maybe as a security measure against "hacker like" behavior? But this would also explain why I may sometimes see de-auth behavior on the ESP where it should not happen as it was connected just fine.

Anyway, this has been extremely frustrating... :( I will now look into the option of quickly connecting and forcefully disconnecting to see if that may work.

N.B. The mikrotik AP I also have here doesn't seem to have any problems with this.

Edit: Almost forgot... It is very clear not all WiFi events are being given to the user Arduino code. I often don't get the connected event and sometimes both the connected and got IP event are not fired but I appear to be connected anyway.

Jason2866 commented 1 year ago

@TD-er I feel for you. We had the same experience and wasted many hours (Theo and Adrian too) on this. In the end we set up an extra WLAN (with its own DHCP), and for every test/change everything is restarted. After long test series we chose build NONOSDK22x_190703 = 2.2.2-dev(38a443e) (-DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190703) as the only one that behaves in a stable and reproducible way. The culprit is specific to "n". For most users it works and it does connect in mode "n".

TD-er commented 1 year ago

Right now I have the WiFi code working quite stably using -DPIO_FRAMEWORK_ARDUINO_ESPRESSIF_SDK22x_190313. However, the code has now become even more complex. So I will still need to write down all the needed work-arounds and quirks and move it all to a very clean state machine (and finally get rid of the code I'm least proud of, as it has become way too complex in the past few years).

For sure the WiFi event handling is missing events every now and then. Also the reported WiFi state is not always reflecting the true state.

When running with AP+STA I now explicitly set the phy mode to 802.11g as the ESP8266 cannot use 802.11n in AP+STA mode. Not sure if this really makes a difference as I have been testing and trying so many things in the past 5 days that I'm not 100% sure whether this change actually made the difference here.
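
The rule stated here (no 802.11n while AP+STA is active) can be captured in one helper that decides which PHY mode to actually request. A minimal sketch, with hypothetical names `WifiRole` and `effectivePhyMode`; it only encodes the AP+STA case reported in this thread and makes no claim about AP-only operation.

```cpp
enum class PhyMode { B, G, N };

struct WifiRole {
    bool station;  // STA interface enabled
    bool softAp;   // soft-AP interface enabled
};

// Returns the PHY mode to request: a request for N is capped to G when both
// the station and the soft-AP are active; everything else passes through.
PhyMode effectivePhyMode(WifiRole role, PhyMode requested) {
    if (role.station && role.softAp && requested == PhyMode::N) {
        return PhyMode::G;
    }
    return requested;
}
```

On the device, the result would then be applied with the core's PHY mode setter before starting the connection attempt.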

Another quirk which I need to look into is the processing of disconnect events. It is still very well possible I messed up in my (way too complex) code, but so far I could not find the error in my code. Anyway, what I do see is that my processDisconnect() function may loop for about 10 seconds where it seems it keeps receiving the disconnect event. It can very well be an error in my code, but like I said, have to look into it a bit more.

But so far I am quite happy/relieved it finally is working quite well. Also my tests using ESP-NOW (not yet published code) along with this SDK version do seem to run way more stable. Also between ESP32 and ESP8266 which was absolutely less stable when running with the other SDK binary blobs. Still have to test a bit more, but at least my testing companion for the ESP-NOW code was absolutely ecstatic about the first 6+ hours of testing.

gmdriscoll commented 1 year ago

@TD-er Thank you for all the hard work. This validates many days troubleshooting work here, as well. For now, I just leave it in G-Mode but would like to have the ESP8266 connecting to routers with no preferences between G and N, and no end-user default router setting changes or limits.

someburner commented 1 year ago

@TD-er Thanks for looking into this so deeply. I've been lurking on these issues trying to figure out the best SDK version to use, that works for the most people/routers.

I was using 190703 as that was recommended by folks here, but in some cases it appears to fare worse than older SDK versions. it'd be great to have a writeup of all the quirks you found with 190313.

Would it be true to say 190313 has more quirks than 190703 (besides the 802.11n issue)? Some of the bugfixes since 190313 seem like they could be important, but always impossible to really know with the nonos sdks.