espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1

ESP32 WiFi.begin works only every second time - workaround #2501

Closed DIRR70 closed 2 years ago

DIRR70 commented 5 years ago

Hardware:

Board: ESP32-WROOM-32 Dev Module
Core Installation version: 1.0.1 (installed using board manager feb/23)
IDE name: Arduino IDE 1.88
CPU Frequency: 80MHz
Flash Frequency: 40MHz
Flash Mode: QIO
Flash Size: 2MB (16MB)
Partition Scheme: Standard
PSRAM enabled: no
Upload Speed: 921600
Computer OS: Windows 10

Description:

With a portable device I need to ensure that the WiFi connection is established as soon as the router is in range. I also need to do some stuff as soon as the connection is established or lost (in my sketch below just a serial output).

I have the problem that "WiFi.begin(ssid, pass)" works only every second time. If it doesn't work, I have to press the reset button and then it works. If I press reset again it won't work. Pressing reset again and it works, and so on.

If it doesn't work I get the following output with Debug Level set to verbose: [W][WiFiGeneric.cpp:357] _eventCallback(): Reason: 202 - AUTH_FAIL

My router is a Fritz!Box 6490 Cable and even though I've checked the router's options I can't find anything wrong. Following similar posts, I have even tried a regulated power supply at 5V (current limit 2.5A), but no change.

I came up with a workaround: a task that checks the connection, which works pretty well. When "WiFi.status" turns to "WL_CONNECT_FAILED", I need to call "WiFi.disconnect(true)" and, a bit later, "WiFi.begin" again.

About the whole working sketch below I have several questions:

  1. Is that "works/doesn't work" behaviour a firmware bug, a bug in Arduino or this SDK, or a router problem?
  2. In my sketch I'm using "xTaskCreatePinnedToCore" for the connection watching task. How large does the stack for that task need to be (seems to work with 8kB) and what priority does such a task need to have (I just took 3 to be higher than IDLE)?
  3. I never got "WL_CONNECTION_LOST" as Wifi status. Do I need to consider that? And if so, how?

Please advise! Thanks!

Sketch:

#include <WiFi.h>

const char* ssid = "...";
const char* password = "...";

bool myWiFiFirstConnect = true;

void myWiFiTask(void *pvParameters) {
  wl_status_t state;

  while (true) {
    state = WiFi.status();
    if (state != WL_CONNECTED) {  // We have no connection
      if (state == WL_NO_SHIELD) {  // WiFi.begin wasn't called yet
        Serial.println("Connecting WiFi");

        WiFi.mode(WIFI_STA);
        WiFi.begin(ssid, password);

      } else if (state == WL_CONNECT_FAILED) {  // WiFi.begin has failed (AUTH_FAIL)
        Serial.println("Disconnecting WiFi");

        WiFi.disconnect(true);

      } else if (state == WL_DISCONNECTED) {  // WiFi.disconnect was done or Router.WiFi got out of range
        if (!myWiFiFirstConnect) {  // Report only once
          myWiFiFirstConnect = true;

          Serial.println("WiFi disconnected");
        }
      }

      vTaskDelay(pdMS_TO_TICKS(250)); // Check again in about 250 ms (vTaskDelay takes ticks, not ms)

    } else { // We have connection
      if (myWiFiFirstConnect) {  // Report only once
        myWiFiFirstConnect = false;

        Serial.print("Connected to ");
        Serial.println(ssid);
        Serial.print("IP address: ");
        Serial.println(WiFi.localIP());
        Serial.println("");
      }

      vTaskDelay(pdMS_TO_TICKS(5000)); // Check again in about 5 s (vTaskDelay takes ticks, not ms)
    }
  }
}

void setup() {
  delay(1000); // Power up

  Serial.begin(115200);

  // Create a connection task with 8kB stack on core 0
  xTaskCreatePinnedToCore(myWiFiTask, "myWiFiTask", 8192, NULL, 3, NULL, 0);
}

void loop() {
}

Debug Messages:

If WiFi.begin works:
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 0 - WIFI_READY
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 2 - STA_START
....[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 4 - STA_CONNECTED
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 7 - STA_GOT_IP
[D][WiFiGeneric.cpp:385] _eventCallback(): STA IP: 192.168.178.21, MASK: 255.255.255.0, GW: 192.168.178.1

If it doesn't work:
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 0 - WIFI_READY
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 2 - STA_START
[D][WiFiGeneric.cpp:342] _eventCallback(): Event: 5 - STA_DISCONNECTED
[W][WiFiGeneric.cpp:357] _eventCallback(): Reason: 202 - AUTH_FAIL

ullix commented 4 years ago

What do you mean: a) turn off the router, or b) turn off the ESP32?

a) Turning off the router and restarting it is what I did a few times when AVM support made some suggestions on this issue. But this takes quite a few minutes, and all WAN, LAN, WiFi, and telephone are down in my house. Interestingly, after this ordeal the ESP could connect to the FB7490 on the first hit. From then on it was only double-hitting.

b) turning off the ESP32 - meaning de-powering and rebooting? - is in essence what I do, because that would be the second connection.

Lay out a clear proposal for sequencing ESP actions and I'll do it.

tablatronix commented 4 years ago

The ESP: actually remove power, as opposed to a reset, both without a delay and with a delay, to see if the same thing occurs.

ullix commented 4 years ago

So, the ESP is running smoothly. I pull the USB plug; the ESP is depowered. Wait for 10sec. Then I replug it.

Result: exactly the same as before; a double hitter is needed.

tablatronix commented 4 years ago

Even if you wait, like, a minute? So strange.

Just curious: how many APs do you guys have around you? Is it excessive/enterprise, or normal residential, 10 or so?

ullix commented 4 years ago

Waited 1min 17sec: no change, double hitter needed.

This is residential. Look at my pics from the FB7490 WiFi connections a few posts earlier. There is not much going on. On 2.4GHz my smartphone, about a meter away from the router, sees two APs from the FB7490, one from the FB7272, and one TV. That is all.

ullix commented 4 years ago

I can offer another router: configured my smartphone (Android 7.1.1) as router, and connected the ESP to the net via smartphone.

Went smooth, only a single-hit needed, though it took a bit longer:

        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 0 - WIFI_READY
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 2 - STA_START
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 4 - STA_CONNECTED
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 7 - STA_GOT_IP
        [D][WiFiGeneric.cpp:381] _eventCallback(): STA IP: 192.168.43.127, MASK: 255.255.255.0, GW: 192.168.43.1
        Check[ms:Status]: 0:6 100:6 200:0 300:0 400:0 500:0 600:0 700:0 800:0 900:0 1000:0 1100:0 1200:0 1300:0 1400:0 1500:0 1600:0 1700:0 1800:0 1900:0 2000:0 2100:0 2200:0 2300:0 2400:0 2500:0 2600:0 2700:0
        2800:0 2900:0 3000:0 3100:3 Connected

You see it was on status 6 (disconnect) for only 100ms, but then on 0 (idle) for almost 3sec, then connected. Overall a longer connect time than needed for double-hitting the bad router, but in a single hit only. Perhaps the Android CPU is not as powerful as the FB7490's?

Perhaps relevant, perhaps not: in this single-hit there was also an idle period before connect, as was with the good router, FB7272, and was not with the bad router.
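
For anyone comparing these Check[ms:Status] traces across routers, a small helper can pull out the time of the first connected sample. This is only a sketch: the name `firstConnectedMs` is made up for illustration, and it assumes that status 3 means connected (WL_CONNECTED), as the traces in this thread suggest.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Assumption: 3 is the wl_status_t value for WL_CONNECTED,
// matching the "300:3 Connected" style samples in the traces.
constexpr long kStatusConnected = 3;

// Hypothetical helper: returns the timestamp (ms) of the first "ms:status"
// sample whose status is kStatusConnected, or -1 if there is none.
// Tokens that are not numeric "ms:status" pairs are skipped.
long firstConnectedMs(const std::string& trace) {
    std::istringstream in(trace);
    std::string sample;
    while (in >> sample) {
        const std::size_t colon = sample.find(':');
        if (colon == std::string::npos || colon == 0) continue;
        const std::string msText = sample.substr(0, colon);
        const std::string statusText = sample.substr(colon + 1);
        if (statusText.empty()) continue;
        char* end = nullptr;
        const long ms = std::strtol(msText.c_str(), &end, 10);
        if (*end != '\0') continue;  // left side is not a number
        const long status = std::strtol(statusText.c_str(), &end, 10);
        if (*end != '\0') continue;  // right side is not a number
        if (status == kStatusConnected) return ms;
    }
    return -1;
}
```

Fed with the trace lines posted in this thread, it gives one number per attempt that can be compared between "good" and "bad" routers.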

kugelkopf123 commented 4 years ago

Perhaps it has something to do with the Bandsteering from the FB7490? Have you tried to deactivate this?

pedrorambo commented 4 years ago

I can confirm that putting the WiFi.begin(...) inside the block bounded by while (WiFi.status() != WL_CONNECTED) works for me. Setup is similar to what the OP described with a Fritz!Box on the other end and hard-coded credentials.

Worked for me too. I used to do ESP.restart(); and connect again to solve this problem, but that solution is way better.

Waiting for an "official" fix...

Hardware: DOIT ESP32 DEVKIT V1 80MHz
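
The "WiFi.begin inside the while loop" idea quoted above can be sketched roughly as follows. This is a hedged illustration, not an official fix: `connectWithRetry` and its parameters are hypothetical names, the WiFi calls are passed in as callables so the logic can also be exercised off-device, and 3 is assumed to be the value of WL_CONNECTED.

```cpp
// Assumption: 3 is the wl_status_t value for WL_CONNECTED.
constexpr int kConnected = 3;

// Hypothetical helper: re-issues `begin` at the start of every attempt
// (the "double hit"), then polls `status` up to `pollsPerAttempt` times,
// sleeping `pollIntervalMs` before each poll.
// Returns the 1-based attempt on which the connection was observed,
// or 0 if it never was.
template <typename Begin, typename Status, typename Sleep>
int connectWithRetry(Begin begin, Status status, Sleep sleepMs,
                     int maxAttempts, int pollsPerAttempt,
                     unsigned pollIntervalMs) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        begin();  // called again on every attempt, not just once
        for (int poll = 0; poll < pollsPerAttempt; ++poll) {
            sleepMs(pollIntervalMs);
            if (status() == kConnected) return attempt;
        }
    }
    return 0;
}
```

On the ESP32 one would presumably pass small lambdas wrapping WiFi.begin(ssid, password), WiFi.status() and delay(); a return value of 2 would then correspond to the double-hitter described in this thread.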

flo-x commented 4 years ago

May I add a cent, or less... I had this problem once, with a specific router (not my home one). I remember its password was an unusual 64 hexadecimal digits. Maybe it has something to do with the kind of password or its length?

barneyz commented 4 years ago

I also have the "double-hitter problem" with a Fritz!Box 7590 (password 16 digits), located in Germany. But with a Fritz!Box 7360 without dual-band WiFi (only 2.4GHz, no 5GHz), it connects on the first try every time. Can dual band be a reason? The ESP8266 never needs a second try and connects much faster!

ullix commented 4 years ago

@kugelkopf123 : Perhaps it has something to do with the Bandsteering from the FB7490? Have you tried to deactivate this?

What do you mean by that? Where do I find this setting in the router?

EDIT: If I am not mistaken then bandsteering means switching between 2.4GHz and 5GHz? The ESP32 can do only 2.4GHz.

ullix commented 4 years ago

@flo-x : At least both my good and bad Fritz!Box routers use the same password length of 20 characters, while my good Android router uses 11 characters. Currently nothing can be excluded, but that would be a really weird one.

ullix commented 4 years ago

@barneyz : Can dual band be a reason?

My dual band is bad, my single 2.4GHz band router is good, so it fits. However, my Android phone configured as router is dual band also, but is good.

Nevertheless, anyone reporting 'good' or 'bad' routers should be encouraged to name their router and give such specs. Maybe a picture emerges.

@barneyz : When I contacted AVM support about my double-hitter problem, they made it sound like I am the only one to ever have this problem. Good to see that there are more people with this problem, and even with a newer FB. Why don't you contact AVM service as well and report this problem? They need to feel some pressure from customers. I contacted them at: service@avm.de (I still believe it is a router problem, though the ESP32 code needs a workaround for such bad routers; I have used FBs for many years and still like them).

ullix commented 4 years ago

@kugelkopf123
Found the bandsteering setting and deactivated it (although the ESP32 would not have been able to use it, as it has only single-band 2.4GHz WiFi).

The first connection attempt after that was a single-hitter! But jubilation faded quickly since all subsequent ones (~ a dozen) were double-hitters.

Then I inactivated the two other options (co-existence and TV). No benefit; nothing but double hitters!

ullix commented 4 years ago

Tried one more thing, but the result is not unexpected. I set my FB7490 to password-free access, and all connections were single-hitters, and they were fast, < 200ms!

            WiFi.begin(SSID) // no password!
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 0 - WIFI_READY
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 2 - STA_START
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 4 - STA_CONNECTED
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 7 - STA_GOT_IP
        [D][WiFiGeneric.cpp:381] _eventCallback(): STA IP: 10.0.0.85, MASK: 255.255.255.0, GW: 10.0.0.1
            Check[ms:Status]: 0:6 100:6 200:3 Connected

Interestingly, after I switched back to with-password access, the first connection attempt was in all aspects exactly as the password-free access, including < 200ms connection, while all subsequent ones required double-hitting.

Authentication is the issue, but where? Router and/or ESP32?

kugelkopf123 commented 4 years ago

@ullix Sorry I didn't answer you sooner. Just packing for a short vacation ;)!

Yes exactly, I meant the automatic change between 2.4GHz and 5GHz forced by the FB7490. I've often heard that "2.4GHz only" devices have problems with it. So I suggested this.

It's strange that it really seems to be the router. I've only had my attempts with the FritzBox7490 so far.

But it must be possible to control it somehow with the ESP32. Because I have a bag full of ESP8266 where not a single one of them has problems with the router.

Unfortunately, this topic has been around for quite a while and it's not really going anywhere. Had hoped that it might get better sometime with new core versions. This seems not to have been the case so far.

Unfortunately there are no starting points for a solution at the moment.

hcs-svn commented 4 years ago

What I can confirm here:

But in the fritz mesh it looks as if it happens also when it connects to the FRITZ!Repeater 2400 or FRITZ! Repeater 1160

And it is really random. When simply doing resets again and again I sometimes have five in a row where it works, two failed, some OK, failed, failed, failed, failed, three OK, ...

This is how I always (OK, I only did about 100 resets) get a connection:

int retryCounter = 0;  // counts 100 ms polls per attempt
WiFi.begin("ssid", "pass");
while (retryCounter < 20 && !WiFi.isConnected()) {
  retryCounter++;
  delay(100);
  Serial.print(".");
}
if (!WiFi.isConnected()) {
  Serial.println("\nAgain");
  WiFi.begin("ssid", "pass");
  retryCounter = 0;
  while (retryCounter < 20 && !WiFi.isConnected()) {
    retryCounter++;
    delay(100);
    Serial.print(".");
  }
}

And if the first loop gets a connection there are normally between 2 and 4 dots, which means between 200 and 400 ms.

pass is 25 characters long

kugelkopf123 commented 4 years ago

How about setting the IP, gateway, subnet and DNS to static on the ESP32? Does this change anything? Or has any of you tested how it behaves if no WPA security is active?

ullix commented 4 years ago

I did switch the security off, right here: https://github.com/espressif/arduino-esp32/issues/2501#issuecomment-635255576

Works flawlessly even on the bad router.

Authentication gives trouble.

kugelkopf123 commented 4 years ago

Oh, I must have missed that.

Maybe problems with special characters in the password?

ullix commented 4 years ago

digits only in password

tablatronix commented 4 years ago

The only time I ever had a double-hitter, or random working/not working, was the ESP8266 setMode race condition I found: setting the WiFi opmode is asynchronous.

I do not know if this problem exists on the ESP32; AFAIK it has no fix for it, so maybe not.

If you want to test, edit the WiFiGeneric.cpp source inside mode() and add a loop that waits for the mode to be set. At the moment it probably just checks the return value and assumes true means all is good, when in fact it is not (or was not on the ESP8266).

I cannot reproduce this, so I cannot work on the issue at the moment. I'll try to boot up some of the various routers I have and see if I can get one to do the same.

Let me know if you want to try this out and are capable of coding; I can link the ESP8266 issue so you can see how I was testing and working around it.
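
A rough sketch of the "loop that waits for the mode to be set" idea, for anyone who wants to try it from sketch code rather than by editing the core. It is an approximation of the suggestion, not the same thing: it polls from outside mode() instead of inside it, `waitForMode` is a hypothetical name, and whether this race exists on the ESP32 at all is unconfirmed.

```cpp
// Hypothetical helper: polls `getMode` until it reports `wanted`, sleeping
// `pollIntervalMs` between polls, for at most `maxPolls` polls plus one
// final check. Returns true if the wanted mode was observed.
template <typename Mode, typename GetMode, typename Sleep>
bool waitForMode(GetMode getMode, Mode wanted, Sleep sleepMs,
                 int maxPolls, unsigned pollIntervalMs) {
    for (int poll = 0; poll < maxPolls; ++poll) {
        if (getMode() == wanted) return true;
        sleepMs(pollIntervalMs);
    }
    return getMode() == wanted;  // last look after the final sleep
}
```

On the ESP32 this would presumably be called right after WiFi.mode(WIFI_STA), with lambdas wrapping WiFi.getMode() and delay(), and WiFi.begin() issued only once it returns true.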

ullix commented 4 years ago

Tried the static thingy. Still needs double-hitter! But it goes a lot faster.

Before I show it, pay attention to a little gotcha: the Arduino reference page on the WiFi.config() topic gives wrong advice on the order of parameters (at least for the ESP32). This is the correct order taken from the lib in WiFiSTA.h:

        bool config(IPAddress local_ip, IPAddress gateway, IPAddress subnet, IPAddress dns1 = (uint32_t)0x00000000, IPAddress dns2 = (uint32_t)0x00000000);

This is the pseudo code with debug output; code not shown is the same as here: https://github.com/espressif/arduino-esp32/issues/2501#issuecomment-634670194

            IPAddress local_IP      (10,0,0,99);
            IPAddress gateway       (10,0,0,1);
            IPAddress subnet        (255,255,255,0);
            IPAddress primaryDNS    (8, 8, 8, 8);
            IPAddress secondaryDNS  (8, 8, 4, 4);
            WiFi.config(local_IP, gateway, subnet, primaryDNS, secondaryDNS);

        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 0 - WIFI_READY
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 2 - STA_START
            WiFi.begin(SSID: fb7490, pw: *****)
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 5 - STA_DISCONNECTED
        [W][WiFiGeneric.cpp:353] _eventCallback(): Reason: 202 - AUTH_FAIL
            Check[ms:Status]: 0:6 100:6 200:4 Failed

            WiFi.begin(SSID: fb7490, pw: *****)
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 7 - STA_GOT_IP
        [D][WiFiGeneric.cpp:381] _eventCallback(): STA IP: 10.0.0.99, MASK: 255.255.255.0, GW: 10.0.0.1
        [D][WiFiGeneric.cpp:337] _eventCallback(): Event: 7 - STA_GOT_IP
        [D][WiFiGeneric.cpp:381] _eventCallback(): STA IP: 10.0.0.99, MASK: 255.255.255.0, GW: 10.0.0.1
            Check[ms:Status]: 0:4 100:4 200:4 300:3 Connected

Note two things:

ullix commented 4 years ago

@tablatronix I'm not understanding; could you elaborate?

tablatronix commented 4 years ago

The speed hit of using DHCP is well known; a static IP is widely recommended for battery-powered devices for this reason.

ullix, if you don't understand, I cannot explain it; you have to be familiar with the ESP SDK code.

ullix commented 4 years ago

@tablatronix in this case I hope you find a proper bad router among your treasures to work with!

I still wouldn't mind if you posted the link to the ESP8266 issue, that you mentioned.

tablatronix commented 4 years ago

https://github.com/esp8266/Arduino/issues/4372

Mr-Bubbles commented 4 years ago

I stumbled across the same problem a couple of days ago and found a workaround on a German forum that works quite well for me:

WiFi.begin(_ssid, _pwd);
uint8_t tryCount=0;
while (WiFi.status() != WL_CONNECTED && tryCount <= 4) {
    delay(500);
    tryCount++;
    if (tryCount >= 4 && WiFi.status() == WL_CONNECT_FAILED) {
        WiFi.begin(_ssid, _pwd);        // ESP32-workaround (otherwise WiFi-connection sometimes fails)
    }
}

I have a FRITZ!Box 6490 with FritzOS 07.12

systembolaget commented 4 years ago

Probably the same thing here (Fritzbox 7590).

When uploading via USB, no WLAN and thus no Internet connection is established. Only when pressing the microcontroller's reset button (not the reset button on the ESP32 module) or uploading a second time, I arrive at case 4.


void connectToWLANAndMQTT()
{
  if ((WiFi.status() != WL_CONNECTED) && (stateConnection != 1))
  {
    stateConnection = 0;
  }
  if ((WiFi.status() == WL_CONNECTED) && (mqtt.connected() != 0) && (stateConnection != 3))
  {
    stateConnection = 2;
  }
  if ((WiFi.status() == WL_CONNECTED) && (mqtt.connected() == 0) && (stateConnection != 5))
  {
    stateConnection = 4;
  }

  switch (stateConnection)
  {
    case 0:
      if (millis() - timeNowWLAN >= intervalWLAN)
      {
        timeNowWLAN = millis();
        Serial.println("(Re)start WLAN connection");
        WiFi.disconnect();
        WiFi.begin(WLAN_SSID, WLAN_PASS);
        printWLANStatus();
        stateConnection = 1;
      }
      break;

    case 1:
      Serial.println("Wait for WLAN connection");
      break;

    case 2:
      Serial.println("WLAN connected. Start MQTT connection");
      mqtt.connect();
      stateConnection = 3;
      break;

    case 3:
      Serial.println("WLAN connected. Wait for MQTT connection");
      break;

    case 4:
      WiFi.setLEDs(255, 0, 0); // Green
      Serial.println("WLAN and MQTT connected");
      stateConnection = 5;
      break;
  }
}

tophat17 commented 4 years ago

I can confirm that putting the WiFi.begin(...) inside the block bounded by while (WiFi.status() != WL_CONNECTED) works for me. Setup is similar to what the OP described with a Fritz!Box on the other end and hard-coded credentials.

Worked for me too. I used to do ESP.restart(); and connect again to solve this problem, but that solution is way better.

Waiting for an "official" fix...

Hardware: DOIT ESP32 DEVKIT V1 80MHz

This worked for me!!! solved my issue.

lbernstone commented 4 years ago

I have never had this issue, but here is the "official" method of waiting for a connection:

WiFi.begin("ssid", "passwd");
if (WiFi.waitForConnectResult() != WL_CONNECTED) {
  log_e("Unable to connect to WiFi");
}

tablatronix commented 4 years ago

@lbernstone This times out every time and errors with wrong password; there is clearly something wrong with the handshake with Fritz!Box routers. It requires a second connection attempt.

ullix commented 4 years ago

It seems that all workarounds use some variant of the double-hitter scheme.

I went through this issue and noted all the good and bad routers mentioned. The Fritz-Boxes in particular feature very poorly. I verified the dual/single band specs. ("Android" was my smartphone configured as WiFi router) :

    Bad Router/Repeater
    FritzBox6490          (dual band) (2013)
    FritzBox7490          (dual band) (2013)
    FritzBox7590          (dual band) (2017)
    FritzRepeater2400     (dual band) (?)
    FritzRepeater1160     (dual band) (?)
    Netgear WN2000RPTv3   (single band) (2008? 2010?) Repeater

    Good Router
    FritzBox7050          (single band) (2005)
    FritzBox7272          (single band) (2013)
    FritzBox7312          (single band) (2012)
    FritzBox7360          (single band) (2011 ?)
    O2 DSL Router Comfort (single band) (~2008, made by ZyXEL, model P-2602-HW-D7A)
    "old 2.4GHz d-link"   (single band) (?)
    Android 7.1.1         (dual band)   (2016)
    Android 6.0.1         (dual band)   (2013, device: Nexus 7 (2013))

This table seems to suggest that "dual band" does play a role. The only exceptions are the Android Smartphone and Tablet. Does that ring a bell to someone?

Feel free to add your router(s) to the list!

ullix commented 4 years ago

Just got my hands on an even older router, a FritzBox 7050. As a device from 2005 it is a single-band router. And, sure enough, it is a good router; the ESP32 connects on the first try in less than 200ms.

And another one: FritzBox 7312 (from year 2012). A rather small and simple device. Also a single band, and, yes, connects in first hit! Though it took at least 1500ms for a connection, tested in some 10 attempts.

And one more, an O2 DSL Router Comfort (made by ZyXEL, ~2008). Single band router, connects fast on first try within <200ms.

I added all to the table in comment https://github.com/espressif/arduino-esp32/issues/2501#issuecomment-643733428

This completes the trash collection from the neighborhood ;-) ...

Added tablet Nexus7 (2013) with Android 6.0.1 configured as router; also a Good one! But making a connection takes >3sec; but works always on 1st try.

A nice overview of many (all?) FritzBox, Speedport, and other routers is here: https://www.router-faq.de/

systembolaget commented 4 years ago

Thanks, ullix, that's great information. I shouldn't have bought a 7590 then. Too bad not many have pointed out that it's not the coding scheme, but definitely the router model that can throw a spanner in the works.

I also had sudden drop-outs once per day or every few weeks, troubleshooting in code like an idiot, only to find out it was a neighbour's TP-Link powerline adapter used to extend their WLAN coverage. TP-Link unplugged, no more disconnects.

Mr-Bubbles commented 4 years ago

@ullix do you by any chance have the Fritz!OS versions of the routers? Maybe dual band is not the problem, but just an indication of a problem that was introduced with a newer firmware version.

Since the ESP32 is throwing an AUTH_FAIL error, maybe there is an incompatibility with the WPA2 TKIP encryption used by the newer routers?

Has anybody tried changing the WPA2 encryption standard? Is that even possible with the new Fritz!OS? I will try that later.

Edit: one can change between WPA2 TKIP and WPA2 CCMP; since CCMP is the new standard, maybe there is an incompatibility.

ullix commented 4 years ago

@Mr-Bubbles I don't have the Fritz!Os versions

I would caution against interpreting the table as a single- vs dual-band problem per se, because the dual-band routers are the younger ones, and somewhere along the time axis some code may have been changed.

However, this also is not strictly so, as the FB7272 and the FB7490 were both first released in the same year, 2013, and one is good and one is bad.

I have no clue what CCMP vs TKIP is, but this might be the type of modification that goes along both timeline and modernization of routers. @Mr-Bubbles can you provide more details and/or a link?

Many of you will have younger and older Android versions; can you give it a try and report?

ullix commented 4 years ago

Following up on the CCMP and TKIP idea (spoiler: no success):

the FritzBox 7490 has 4 modes of wifi access: 1 unsecure mode WEP, and 3 secure WPA modes.

The WEP mode works always on 1st try; no surprise, as the failure is in AUTH, and WEP has none of it.

The 3 WPA modes are these: [image]. In all of them I need a double-hitter to connect.

But there is one interesting aspect: whenever I change from one WPA mode to another, the next connection attempt always works on first hit, irrespective which mode was selected; it is the change which matters!

All of the subsequent connection attempts then always need the double-hitter.

systembolaget commented 4 years ago

Thanks for all that testing!

Reliable ESP32 WiFi network and then Internet re-connection is excruciatingly painful.

Just for curiosity, I took a prehistoric Nokia and a new Samsung, made them WPA2 hotspots, and could reconnect instantly after removing power from my set-up and then reconnecting power again. No USB connection. No reset button on the ESP32 board necessary. It just works.

systembolaget commented 4 years ago

Also disabling the WLAN on the two smartphones and re-enabling it has my set-up reconnecting automatically as it should.

Out go the f*^"ing Fritzboxes. What WLAN router brand do you suggest one should buy?

Mr-Bubbles commented 4 years ago

@ullix Some more information about the ins and outs of WPA2 encryption can be found on the English Wikipedia page IEEE_802.11i-2004 (https://en.wikipedia.org/wiki/IEEE_802.11i-2004). (What I didn't know was that CCMP has been mandatory for WPA2 encryption for over 15 years now, and since it has been around for so long it might not be the culprit after all.)

However, I also tested a bit more and came to the same result as you. That is, even with WPA1 and TKIP (no longer considered secure) the ESP32 needs 2 retries to successfully connect to my Fritz!Box.

I then went along and tried the WiFi example code from Espressif's own ESP-IDF framework (the ESP-IDF WiFi Station Example, https://github.com/espressif/esp-idf/blob/v3.2.2/examples/wifi/getting_started/station/main/station_example_main.c), because I thought maybe the Arduino framework messed something up, but even that code needs a second retry to successfully connect. Since the example code generates a lot of debug output I will post it here, so maybe it will help someone:

I (543) wifi station: ESP_WIFI_MODE_STA
I (573) wifi:wifi driver task: 3ffc2114, prio:23, stack:6656, core=0
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (603) wifi:wifi firmware version: aa5336b
I (603) wifi:config NVS flash: enabled
I (603) wifi:config nano formating: disabled
I (603) wifi:Init dynamic tx buffer num: 32
I (603) wifi:Init data frame dynamic rx buffer num: 32
I (613) wifi:Init management frame dynamic rx buffer num: 32
I (613) wifi:Init management short buffer num: 32
I (623) wifi:Init static rx buffer size: 1600
I (623) wifi:Init static rx buffer num: 10
I (633) wifi:Init dynamic rx buffer num: 32
I (733) phy: phy_version: 4180, cb3948e, Sep 12 2019, 16:39:13, 0, 0
I (733) wifi:mode : sta (30:ae:a4:19:d7:6c)
I (733) wifi station: wifi_init_sta finished.
I (733) wifi station: connect to ap SSID:XXXX password:XXXX
I (853) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (853) wifi:state: init -> auth (b0)
I (863) wifi:state: auth -> init (8a0)
I (863) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (863) wifi station: retry to connect to the AP
I (873) wifi station: connect to the AP fail
I (2913) wifi station: retry to connect to the AP
I (2913) wifi station: connect to the AP fail
I (3033) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (3033) wifi:state: init -> auth (b0)
I (3043) wifi:state: auth -> assoc (0)
I (3043) wifi:state: assoc -> run (10)
I (3093) wifi:connected with XXXX, aid = 1, channel 1, BW20, bssid = e0:28:6d:57:c3:8a
I (3093) wifi:security: WPA-PSK, phy: bg, rssi: -53
I (3103) wifi:pm start, type: 1
I (3173) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (4043) tcpip_adapter: sta ip: 192.168.42.64, mask: 255.255.255.0, gw: 192.168.42.254
I (4043) wifi station: got ip:192.168.42.64

So my conclusion is that the bug might be somewhere deep inside the WiFi encryption library of the ESP32, and that it only surfaces against a Fritz!Box because they seem to be especially picky when it comes to following the standard. But debugging that is way out of my league.

@systembolaget: I believe the ESP library is at fault, not the Fritz!Box (however, that is just my gut feeling, no hard evidence there). And I refuse to throw out a 200€ router that is the quasi-standard when it comes to stability, security, updates and support. There simply is no better home router you can buy for your money. However, if you really want to change your router, either the TP-Link Archer VR2800v (also includes DECT and support for analog telephones) or the Turris Omnia (runs its own OpenWRT version and has quite a lot of external interfaces such as mSATA or an SFP+ port, but needs an external modem) seem to be worth a look.

systembolaget commented 4 years ago

@Mr-Bubbles Yeah, well, I have lots of Adafruit AirLift FeatherWings for a forest/lake monitoring project; they shall receive remote data via RFM69HCW packet radios, and for "WiFi-huts", to finally get the data squeezed into the Internet, I bought Fritzboxes in Germany, as they were recommended. I need something reliable and refuse to throw out all ESP32 devices. I think it is a better solution to buy what you suggest, TP-Link Archer VR2800v, which costs 3278kr apiece. In the worst case, I pepper the forest with old Nokias, lol.

I'm just so very chuffed that I finally know what the culprit is, as I fiddled with code for weeks.

Mr-Bubbles commented 4 years ago

@systembolaget OK, that's understandable. Personally I have had good experiences with other TP-Link routers, especially when it comes to value for money. If you don't need all the DECT and telephone stuff, there should be even cheaper TP-Link routers around that can do everything you need. (Maybe look for a model that is compatible with the OpenWRT firmware, which will convert the router into a micro Linux server with almost endless possibilities.)

Regarding the Fritz!Box problem: I opened a support ticket with the manufacturer AVM. Let's see if they are willing to dig into this problem or at least can shed some light on what is going wrong. Maybe I have more luck than @ullix. I even linked this bug report.

dpharris commented 4 years ago

Why do these reports differ from first attempt?

I (3033) wifi:state: init -> auth (b0)

I (3043) wifi:state: auth -> assoc (0)

I (3043) wifi:state: assoc -> run (10)

The first attempt doesn't include that last line.

David
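
To compare the two attempts without reading the log by eye, the wifi:state lines can be extracted mechanically. The following is only a sketch: `Transition`, `parseTransitions` and `countAuthFallbacks` are made-up names, and the only assumption about the log format is the "wifi:state: from -> to (code)" shape visible in the log posted above. What the codes in parentheses (such as 8a0) mean is not established in this thread, so the sketch just carries them along.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One "wifi:state: <from> -> <to> (<code>)" entry from the log.
struct Transition {
    std::string from;
    std::string to;
    std::string code;  // kept as text; its meaning is not interpreted here
};

// Hypothetical helper: collects all state transitions found in `log`.
std::vector<Transition> parseTransitions(const std::string& log) {
    static const std::string marker = "wifi:state: ";
    std::vector<Transition> out;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t start = line.find(marker);
        if (start == std::string::npos) continue;
        const std::size_t from = start + marker.size();
        const std::size_t arrow = line.find(" -> ", from);
        if (arrow == std::string::npos) continue;
        const std::size_t to = arrow + 4;
        const std::size_t open = line.find(" (", to);
        if (open == std::string::npos) continue;
        const std::size_t close = line.find(')', open);
        if (close == std::string::npos) continue;
        out.push_back({line.substr(from, arrow - from),
                       line.substr(to, open - to),
                       line.substr(open + 2, close - open - 2)});
    }
    return out;
}

// Counts attempts that fell back from "auth" to "init" instead of
// proceeding to "assoc".
int countAuthFallbacks(const std::vector<Transition>& transitions) {
    int count = 0;
    for (const Transition& t : transitions) {
        if (t.from == "auth" && t.to == "init") ++count;
    }
    return count;
}
```

Applied to the posted log, this lists the first attempt as init -> auth followed by auth -> init (8a0), and the second as init -> auth, auth -> assoc, assoc -> run, so the attempts already diverge at the second transition.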

On Sun., Jun. 14, 2020, 8:42 a.m. Mr-Bubbles, notifications@github.com wrote:

@ullix Some more information about the ins and outs of WPA2 encryption can be found on the English Wikipedia page IEEE_802.11i-2004 https://en.wikipedia.org/wiki/IEEE_802.11i-2004 (What I didn't know was that CCMP has been mandatory for WPA2 encryption for over 15 years now, and since it has been around for so long it might not be the culprit after all.)

However, I also tested a bit more and came to the same result as you. That is, even with WPA1 and TKIP (no longer considered secure) the ESP32 needs 2 retries to successfully connect to my Fritz!Box.

I then went along and tried the WiFi example code from Espressif's own ESP-IDF framework (ESP-IDF WIFI Station Example https://github.com/espressif/esp-idf/blob/v3.2.2/examples/wifi/getting_started/station/main/station_example_main.c), because I thought maybe the Arduino framework messed something up, but even that code needs a second retry to successfully connect. Since the example code generates a lot of debug output, I will post it here; maybe it will help someone:

I (543) wifi station: ESP_WIFI_MODE_STA
I (573) wifi:wifi driver task: 3ffc2114, prio:23, stack:6656, core=0
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (603) wifi:wifi firmware version: aa5336b
I (603) wifi:config NVS flash: enabled
I (603) wifi:config nano formating: disabled
I (603) wifi:Init dynamic tx buffer num: 32
I (603) wifi:Init data frame dynamic rx buffer num: 32
I (613) wifi:Init management frame dynamic rx buffer num: 32
I (613) wifi:Init management short buffer num: 32
I (623) wifi:Init static rx buffer size: 1600
I (623) wifi:Init static rx buffer num: 10
I (633) wifi:Init dynamic rx buffer num: 32
I (733) phy: phy_version: 4180, cb3948e, Sep 12 2019, 16:39:13, 0, 0
I (733) wifi:mode : sta (30:ae:a4:19:d7:6c)
I (733) wifi station: wifi_init_sta finished.
I (733) wifi station: connect to ap SSID:XXXX password:XXXX
I (853) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (853) wifi:state: init -> auth (b0)
I (863) wifi:state: auth -> init (8a0)
I (863) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (863) wifi station: retry to connect to the AP
I (873) wifi station: connect to the AP fail
I (2913) wifi station: retry to connect to the AP
I (2913) wifi station: connect to the AP fail
I (3033) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (3033) wifi:state: init -> auth (b0)
I (3043) wifi:state: auth -> assoc (0)
I (3043) wifi:state: assoc -> run (10)
I (3093) wifi:connected with XXXX, aid = 1, channel 1, BW20, bssid = e0:28:6d:57:c3:8a
I (3093) wifi:security: WPA-PSK, phy: bg, rssi: -53
I (3103) wifi:pm start, type: 1
I (3173) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (4043) tcpip_adapter: sta ip: 192.168.42.64, mask: 255.255.255.0, gw: 192.168.42.254
I (4043) wifi station: got ip:192.168.42.64

So my conclusion is that the bug might be somewhere deep inside the WiFi encryption library of the ESP32, and it only surfaces against a Fritz!Box because they seem to be especially picky when it comes to following the standard. But debugging that is way out of my league.

@systembolaget: I believe the ESP library is at fault, not the Fritz!Box (however, that is just my gut feeling, no hard evidence there). And I refuse to throw out a 200€ router that is the quasi-standard when it comes to stability, security, updates and support. There simply is no better home router you can buy for your money. However, if you really want to change your router, either the TP-Link Archer VR2800v (also includes DECT and support for analog telephones) or the Turris Omnia (runs its own OpenWRT version and has quite a lot of external interfaces such as mSATA or an SFP+ port, but needs an external modem) seem to be worth a look.


kugelkopf123 commented 4 years ago

Why do these reports differ from first attempt?
I (3033) wifi:state: init -> auth (b0)
I (3043) wifi:state: auth -> assoc (0)
I (3043) wifi:state: assoc -> run (10)
The first attempt doesn't include that last line.
David

Yes, you are right. It's a bit strange that init goes to auth and then auth goes straight back to init:

I (853) wifi:state: init -> auth (b0)
I (863) wifi:state: auth -> init (8a0)

On the second try there is no fallback to init anymore. Perhaps there is something wrong with the order.

Same with the debug messages: "retry to connect to the AP" is printed first and "connect to the AP fail" after it. It should be the other way around.
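For anyone comparing such logs, here is a minimal C++ sketch (my own illustration, not part of ESP-IDF or the Arduino core) that scans driver log lines for `wifi:state:` transitions and collects the attempts that fall back to `init` instead of moving on towards `run`. The names `Transition`, `parseTransition` and `findAbortedAttempts` are hypothetical, and the line format is only assumed from the logs posted in this thread.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One "wifi:state: <from> -> <to> (<code>)" entry from the driver log.
struct Transition {
    long timestampMs;   // value inside "I (...)"
    std::string from;
    std::string to;
    std::string code;   // driver-internal code, kept as text (e.g. "8a0")
};

// Parse a single log line. Returns false for lines that are not state
// transitions; on success fills `out`.
bool parseTransition(const std::string& line, Transition& out) {
    static const std::regex pattern(
        R"(I \((\d+)\) wifi:state: (\w+) -> (\w+) \((\w+)\))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) return false;
    assert(m.size() == 5);  // whole match plus four capture groups
    out = Transition{std::stol(m[1].str()), m[2].str(), m[3].str(), m[4].str()};
    return true;
}

// Collect every transition that returns to "init" from another state,
// i.e. a connection attempt that was aborted before reaching "run".
std::vector<Transition> findAbortedAttempts(const std::vector<std::string>& log) {
    std::vector<Transition> aborted;
    for (const auto& line : log) {
        Transition t;
        if (parseTransition(line, t) && t.to == "init" && t.from != "init") {
            aborted.push_back(t);
        }
    }
    return aborted;
}
```

Fed the state lines from the failing run, this flags only the `auth -> init (8a0)` entry at 863 ms. What the code `8a0` means is internal to the WiFi driver, so the sketch just keeps it as text.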

Mr-Bubbles commented 4 years ago

EDIT: Not a solution, see posts further down.

OK people, I MAY have found the culprit and a solution:

After activating PMF (Protected Management Frames) on my FritzBox (WLAN -> Security -> Encryption), the ESP32 no longer needs a second try to connect to my WLAN but manages to connect on every attempt. When I deactivated PMF again on the FritzBox (this is the default), the problem was back.

The AVM homepage states that PMF is a new security feature to mitigate connection hijacking by evil APs, but since old clients (pre-2009) might not support it, it is off by default. (For further info, see IEEE 802.11w-2009.)

@kugelkopf123, @dpharris: After activating PMF the debug log now looks like this, and the strange init -> auth / auth -> init sequence seems to be gone:

I (543) wifi station: ESP_WIFI_MODE_STA
I (573) wifi:wifi driver task: 3ffc2114, prio:23, stack:6656, core=0
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (573) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (603) wifi:wifi firmware version: aa5336b
I (603) wifi:config NVS flash: enabled
I (603) wifi:config nano formating: disabled
I (603) wifi:Init dynamic tx buffer num: 32
I (603) wifi:Init data frame dynamic rx buffer num: 32
I (613) wifi:Init management frame dynamic rx buffer num: 32
I (613) wifi:Init management short buffer num: 32
I (623) wifi:Init static rx buffer size: 1600
I (623) wifi:Init static rx buffer num: 10
I (633) wifi:Init dynamic rx buffer num: 32
I (723) phy: phy_version: 4180, cb3948e, Sep 12 2019, 16:39:13, 0, 0
I (733) wifi:mode : sta (30:ae:a4:19:d7:6c)
I (733) wifi station: wifi_init_sta finished.
I (733) wifi station: connect to ap SSID:XXXX password:XXXX
I (853) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (853) wifi:state: init -> auth (b0)
I (863) wifi:state: auth -> assoc (0)
I (873) wifi:state: assoc -> run (10)
I (913) wifi:connected with XXXX, aid = 5, channel 1, BW20, bssid = e0:28:6d:57:c3:8a
I (913) wifi:security: WPA-PSK, phy: bg, rssi: -47
I (913) wifi:pm start, type: 1
I (973) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (1543) tcpip_adapter: sta ip: 192.168.42.64, mask: 255.255.255.0, gw: 192.168.42.254
I (1543) wifi station: got ip:192.168.42.64

So contrary to my gut feeling, the FritzBox is to blame! (Shame on you, FritzBox!)

hcs-svn commented 4 years ago

Great. With the right keyword, I found this:

ESP32 supports the following three modes of operation with respect to PMF.

PMF not supported: In this mode, ESP32 indicates to AP that it is not capable of supporting management protection during association. In effect, security in this mode will be equivalent to that in traditional mode.

PMF capable, but not required: In this mode, ESP32 indicates to AP that it is capable of supporting PMF. The management protection will be used if AP mandates PMF or is at least capable of supporting PMF.

PMF capable and required: In this mode, ESP32 will only connect to AP, if AP supports PMF. If not, ESP32 will refuse to connect to the AP.

esp_wifi_set_config() can be used to configure PMF mode by setting appropriate flags in pmf_cfg parameter. Currently, PMF is supported only in Station mode.

Can be found here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html
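To make the three modes concrete, here is a small sketch of how they map onto the two booleans in `pmf_cfg`. It is only an illustration under assumptions: `PmfMode`, `PmfFlags` and `pmfFlagsFor` are hypothetical names of mine, and `PmfFlags` is a local stand-in mirroring the `capable` and `required` fields of ESP-IDF's `wifi_pmf_config_t` so that the mapping can be compiled off-target. The commented-out part shows roughly how the result might be applied with `esp_wifi_set_config()` on an ESP-IDF version that has `pmf_cfg`; I have not verified it against the core used in this issue.

```cpp
#include <cassert>

// The three PMF modes described in the ESP-IDF Wi-Fi guide quoted above.
enum class PmfMode { NotSupported, CapableNotRequired, CapableAndRequired };

// Hypothetical local stand-in for the two booleans of wifi_pmf_config_t.
struct PmfFlags {
    bool capable;   // station advertises that it can do PMF
    bool required;  // station refuses APs that do not support PMF
};

// Map a documented mode to the flag pair that would go into pmf_cfg.
PmfFlags pmfFlagsFor(PmfMode mode) {
    PmfFlags flags = {false, false};
    if (mode != PmfMode::NotSupported) flags.capable = true;
    if (mode == PmfMode::CapableAndRequired) flags.required = true;
    assert(!flags.required || flags.capable);  // "required" implies "capable"
    return flags;
}

// On the target (assumes an ESP-IDF release that provides pmf_cfg in
// wifi_sta_config_t; not runnable on a PC), applying it could look like:
//
//   wifi_config_t cfg = {};
//   esp_wifi_get_config(WIFI_IF_STA, &cfg);
//   PmfFlags f = pmfFlagsFor(PmfMode::CapableNotRequired);
//   cfg.sta.pmf_cfg.capable  = f.capable;
//   cfg.sta.pmf_cfg.required = f.required;
//   esp_wifi_set_config(WIFI_IF_STA, &cfg);
```

"Capable, but not required" is probably the mode of interest here, since it lets the station use PMF when the AP offers it without locking out APs that have it switched off.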

Mr-Bubbles commented 4 years ago

Good find. But it looks like the Arduino-esp* framework does not expose those low-level parameters, so with the Arduino libraries one apparently cannot disable or force the use of PMF. (At least I don't know how to.)

However, enabling PMF on my FritzBox is a good enough solution for me, since it also raises the security level and is an IEEE standard.

@systembolaget maybe you don't need new routers after all.

systembolaget commented 4 years ago

@Mr-Bubbles Yeah, I contacted the German Fritzbox supplier and they said it is a problem with the ESP32, which is a non-standard device outside of China, lol.

I ordered the routers you recommended and hope they arrive super quick.

I meanwhile tried with a prehistoric Nokia N9, Nokia Lumia 920, iPhone 4 and Sony Xperia X10 as WiFi hotspots (I have a large collection from working in the field of GSM), and my ESP32 set-up (Adafruit AirLift FeatherWing on an Adafruit Metro Mini) reconnects instantly 1. when I remove power and 2. reboot the WLAN spawning device with the basic state machine WLAN and MQTT code. After trying many smartphone hotspots with WPA2, I conclude that the Fritzbox router must be the culprit.

kugelkopf123 commented 4 years ago

@Mr-Bubbles I would not say directly that it is a bug from the Fritzboxes but rather a compatibility problem. Probably the ESP32 expects PMF by default, but doesn't get it from the FB because it is trimmed for compatibility (default).

It would be really great if someone could make this available in Arduino as well. Unfortunately my knowledge is not enough.

Nevertheless a super find! Thanks!