espressif / ESP8266_NONOS_SDK

ESP8266 nonOS SDK

[TW#20353] ESP8266 NON_OS_SDK 2.2.0 WIFI connecting issue #112

Open crgor opened 6 years ago

crgor commented 6 years ago

Hi Everyone,

I am testing the new NON_OS_SDK 2.2.0, and it seems to have issues with the WiFi station connection. The code below works fine in 2.1.0 (except for the threshold structure's rssi and authmode setup), but in 2.2.0 the station keeps trying to connect and never succeeds.

`

  struct station_config stconfig;

  os_memset(&stconfig, 0x0, sizeof(stconfig));

  os_memset(stconfig.ssid, 0, sizeof(stconfig.ssid));
  stconfig.threshold.authmode = AUTH_WPA_WPA2_PSK;
  stconfig.threshold.rssi = 84;

  os_printf("connecting to %s with %s rssi %d auth %d \n", sysCfg.wifi_ssid, wifi_dec_pass, stconfig.threshold.rssi, stconfig.threshold.authmode);

  os_memset(stconfig.password, 0, sizeof(stconfig.password));
  os_sprintf(stconfig.ssid, "%s", (char *) sysCfg.wifi_ssid);
  os_memcpy(stconfig.password, wifi_dec_pass,
              os_strlen(wifi_dec_pass));
  //wifi_set_opmode(STATION_MODE);
  if (iWifiMode == STATION_MODE){
     wifi_set_opmode((wifi_get_opmode()|STATIONAP_MODE) & iWifiMode);
  }else {
     wifi_set_opmode((wifi_get_opmode()|STATIONAP_MODE) & sysCfg.wifi_mode);
  }

  if ((sensCfg.szHostName != NULL) && (strlen(sensCfg.szHostName) > 0)){
     wifi_station_set_hostname(sensCfg.szHostName);
  }

  if (!wifi_station_set_config(&stconfig)) {
     WIFI_DEBUG("ESP8266 not set station config!\r\n");
  }
  if (!dhcp || strcmp(dhcp, "dhcp") != 0) {
     wifi_get_ip_info(STATION_IF, &info);
     ip = (char *) sysCfg.net_ip;
     mask = (char *) sysCfg.net_mask;
     gw = (char *) sysCfg.net_gw;
     if (ip)
        info.ip.addr = ipaddr_addr(ip);
     if (mask)
        info.netmask.addr = ipaddr_addr(mask);
     if (gw)
        info.gw.addr = ipaddr_addr(gw);
     if (wifi_station_dhcpc_status() == 1) {
        wifi_station_dhcpc_stop();
     }
     wifi_set_ip_info(STATION_IF, &info);
  } else {
     if (wifi_station_dhcpc_status() == 0) {
        wifi_station_dhcpc_start();
     }
  }
  wifi_station_disconnect();
  wifi_station_connect();

  wifi_station_set_reconnect_policy(TRUE);
  wifi_station_set_auto_connect(TRUE);

`
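
A side note on the two `wifi_set_opmode` calls in the snippet above. The following is a minimal host-side sketch, not SDK code, assuming the SDK's opmode values (NULL_MODE 0, STATION_MODE 1, SOFTAP_MODE 2, STATIONAP_MODE 3). The helper name `masked_opmode` is hypothetical. It suggests that the expression `(wifi_get_opmode()|STATIONAP_MODE) & mode` reduces to `mode` whatever the current opmode is.

```c
/* Opmode values as assumed from the SDK headers. */
enum { NULL_MODE = 0, STATION_MODE = 1, SOFTAP_MODE = 2, STATIONAP_MODE = 3 };

/* Hypothetical helper mirroring the expression used in the snippet:
 * OR-ing with STATIONAP_MODE sets both mode bits, so the AND simply
 * yields the requested mode. */
static unsigned masked_opmode(unsigned current, unsigned requested)
{
    return (current | STATIONAP_MODE) & requested;
}
```

If that holds on the target as well, passing the requested mode directly to `wifi_set_opmode` would be equivalent and easier to read.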

Sometimes the station connects, but most of the time it just keeps reporting "connecting". The output is as follows:

System init... 12345678 senSingP@ssw0rd admin
connecting to IOTLab with 12345678 rssi 84 auth 4
mode : sta(60:01:94:26:5b:6a)
add if0
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt heap 32680 WiFi connecting... state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt heap 32680 WiFi connecting... state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt state: 5 -> 2 (7c0) rm 0 heap 32680 WiFi connecting... reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt heap 32680 WiFi connecting... state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt heap 32680 WiFi connecting... state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 35 cnt heap 32680 WiFi connecting... state: 5 -> 2 (7c0) rm 0 reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 32 cnt state: 5 -> 2 (7c0) rm 0 heap 32680 WiFi connecting... reconnect state: 2 -> 0 (0)
scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 32 cnt

@FayeY any idea what I am doing wrong?

marcelstoer commented 6 years ago

Several WiFi fixes were committed to the 2.2.x branch since the 2.2.0 release: https://github.com/espressif/ESP8266_NONOS_SDK/commits/release/v2.2.x Have you tried with the latest? Also, there seem to be issues with channel 13 and hidden SSID according to #109.

crgor commented 6 years ago

@marcelstoer Thanks for the update. I had not been actively following the master branch. I will download the latest and try it out.

crgor commented 6 years ago

Tested with the latest commits on NON_OS_SDK 2.2.x. The result is the same; the output is as follows:

System init... A1R5EN5EPLU5 senSingP@ssw0rd admin connecting to SenSINGPL with A1R5EN5EPLU5 rssi 84 auth 4 mode : sta(60:01:94:26:5b:6a) add if0 scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting... state: 5 -> 2 (2c0) rm 0 reconnect state: 2 -> 0 (0) scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 6 cnt heap 32568 WiFi connecting...

@FayeY @wujiangang Could you shed some light on states 0, 2, 3, 4, 5, along with hex codes such as 7c0 and 2b0? All I am testing is a WPA/WPA2 PSK AP on 2.4 GHz, on different channels such as 1 and 6. WiFi connectivity is not stable; most of the time the chips do not connect. Your support is deeply appreciated. Thank you.

crgor commented 6 years ago

After a bit more testing I found the root cause of this problem. I am testing with the IOT_demo setup. I have written one function, setup_wifi, which sets up WiFi during bootup, and restart_10ms_cb in user_webserver.c, which sets the station config when someone changes the WiFi settings through the HTTP server.

The issue happens when system_restart() is called in restart_10ms_cb for both Station and SoftAP mode. When it is called only in SoftAP mode, as in the IOT_demo, the code works. I don't understand how system_restart() causes this problem right after the wifi_station_connect() call.

setup_wifi function

` setup_wifi(uint8 iWifiMode) {

char *ip, *mask, *gw;
struct ip_info info;
char *dhcp = (char *) sysCfg.net_mode;
// To do get the host name from the ATMEGA
if( wifi_get_phy_mode() != PHY_MODE_11N )
{
   WIFI_DEBUG("*** Setting PHY_MODE ...\r\n");
   wifi_set_phy_mode( PHY_MODE_11N );
}
wifi_station_set_hostname(DEFHOSTNAME);

wifi_station_set_reconnect_policy(FALSE);
wifi_station_set_auto_connect(FALSE);

if ((iWifiMode == STATION_MODE) || (sysCfg.wifi_mode & STATION_MODE)) {
    // There are changes done changes here specifically to support SDK 2.3.0
    os_printf("Setting up the station \n");
    struct station_config stconfig;
    if (!wifi_station_get_config(&stconfig)){
        os_printf("unable to get STA Config \n");
    }
    os_memset(&stconfig, 0x0, sizeof(stconfig));
    os_memset(stconfig.ssid, 0, sizeof(stconfig.ssid));
    stconfig.threshold.authmode = AUTH_WPA_WPA2_PSK;
    stconfig.threshold.rssi = 84;
    os_printf("connecting to %s with %s rssi %d auth %d \n", sysCfg.wifi_ssid, wifi_dec_pass, stconfig.threshold.rssi, stconfig.threshold.authmode);
    os_memset(stconfig.password, 0, sizeof(stconfig.password));
    os_sprintf(stconfig.ssid, "%s", (char *) sysCfg.wifi_ssid);
    os_memcpy(stconfig.password, wifi_dec_pass,
                    os_strlen(wifi_dec_pass));
    //wifi_set_opmode(STATION_MODE);
    if (iWifiMode == STATION_MODE){
        wifi_set_opmode((wifi_get_opmode()|STATIONAP_MODE) & iWifiMode);
    }else {
        wifi_set_opmode((wifi_get_opmode()|STATIONAP_MODE) & sysCfg.wifi_mode);
    }

    if ((sensCfg.szHostName != NULL) && (strlen(sensCfg.szHostName) > 0)){
        wifi_station_set_hostname(sensCfg.szHostName);
    }

    if (!wifi_station_set_config(&stconfig)) {
        WIFI_DEBUG("ESP8266 not set station config!\r\n");
    }
    if (!dhcp || strcmp(dhcp, "dhcp") != 0) {
        wifi_get_ip_info(STATION_IF, &info);
        ip = (char *) sysCfg.net_ip;
        mask = (char *) sysCfg.net_mask;
        gw = (char *) sysCfg.net_gw;
        if (ip)
            info.ip.addr = ipaddr_addr(ip);
        if (mask)
            info.netmask.addr = ipaddr_addr(mask);
        if (gw)
            info.gw.addr = ipaddr_addr(gw);
        if (wifi_station_dhcpc_status() == 1) {
            wifi_station_dhcpc_stop();
        }
        wifi_set_ip_info(STATION_IF, &info);
    } else {
        if (wifi_station_dhcpc_status() == 0) {
            wifi_station_dhcpc_start();
        }
    }
    wifi_station_disconnect();
    wifi_station_connect();

    wifi_station_set_reconnect_policy(TRUE);
    wifi_station_set_auto_connect(TRUE);

    // Start ping Gateway timer
    user_disable_pingw(1);
}

}`

restart_10ms_cb function

` restart_10ms_cb(void *arg) {

if (rstparm != NULL && rstparm->pespconn != NULL) {
    switch (rstparm->parmtype) {

        case WIFI:
            //if (rstparm->pespconn->state == ESPCONN_CLOSE) {
                if (sta_conf->ssid[0] != 0x00) {

        wifi_set_opmode(STATION_MODE);
                    wifi_station_set_config(sta_conf);
                    char *dhcp = (char *) sysCfg.net_mode;
                    char *ip, *mask, *gw;
                    struct ip_info info;
        if (!dhcp || strcmp(dhcp, NETMODE_DHCP_STR) != 0) {
            wifi_get_ip_info(STATION_IF, &info);
            ip = (char *) sysCfg.net_ip;
            mask = (char *) sysCfg.net_mask;
            gw = (char *) sysCfg.net_gw;
            if (ip)
                    info.ip.addr = ipaddr_addr(ip);
            if (mask)
                info.netmask.addr = ipaddr_addr(mask);
            if (gw)
                info.gw.addr = ipaddr_addr(gw);
            if (wifi_station_dhcpc_status() == 1) {
                wifi_station_dhcpc_stop();
            }
            wifi_set_ip_info(STATION_IF, &info);
        } else {
            if (wifi_station_dhcpc_status() == 0) {
                wifi_station_dhcpc_start();
            }
        }
        wifi_station_set_reconnect_policy(TRUE);
        wifi_station_set_auto_connect(TRUE);
                    wifi_station_disconnect();
                    wifi_station_connect();
                }

                if (ap_conf->ssid[0] != 0x00) {
                    wifi_softap_set_config(ap_conf);
                }

                system_restart();

                os_free(ap_conf);
                ap_conf = NULL;
                os_free(sta_conf);
                sta_conf = NULL;
                os_free(rstparm);
                rstparm = NULL;
                os_free(restart_10ms);
                restart_10ms = NULL;
            //} else {
            //   os_timer_arm(restart_10ms, 10, 0);
            //}

            break;

}`

wujiangang commented 6 years ago

Hi @crgor, sorry for the inconvenience. First, I will explain the states: 0: init, 2: auth, 3: assoc, 5: run.

state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) means that WiFi auth and assoc succeeded; WiFi then enters the 4-way handshake stage. For 7c0 or 2c0, only the c0 part matters: it means that the router sent a deauth to us.
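
The explanation above can be sketched as a small host-side log decoder. This is an illustration only, not SDK code: the function names are hypothetical, and the state names and the meaning of `c0` are taken from the comment above. In the logs posted in this thread the bits above `c0` happen to match the reported disconnect reason (`2c0` with reason 2, `fc0` with reason 15); treating them as the reason code is an assumption, not a documented fact.

```c
#include <stdio.h>

/* State numbers as explained above: 0 init, 2 auth, 3 assoc, 5 run. */
static const char *wifi_state_name(int state)
{
    switch (state) {
    case 0: return "init";
    case 2: return "auth";
    case 3: return "assoc";
    case 5: return "run";
    default: return "unknown";
    }
}

/* Only the low byte matters here: 0xc0 means the router sent a deauth,
 * so 7c0, 2c0 and fc0 all match. */
static int is_router_deauth(unsigned code)
{
    return (code & 0xffu) == 0xc0u;
}

/* Assumption: the bits above the low byte carry the reason code. */
static unsigned upper_bits(unsigned code)
{
    return code >> 8;
}

/* Parse one log line such as "state: 5 -> 2 (7c0)".
 * Returns 1 on success, 0 if the line is not a state transition. */
static int parse_state_line(const char *line, int *from, int *to, unsigned *code)
{
    return sscanf(line, "state: %d -> %d (%x)", from, to, code) == 3;
}
```

Feeding each serial line through `parse_state_line` and printing `wifi_state_name(from)`, `wifi_state_name(to)` and `is_router_deauth(code)` makes the repeating run -> auth transitions in the logs above easier to spot.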

Some tips:

  1. Can you check the time between state: 3->5 and state: 5->2?
  2. Can you test this with an open router, so I can check whether this is caused by the 4-way handshake?
  3. Can you share simple test code with us, so we can test directly and rule out differences in our code?

@FayeY Please help follow up on this issue.

Thanks

damonsong commented 6 years ago

@wujiangang We hit the same bug. Steps to reproduce:

  1. 8266 OS: NonOS;
  2. Connect the 8266 to a router with an Internet connection;
  3. Disconnect the Internet connection by unplugging the Ethernet cable, while keeping the 8266 connected to the router;
  4. After a few seconds, the 8266 reports two errors: (1) group key update timeout (the same as the 4-way handshake timeout in the 802.11 standard?); (2) STATION_WRONG_PASSWORD;
  5. Reset the 8266 by powering it off completely and powering it on again; the 8266 still can't connect to the router and shows the same error message;
  6. However, other devices such as a smartphone or an IoT device (smart plug) can connect to the same router, as confirmed by the connected-device list in the router's web administration.

-----Log start--------
System started ...
mode : sta(68:c6:3a:b3:ab:d6)
add if0
cmd 81, len 5, devType 0a devSoftVer v03.05 devPCBVer v01.00
cmd 86, len 9, sw 01 bright 64 CCT 00 C=00 W=64 Y=00
STATION_IDLE ,Wifi auth mode: 0 -> 3
STATION_IDLE STATION_IDLE STATION_IDLE
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt
STATION_IDLE STATION_IDLE clear STATION_IDLE STATION_IDLE
state: 5 -> 2 (fc0)
rm 0
Wifi disconnected from ssid qmd, reason group_key_update_timeout (15)
STATION_WRONG_PASSWORD pwd err 0
scandone
state: 2 -> 2 (b0)
--------------------Log end-------------------

  1. How can we add a timer between state changes, or measure the time from 3->5 (assoc to run) to 5->2 (run to auth)? It seems the "state" log lines are printed by low-level drivers.
  2. In your 2nd tip you mentioned an "open router". Does that mean an open-source router (DDWRT or OpenWRT)?
  3. Is it possible to increase the timeout interval of the auth stage? The original reason code is 15 (4-way handshake stage)? Is this the same code as "Group key update timeout"?
  4. Along with the group_key_update_timeout error, there is a "STATION_WRONG_PASSWORD" message. Do you think the wrong password causes the auth failure? Or is there no relationship between the "wrong password" error and the "group key update timeout"?
  5. Is there any API to manipulate the WPA modules at a low level? We would like to try reassociation or disassociation from the station, or active deauthentication from the 8266 to the router. Is that possible?
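
On question 3, a small lookup table may help when reading the log above. This is a host-side sketch with a hypothetical function name; the numbering follows the IEEE 802.11 deauthentication/disassociation reason codes, the strings are paraphrased, and only a subset is listed. In that numbering 15 is the 4-way handshake timeout and 16 is the group key handshake timeout, so the label printed next to 15 in the log may come from an application-side string table that is off by one (an assumption, since that table is not shown).

```c
#include <stddef.h>

/* Subset of IEEE 802.11 reason codes; returns NULL for codes not listed. */
static const char *ieee80211_reason_name(unsigned reason)
{
    switch (reason) {
    case 1:  return "unspecified";
    case 2:  return "previous authentication no longer valid";
    case 3:  return "station is leaving (deauthenticated)";
    case 4:  return "disassociated due to inactivity";
    case 6:  return "class 2 frame from unauthenticated station";
    case 7:  return "class 3 frame from unassociated station";
    case 8:  return "station is leaving (disassociated)";
    case 14: return "MIC failure";
    case 15: return "4-way handshake timeout";
    case 16: return "group key handshake timeout";
    default: return NULL;
    }
}
```

For example, `ieee80211_reason_name(2)` describes the "reason 2" disconnect that appears later in this thread.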

daviddpd commented 5 years ago

I've been having this same issue ... will try rolling back to 2.1 ... would LOVE to see more of the lower level code opened up so the community can help debug this.

[DEBUG][espChatFabric/esp-cf-wifi.c:espWiFiInit:156] WiFi Init
[DEBUG][espChatFabric/esp-cf-wifi.c:espWiFiInit:183] WiFi Init - Mode Set
[DEBUG][espChatFabric/esp-cf-wifi.c:espCfWiFi_callBack:145] UNKNOWN EVENT:  08
[DEBUG][espChatFabric/esp-cf-wifi.c:espCfWiFi_callBack:101] mode: 0 -> 3
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt 
state: 5 -> 2 (2c0)
rm 0
[DEBUG][espChatFabric/esp-cf-wifi.c:espCfWiFi_callBack:94] disconnect from ssid demo, reason 2
reconnect

daviddpd commented 5 years ago

damnit. I think I've seen this before.

My guess is that struct station_config gets laid out slightly differently in the param flash between SDK versions. The resulting password string is then not correctly NUL-terminated. So moving between SDK versions may leave some cruft in the flash (this is really bad for OTA upgrades), which must be passing the wrong password to the access point.
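
If stale bytes in the password field are the concern, one defensive pattern is to zero-fill the whole fixed-size field before copying. The sketch below runs on a host and uses a hypothetical stand-in struct (`sta_fields`) and helper (`set_field`); it assumes the 32-byte ssid and 64-byte password fields of the SDK's struct station_config. It illustrates the guess above and is not a confirmed fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two fixed-size fields of station_config. */
struct sta_fields {
    unsigned char ssid[32];
    unsigned char password[64];
};

/* Copy src into a fixed-size field and zero-fill the remainder, so no
 * stale bytes follow the string. Returns 0 on success, or -1 (leaving
 * dst untouched) if src does not fit. */
static int set_field(unsigned char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len > dst_size)
        return -1;
    memset(dst, 0, dst_size);
    memcpy(dst, src, len);
    return 0;
}
```

A call such as `set_field(cfg.password, sizeof cfg.password, "12345678")` leaves bytes 8 to 63 zeroed regardless of what the buffer held before.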

I re-created the problem intermittently with my ESP12-E/F and a Sonoff Basic when reverting to SDK 2.0.0. I got around it once by changing the access point's password to something longer.

So ... I thought clearing out of all flash would help ...

dd if=/dev/zero of=1MBblank.bin bs=4k count=256
dd if=/dev/zero of=4MBblank.bin bs=4k count=1024

Then depending on your flash size - zero the entire flash ...

/usr/local/bin/esptool.py --chip esp8266 --no-stub --port /dev/$SERIAL_PORT write_flash --flash_size $FLASH_SIZE 0x00000 ${FLASH_SIZE}blank.bin  --no-compress 

And it did work ... once. In both 2.0.0 and 2.2.1, any time I do -

 /usr/local/bin/esptool.py --port /dev/cu.usbserial-DJ00A0O5 --no-stub write_flash --flash_size 4MB --no-compress 0x3FB000 ./16kblank.bin

/usr/local/bin/esptool.py --port /dev/cu.usbserial-DJ00A0O5 --no-stub write_flash --flash_size 4MB --no-compress 0x3FC000 /Volumes/esp8266/esp-open-sdk/sdk/bin/esp_init_data_default.bin

WiFi works. However, once the 8266 reboots, it then gets "disconnect from ssid demo, reason 2" ... which is bad password. So I can only assume that some sort of corruption is happening in memory or sys param ... probably related to RF_CAL ... which is black box magic right now.

FayeY commented 5 years ago

So sorry for the inconvenience. Would you like to use our latest ESP8266 RTOS SDK for further development? It has been refactored to ESP-IDF style, which lets an application run on both ESP32 and ESP8266. Users can easily port an ESP32 application to ESP8266, or from ESP8266 to ESP32. It will be the SDK we focus on in the future.

daviddpd commented 5 years ago

> So sorry for the inconvenience. Would you like to use our latest ESP8266 RTOS SDK for further development?

Not really. When I checked last, it was not compatible with the nonOS API, so I'd have to port/redevelop 2-3 years work of code.

(Update: I believe the issue I'm having is I was using the 3rd party rboot (https://github.com/raburton/rboot). Reverting back to sdk/bin/boot_* seems to have fixed the issue. )

At some point, the benefits of having portable code between 8266 and esp32 would be worth it, but I would need 12-18 months to work that in at this point. But if ultimately required - I would want to do it AFTER the 8266 and 32 are merged. I'd rather not deal with a possible double port/conversion/update.

Not sure what the rest of the community uses - not sure what the install base uses - but I would think it would be nearly impossible to do an OTA from a non-OS firmware to the RTOS firmware. That would imply that if the non-OS SDK is abandoned, there are possibly tens of millions of devices that are no longer supportable.

I would rather NOT have an OS. Though maybe not in this case, "OS" implies too much overhead to me. I would much rather have a fully open, minimalist SDK, so I could even strip the code down further. Having only the RTOS option for the esp32 is less confusing.

FayeY commented 5 years ago

@daviddpd

We understand the issue that you have and that it may represent a significant number of ESP8266 users. To help with this, we are working on the OTA from a non-OS firmware to the RTOS firmware now.

But in any case, users will need to make modifications to the application, because the APIs are not compatible, because of underlying changes in the architecture.

We will maintain the old ESP8266 nonOS SDK for a certain (but long) period, but we think that it’s not the best possible solution, and we encourage users to move their new projects to our new RTOS SDK.

The new RTOS SDK will be compatible with all ESP chips, including the new ones that follow. From the marketing folks’ point of view, this will make upgrading from ESP8266 to any other ESP chip easier. From the engineering point of view, we can concentrate our manpower on the RTOS SDK and ESP-IDF. (ESP8266 will support ESP-IDF.)

Thanks.

ASL07 commented 5 years ago

Hi,

I am seeing this exact same issue in both 2.0.1 and latest SDK

Steps to reproduce:

Then, even after power cycling the device, this behavior still happens. Erasing the flash and reprogramming the device is the only way to make it connect again.

Is there any workaround for this issue? How can I migrate to the RTOS SDK, if it is fixed there?

daviddpd commented 5 years ago

@ASL07 - I believe this works fine in RTOS.

However, I've stepped away from all my ESP development at the moment.

I believe the problem is related to the WiFi tuning values and the bootloader (I was using the bootloader from raburton) ... there are some little 128-byte files, burned into the config area, that seemingly make all the difference.

I can't be certain, since a bunch of the older toolchain also broke on macOS, and I needed to get a day job, so I can't even test right now - I hope to get back into these in the new year.

ASL07 commented 5 years ago

Hi,

I am seeing this exact same issue in both 2.0.1 and latest SDK

Steps to reproduce:

  • Start the device (ESP8266) and try to connect to an SSID that does not exist: WiFi.begin(ssid, pass, channel, bssid, true);
  • Since the SSID is not present, the logs show that no SSID was found and that the device is trying to reconnect: no vodafoneBA2311 found, reconnect after 1s reconnect
  • Now turn on the router so the SSID is broadcast. The device logs show: del if0 usl mode : null mode : sta(18:fe:34:99:50:52) add if0 scandone scandone state: 0 -> 2 (b0) state: 2 -> 3 (0) state: 3 -> 5 (10) add 0 aid 1 cnt state: 5 -> 2 (fc0) rm 0 reconnect

Then, even after power cycling the device, this behavior still happens. Erasing the flash and reprogramming the device is the only way to make it connect again.

Is there any workaround for this issue? How can I migrate to the RTOS SDK, if it is fixed there?

This was actually my fault. My Flash memory was getting corrupted/overwritten.

Once I fixed this, the WiFi connection worked fine.