espressif / ESP8266_NONOS_SDK

ESP8266 nonOS SDK
Other
926 stars 536 forks source link

[TW#20761] Regression in WiFi connection stability between 2.2.0 and 2.2.1 #119

Open rojer opened 6 years ago

rojer commented 6 years ago

Sine updating to 2.2.1 i'm seeing spurious beacon timeouts and disconnections which do not happen on 2.2.0. Right now i have two devices on my desk next to each other, with firmware that is the same in every way, except one is built with non-os SDK 2.2.0 (version string: 2.2.0(f28eaf2)) and the other with 2.2.1 (version string: 2.2.1(05c8012)). On the first one, WiFi connection is stable:

$ grep -E 'SDK|WiFi|bcn' /tmp/console-2.2.0.log
[Apr 26 14:07:07.636] esp_mgos_init2       SDK 2.2.0(f28eaf2); flash: 4M
[Apr 26 14:07:08.646] mgos_wifi_get_next_s WiFi STA: Using config 0
[Apr 26 14:07:08.650] mgos_wifi_setup      WiFi mode: STA
[Apr 26 14:07:08.653] mgos_wifi_set_mode   WiFi mode: STA
[Apr 26 14:07:08.665] mgos_wifi_setup_sta  WiFi STA: Connecting to Cesanta
[Apr 26 14:07:08.710] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:07:12.461] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:07:13.617] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.35, GW 192.168.1.1, DNS 192.168.1.1

The other drops the connection often:

$ grep -E 'SDK|WiFi|bcn' /tmp/console-2.2.1.log
[Apr 26 14:07:08.182] esp_mgos_init2       SDK 2.2.1(05c8012); flash: 16M
[Apr 26 14:07:09.159] bcn 0
[Apr 26 14:07:09.159] mgos_wifi_get_next_s WiFi STA: Using config 0
[Apr 26 14:07:09.163] mgos_wifi_setup      WiFi mode: STA
[Apr 26 14:07:09.166] mgos_wifi_set_mode   WiFi mode: STA
[Apr 26 14:07:09.178] mgos_wifi_setup_sta  WiFi STA: Connecting to Cesanta
[Apr 26 14:07:09.222] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:07:12.971] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:07:14.129] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:08:29.116] bcn_timout,ap_probe_send_start
[Apr 26 14:08:31.628] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:08:31.637] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:08:32.040] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:08:33.073] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:08:52.158] bcn_timout,ap_probe_send_start
[Apr 26 14:08:54.670] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:08:54.680] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:08:55.086] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:08:56.147] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:09:12.125] bcn_timout,ap_probe_send_start
[Apr 26 14:09:14.636] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:09:14.647] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:09:15.545] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:09:17.171] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:10:01.173] bcn_timout,ap_probe_send_start
[Apr 26 14:10:03.685] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:10:03.694] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:10:04.091] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:10:05.418] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:12:00.777] bcn_timout,ap_probe_send_start
[Apr 26 14:12:31.833] bcn_timout,ap_probe_send_start
[Apr 26 14:12:34.345] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:12:34.355] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:12:34.752] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:12:38.083] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:13:15.120] bcn_timout,ap_probe_send_start
[Apr 26 14:13:17.631] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:13:17.641] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:13:18.039] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:13:19.081] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:14:14.100] bcn_timout,ap_probe_send_start
[Apr 26 14:14:16.611] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:14:16.621] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:14:17.022] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:14:18.152] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1
[Apr 26 14:15:23.118] bcn_timout,ap_probe_send_start
[Apr 26 14:15:25.629] mgos_net_on_change_c WiFi STA: disconnected
[Apr 26 14:15:25.640] mgos_net_on_change_c WiFi STA: connecting
[Apr 26 14:15:26.038] mgos_net_on_change_c WiFi STA: connected
[Apr 26 14:15:27.373] mgos_net_on_change_c WiFi STA: ready, IP 192.168.1.47, GW 192.168.1.1, DNS 192.168.1.1

i'm attaching both logs here in case you'd like to see more info.

logs.zip

kriegste commented 6 years ago

That can be the case if a function of yours is taking too long before returning. I am not sure of how many ms are allowed, but maybe you are slightly over the limit and 2.2.1 is just more sensitive.

wujiangang commented 6 years ago

@rojer Can you test the commit 3f065fe770dd05e6ee91f3816c9b81cb5a89a1d1 in release/v2.x.x branch? I want to check whether this commit cause this issue.

Thanks

rojer commented 6 years ago

@kriegste i don't think my code is responsible. there's nothing going on in the background except for a timer that just prints a single line on the console.

@wujiangang no, that doesn't seem to be the case. the version string is 2.2.1(f42c330) and i don't see disconnects.

rojer commented 6 years ago

i also tested https://github.com/espressif/ESP8266_NONOS_SDK/commit/86acbd8c5ead615b2729013ca4efb8e3fa0828b1 (2.2.1(cfd48f3)) and it's fine. so it seems that it's either https://github.com/espressif/ESP8266_NONOS_SDK/commit/e0fbec7def2a5a3a178ee8d35947933f91c4f7d8 or https://github.com/espressif/ESP8266_NONOS_SDK/commit/6f57c0511df394cf2a42bb690f7de6e5240e0b67 though description says it's an AP-only change, go figure.

rojer commented 6 years ago

@wujiangang yes, it's definitely https://github.com/espressif/ESP8266_NONOS_SDK/commit/6f57c0511df394cf2a42bb690f7de6e5240e0b67

nemonote01 commented 6 years ago

Hi @rojer,

I have tested and have not reproduced this problem. Could you provide me with your source code? The attachment is my test program and UART log. sta_test.txt

#include "osapi.h"
#include "ip_addr.h"
#include "espconn.h"
#include "mem.h"
#include "user_interface.h"

LOCAL os_timer_t test_timer;
/******************************************************************************
 * FunctionName : user_check_ip
 * Description  : check whether get ip addr or not
 * Parameters   : none
 * Returns      : none
*******************************************************************************/
void ICACHE_FLASH_ATTR
user_check_ip(void)
{
    struct ip_info ipconfig;

   //disarm timer first
    os_timer_disarm(&test_timer);

   //get ip info of ESP8266 station
    wifi_get_ip_info(STATION_IF, &ipconfig);

    if (wifi_station_get_connect_status() == STATION_GOT_IP && ipconfig.ip.addr != 0) 
    {
      os_printf("got ip:%d,%d,%d,%d\r\n",(ipconfig.ip.addr>>0)&0xFF,(ipconfig.ip.addr>>8)&0xFF,(ipconfig.ip.addr>>16)&0xFF,(ipconfig.ip.addr>>24)&0xFF);
    } 
    else 
    {

        if ((wifi_station_get_connect_status() == STATION_WRONG_PASSWORD ||
                wifi_station_get_connect_status() == STATION_NO_AP_FOUND ||
                wifi_station_get_connect_status() == STATION_CONNECT_FAIL)) 
        {
         os_printf("connect fail !!! \r\n");
        } 
    }
    //re-arm timer to check ip
    os_timer_setfn(&test_timer, (os_timer_func_t *)user_check_ip, NULL);
    os_timer_arm(&test_timer, 1000, 0);
}

/******************************************************************************
 * FunctionName : user_rf_cal_sector_set
 * Description  : SDK just reversed 4 sectors, used for rf init data and paramters.
 *                We add this function to force users to set rf cal sector, since
 *                we don't know which sector is free in user's application.
 *                sector map for last several sectors : ABCCC
 *                A : rf cal
 *                B : rf init data
 *                C : sdk parameters
 * Parameters   : none
 * Returns      : rf cal sector
*******************************************************************************/
uint32 ICACHE_FLASH_ATTR
user_rf_cal_sector_set(void)
{
    enum flash_size_map size_map = system_get_flash_size_map();
    uint32 rf_cal_sec = 0;
    switch (size_map) {
        case FLASH_SIZE_4M_MAP_256_256:
            rf_cal_sec = 128 - 5;
            break;

        case FLASH_SIZE_8M_MAP_512_512:
            rf_cal_sec = 256 - 5;
            break;

        case FLASH_SIZE_16M_MAP_512_512:
        case FLASH_SIZE_16M_MAP_1024_1024:
            rf_cal_sec = 512 - 5;
            break;

        case FLASH_SIZE_32M_MAP_512_512:
        case FLASH_SIZE_32M_MAP_1024_1024:
            rf_cal_sec = 1024 - 5;
            break;

        case FLASH_SIZE_64M_MAP_1024_1024:
            rf_cal_sec = 2048 - 5;
            break;
        case FLASH_SIZE_128M_MAP_1024_1024:
            rf_cal_sec = 4096 - 5;
            break;
        default:
            rf_cal_sec = 0;
            break;
    }

    return rf_cal_sec;
}

void ICACHE_FLASH_ATTR
user_rf_pre_init(void)
{
}

bool ICACHE_FLASH_ATTR check_memleak_debug_enable (void)
{
return MEMLEAK_DEBUG_ENABLE;
}

/******************************************************************************
 * FunctionName : user_init
 * Description  : entry of user application, init user function here
 * Parameters   : none
 * Returns      : none
*******************************************************************************/
void user_init(void)
{
   // Wifi configuration 
   struct station_config stationConf; 
   os_printf("SDK version is:%s\n", system_get_sdk_version());

   os_memset(stationConf.ssid, 0, 32);
   os_memset(stationConf.password, 0, 64);

   //Set settings 
   os_memcpy(&stationConf.ssid, "TEST001", os_strlen("TEST001")); 
   os_memcpy(&stationConf.password, "1234567890", os_strlen("1234567890")); 
   wifi_station_set_config(&stationConf); 
   //Set station mode 
   wifi_set_opmode(STATION_MODE); 

   //set a timer to check whether got ip from router succeed or not.
   os_timer_disarm(&test_timer);
   os_timer_setfn(&test_timer, (os_timer_func_t *)user_check_ip, NULL);
   os_timer_arm(&test_timer, 1000, 0);
}
FayeY commented 6 years ago

@rojer, Is there any update? Could you share some test code, so we can reproduce the issue and debug it? Thanks in advance.

rojer commented 6 years ago

@FayeY i'm sorry, haven't had time to do additional testing. i work with mongoose os, so i need to re-tool to test against plain SDK. will do soon.

FayeY commented 6 years ago

@rojer , thanks in advance, waiting for your new update.

rojer commented 6 years ago

@FayeY we had to completely turn off modem sleep (see commit above) because of station disconnection reports by our customers (and also observed by us). there's something severely wrong with it and nearly impossible to debug because code is closed. all reports just bounce with "does not reproduce". i do not have time to bang my head against the wall of closed code to create clean repros for you. clearly you guys need better testing. or just open the code already and let people drill down and find and fix bugs.