ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.67k stars 2.98k forks source link

UBLOX_EVK_ODIN_W2: network-wifi test crashes #10815

Closed kimlep01 closed 4 years ago

kimlep01 commented 5 years ago

Description

UBLOX_EVK_ODIN_W2 target crashes in our nightly CI when running network-wifi test. We have not seen this earlier as the test was running inside of a rf shielded box. Now the tests are running in a very noisy environment which most probably causes these crashes when the Odin tries to connect to the network. I can reproduce the crash from line 1826 when I re-run the test inside the test farm. Does not reproduce on my desk (not enough noise I guess).

More common crash:

[1560302096.85][CONN][RXD] >>> Running case #9: 'WIFI-CONNECT-PARAMS-VALID-UNSECURE'...
[1560302096.98][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-PARAMS-VALID-UNSECURE}}, queued...
[1560302134.33][CONN][RXD] 
[1560302134.33][CONN][RXD] 
[1560302134.33][CONN][RXD] ++ MbedOS Error Info ++
[1560302134.43][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560302134.43][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1560302134.53][CONN][RXD] Location: 0x0
[1560302134.63][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1826
[1560302134.63][CONN][RXD] Error Value: 0x0
[1560302134.83][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20010E08 Entry: 0x80E44C1 StackSize: 0x1000 StackMem: 0x20010E60 SP: 0x20011D08 
[1560302134.94][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=3&ver=8032001&tgt=UBLOX_EVK_...
[1560302134.94][CONN][RXD] 
[1560302134.94][CONN][RXD] -- MbedOS Error Info --
[1560302134.94][CONN][RXD] 
[1560302135.03][CONN][RXD] = System will be rebooted due to a fatal error =

This crash has happened only in CI runs:

[1560301680.70][CONN][RXD] >>> Running case #9: 'WIFI-CONNECT-PARAMS-VALID-UNSECURE'...
[1560301680.70][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-PARAMS-VALID-UNSECURE}}, queued...
[1560301708.56][CONN][RXD] 
[1560301708.56][CONN][RXD] 
[1560301708.56][CONN][RXD] ++ MbedOS Error Info ++
[1560301708.66][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560301708.66][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1560301708.66][CONN][RXD] Location: 0x801A1A3
[1560301708.86][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1596
[1560301708.86][CONN][RXD] Error Value: 0x0
[1560301708.96][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20010890 Entry: 0x8010799 StackSize: 0x1000 StackMem: 0x200108E8 SP: 0x20011490 
[1560301709.16][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x8003010F&osver=999999&core=0x410FC241&comp=1&ver=6110004&tgt=UBLOX_EVK_...
[1560301709.16][CONN][RXD] 
[1560301709.16][CONN][RXD] -- MbedOS Error Info --
[1560301709.16][CONN][RXD] 
[1560301709.16][CONN][RXD] = System will be rebooted due to a fatal error =

Edit - found one more crash:

[1560935800.92][CONN][RXD] >>> Running case #8: 'WIFI-GET-RSSI'...
[1560935801.02][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-GET-RSSI}}, queued...
[1560935823.68][CONN][RXD] 
[1560935823.68][CONN][RXD] 
[1560935823.76][CONN][RXD] ++ MbedOS Error Info ++
[1560935823.76][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1560935823.86][CONN][RXD] Error Message: Assertion failed: msg->type == ODIN_WIFI_MSG_USER_DISCONNECT
[1560935823.86][CONN][RXD] Location: 0x801A1A3
[1560935824.06][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+463
[1560935824.06][CONN][RXD] Error Value: 0x0
[1560935824.16][CONN][RXD] Current Thread: main  Id: 0x2000BA60 Entry: 0x801A791 StackSize: 0x1000 StackMem: 0x200071D8 SP: 0x20008088 
[1560935824.26][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=1&ver=6110004&tgt=UBLOX_EVK_...
[1560935824.26][CONN][RXD] 
[1560935824.36][CONN][RXD] -- MbedOS Error Info --
[1560935824.36][CONN][RXD] 
[1560935824.36][CONN][RXD] = System will be rebooted due to a fatal error =

mbed-os version (master):

commit fc645bc3ed357743d4acf6714564b7f6291c6016 (HEAD -> master, origin/master, origin/HEAD)
Merge: ed69fc1 6df6dec
Author: Anna Bridge <Anna.Bridge@arm.com>
Date:   Tue Jun 11 15:07:43 2019 +0100

    Merge pull request #10805 from kjbracey-arm/armc5_blinky

    Fix ARMC5 compilation

The crashes don't seem to be compiler or test case specific. I have seen these crashes with several different wifi tests cases from this same test suite.

How to reproduce: Use a VERY noisy test environment for Wifi Edit tools/test_configs/WiFiInterface.json and give proper values for: wifi-secure-ssid wifi-unsecure-ssid wifi-password wifi-ch-secure wifi-ch-unsecure

mbed test --compile -t GCC_ARM -m UBLOX_EVK_ODIN_W2 -n tests-network-wifi --app-config=tools/test_configs/WiFiInterface.json -DMBED_HEAP_STATS_ENABLED=1 -DMBED_STACK_STATS_ENABLED=1 -DMBED_TRAP_ERRORS_ENABLED=1 -DMBED_ALL_STATS_ENABLED -DSKIP_TIME_DRIFT_TESTS=1

mbedgt -m UBLOX_EVK_ODIN_W2 -n tests-network-wifi -V (crashes 2/3 times in our environment)

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug
kimlep01 commented 5 years ago

cc @ARMmbed/team-ublox

ciarmcom commented 5 years ago

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1279

aqib-ublox commented 5 years ago

@kimlep01 wifi logs.txt

I am not able to reproduce above scnerio even removed two antennas of AP out of three no crash seen

aqib-ublox commented 5 years ago

@kimlep01 have you tried with ARM and IAR also?

I coundn't see it

kimlep01 commented 5 years ago

Yes we run those tests with GCC, ARM and IAR binaries. I have seen crashes with all compilers. I was hoping that those line numbers would help tracking down why those assert failures happen. We have similar issues open with mtb ublox odin w2 also. My guess is that something goes wrong when the board tries to connect to a heavily utilised access point.

kimlep01 commented 5 years ago

Hi @ARMmbed/team-ublox

This issue is still visible in our nightly tests. I have been able to reproduce this issue manually on my desk also:

I ran the test a few times and got these three different crashes:

[1566820165.73][CONN][RXD] >>> Running 1 test cases...
[1566820165.79][CONN][RXD]
[1566820165.79][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566820165.85][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566820165.90][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566820228.01][CONN][RXD]
[1566820228.01][CONN][RXD]
[1566820228.03][CONN][RXD] ++ MbedOS Error Info ++
[1566820228.08][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566820228.15][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1566820228.16][CONN][RXD] Location: 0x0
[1566820228.31][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1566820228.32][CONN][RXD] Error Value: 0x0
[1566820228.47][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008818 Entry: 0x80E3869 StackSize: 0x1000 StackMem: 0x20011990 SP: 0x200128F0
[1566820228.55][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566820228.58][CONN][RXD] -- MbedOS Error Info --
[1566820228.58][CONN][RXD]
[1566820228.63][CONN][RXD] = System will be rebooted due to a fatal error =

----

[1566821982.25][CONN][RXD] >>> Running 1 test cases...
[1566821982.30][CONN][RXD]
[1566821982.30][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566821982.36][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566821982.42][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566821994.93][CONN][RXD]
[1566821994.93][CONN][RXD]
[1566821994.96][CONN][RXD] ++ MbedOS Error Info ++
[1566821995.00][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566821995.08][CONN][RXD] Error Message: Assertion failed: msg->type == ODIN_WIFI_MSG_USER_DISCONNECT
[1566821995.10][CONN][RXD] Location: 0x0
[1566821995.25][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+463
[1566821995.27][CONN][RXD] Error Value: 0x0
[1566821995.37][CONN][RXD] Current Thread: main  Id: 0x2000BA5C Entry: 0x80E91C9 StackSize: 0x1000 StackMem: 0x200052B0 SP: 0x200061B0
[1566821995.46][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566821995.49][CONN][RXD] -- MbedOS Error Info --
[1566821995.49][CONN][RXD]
[1566821995.55][CONN][RXD] = System will be rebooted due to a fatal error =

----

[1566822170.93][CONN][RXD] >>> Running 1 test cases...
[1566822170.99][CONN][RXD]
[1566822170.99][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566822171.05][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1566822171.10][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1566822187.21][CONN][RXD]
[1566822187.21][CONN][RXD]
[1566822187.24][CONN][RXD] ++ MbedOS Error Info ++
[1566822187.28][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1566822187.34][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1566822187.36][CONN][RXD] Location: 0x0
[1566822187.51][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+782
[1566822187.53][CONN][RXD] Error Value: 0x0
[1566822187.66][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008818 Entry: 0x80E3869 StackSize: 0x1000 StackMem: 0x20011990 SP: 0x20012908
[1566822187.75][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1566822187.78][CONN][RXD] -- MbedOS Error Info --
[1566822187.78][CONN][RXD]
[1566822187.83][CONN][RXD] = System will be rebooted due to a fatal error =

mbed-os commit:

commit 02d1873c06255220faa0c1cabfc3b14696181c89 (HEAD -> master, origin/master, origin/HEAD)
Merge: b005bf2 1ab75d3
Author: Martin Kojtal <martin.kojtal@arm.com>
Date:   Mon Aug 26 11:08:47 2019 +0200
    Merge pull request #11266 from u-blox/feature_adc_temp_conf
    ADC internal temperature support and EMAC header fix

Contents of my mbed_app.json used (just password and unsecure AP name changed from our CI mbed_app.json):

{
  "config": {
    "wifi-secure-ssid": {
      "help": "WiFi SSID for WPA2 secured network",
      "value": "\"4tousnet\""
    },
    "wifi-unsecure-ssid": {
      "help": "WiFi SSID for unsecure netwrok",
      "value": "\"qwerty123\""
    },
    "wifi-password": {
      "help": "WiFi Password",
      "value": "\"secret\""
    },
    "wifi-secure-protocol": {
      "help": "WiFi security protocol, valid values are WEP, WPA, WPA2, WPA/WPA2",
      "value": "\"WPA2\""
    },
    "wifi-ch-secure": {
      "help": "Channel number of secure SSID",
      "value": 10
    },
    "wifi-ch-unsecure": {
      "help": "Channel number of unsecure SSID",
      "value": 2
    },
    "ap-mac-secure": {
      "help": "BSSID of secure AP in form of AA:BB:CC:DD:EE:FF",
      "value": "\"80:2a:a8:8a:e9:13\""
    },
    "ap-mac-unsecure": {
      "help": "BSSID of unsecure AP in form of \"AA:BB:CC:DD:EE:FF\"",
      "value": "\"58:8b:f3:99:c2:08\""
    },
    "max-scan-size": {
      "help": "How many networks may appear in Wifi scan result",
      "value": 30
    },
    "echo-server-addr": {
      "help": "IP address of echo server",
      "value": "\"echo.mbedcloudtesting.com\""
    },
    "echo-server-port": {
      "help": "Port of echo server",
      "value": "7"
    },
    "echo-server-discard-port": {
      "help": "Discard port of echo server",
      "value": "9"
    },
    "echo-server-port-tls": {
      "help": "Port of echo server for TLS",
      "value": "2007"
    },
    "echo-server-discard-port-tls": {
      "help": "Discard port of echo server for TLS",
      "value": "2009"
    }
  },
  "target_overrides": {
    "*": {
      "lwip.ipv4-enabled": true,
      "lwip.ipv6-enabled": false,
      "nsapi.default-wifi-ssid": "MBED_CONF_APP_WIFI_SECURE_SSID",
      "nsapi.default-wifi-password": "MBED_CONF_APP_WIFI_PASSWORD",
      "nsapi.default-wifi-security": "WPA_WPA2",
      "target.components_add": ["FPGA_CI_TEST_SHIELD"]
    },
    "K66F": {
      "events.shared-stacksize": 2048,
      "esp8266.provide-default": true,
      "target.network-default-interface-type": "WIFI",
      "esp8266.tx": "PTC4",
      "esp8266.rx": "PTC3",
      "esp8266.rts": "PTC1",
      "esp8266.cts": "PTC2",
      "esp8266.rst": "D8"
    },
    "NUCLEO_F401RE": {
      "idw0xx1.provide-default": true,
      "idw0xx1.tx": "D8",
      "idw0xx1.rx": "D2",
      "drivers.uart-serial-txbuf-size": 512,
      "drivers.uart-serial-rxbuf-size": 512
    },
    "DISCO_L475VG_IOT01A": {
      "ism43362.provide-default": true,
      "target.network-default-interface-type": "WIFI"
    },
    "SDT64B": {
      "target.components_add": ["SD"],
      "sd.SPI_MOSI": "PTB16",
      "sd.SPI_MISO": "PTB17",
      "sd.SPI_CLK": "PTB11",
      "sd.SPI_CS": "PTB10"
    },
    "UBLOX_EVK_ODIN_W2": {
      "target.network-default-interface-type": "WIFI"
    },
    "GR_LYCHEE": {
      "target.network-default-interface-type": "WIFI"
    }
  }
}
aqib-ublox commented 5 years ago

@kimlep01 Ok we will generate fix in next PR

michalpasztamobica commented 5 years ago

The error codes identical as in https://github.com/ARMmbed/mbed-os/issues/9621 . The code line has changed. Perhaps that will help?

aqib-ublox commented 5 years ago

@kimlep01 https://github.com/ARMmbed/mbed-os/pull/11370 generate please verify

kimlep01 commented 5 years ago

Still managed to reproduce the crash. Same mbed_app.json used as above, edited main.cpp to run CONNECT-DISCONNECT-REPEAT 20 times in a row, compiled with IAR 8.32.1 and ran on my desk with my phone as a Wi-Fi hotspot. Crash happened on the third time.

Logs:

[1567072488.16][HTST][INF] CALLBACKs updated
[1567072488.16][HTST][INF] host test detected: default_auto
[1567072488.19][CONN][RXD] >>> Running 20 test cases...
[1567072488.25][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.30][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.36][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.41][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.46][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.52][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.58][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.63][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.68][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.73][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.79][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.85][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.90][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072488.95][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.00][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.06][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.12][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.17][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.22][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.28][CONN][RXD]
[1567072489.28][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072489.33][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072489.40][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072509.63][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567072509.68][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567072509.68][CONN][RXD]
[1567072509.75][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072509.81][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072525.13][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567072525.19][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567072525.19][CONN][RXD]
[1567072525.25][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567072525.30][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567072594.28][CONN][RXD]
[1567072594.28][CONN][RXD]
[1567072594.32][CONN][RXD] ++ MbedOS Error Info ++
[1567072594.36][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1567072594.42][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1567072594.44][CONN][RXD] Location: 0x0
[1567072594.58][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1567072594.59][CONN][RXD] Error Value: 0x0
[1567072594.74][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008A5C Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x20012300 SP: 0x20013260
[1567072594.82][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1567072594.85][CONN][RXD] -- MbedOS Error Info --
[1567072594.85][CONN][RXD]
[1567072594.90][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/mbed-os$ git log -1
commit 7f36fbc86ee54ddcd3e37e7a55681b89409d7533 (HEAD -> ublox-pr-29-08)
Author: aqib-ublox <aqib.nasim@u-blox.com>
Date:   Thu Aug 29 11:33:29 2019 +0500

    Driver Updates + ARMC6 driver support + wifi fixes
kimlep01@e109827-lin:~/dev/git_repos/mbed-os$
aqib-ublox commented 5 years ago

@kimlep01 yes let me reproduce it on my desk and see if can give some fix. just on side note if you just comment this assertion and one more assertion after cbWLAN_disconnect() in same file to see any more crash occurring.It will helps us, as i can see disconnection is made but wrong status is returning. Because its happening in congested environment where multiple re-connection attempts are made or station not properly disconnected.

kimlep01 commented 5 years ago

Retested the PR with the following changes:

--- i/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
+++ w/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
@@ -758,7 +758,7 @@ OdinWiFiInterface::OdinWifiState OdinWiFiInterface::entry_connect_fail_wait_disc
     error_code = cbWLAN_disconnect(handle);
     cbMAIN_driverUnlock();

-    MBED_ASSERT(error_code == cbSTATUS_OK);
+    //MBED_ASSERT(error_code == cbSTATUS_OK);

     return S_STA_CONNECTION_FAIL_WAIT_DISCONNECT;
 }
@@ -779,7 +779,7 @@ OdinWiFiInterface::OdinWifiState OdinWiFiInterface::entry_wait_disconnect()
     error_code = cbWLAN_disconnect(handle);
     cbMAIN_driverUnlock();

-    MBED_ASSERT(error_code == cbSTATUS_OK);
+    //MBED_ASSERT(error_code == cbSTATUS_OK);

     return S_STA_WAIT_DISCONNECT;
 }

I could not see any more crashes. Tests would start failing pretty fast sometimes but no more crashing:

[1567154258.51][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154258.51][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154258.60][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154258.60][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1567154277.97][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1567154277.97][CONN][RXD]
[1567154277.97][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1567154277.97][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154278.09][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567154355.53][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.53][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.53][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.71][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567154355.71][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;0;4}}, queued...
[1567154355.77][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 0 passed, 4 failed with reason 'Test Cases Failed'
[1567154355.77][CONN][RXD]
[1567154355.77][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567154355.77][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.90][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154355.90][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154355.90][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.15][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3002
[1567154356.15][CONN][RXD] :38::FAIL: Expected 0 Was -3002
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567154356.19][CONN][RXD] :40::FAIL: Expected 0 Was -3004
...all rest of the tests till 20 fail fast to these errors
aqib-ublox commented 5 years ago

@kimlep01 tests starts failing because of weak signal strength returning no connection status. Thanks for feedback ,let me find a fix and generate PR soon

aqib-ublox commented 5 years ago

Hi @kimlep01

can u add below hack

`void OdinWiFiInterface::handle_wlan_status_connected(wlan_status_connected_s *wlan_connect) { nsapi_error_t error_code;

MBED_ASSERT(wlan_connect != 0);

switch(_state_sta) {
    case S_STA_CONNECTED:
    case S_STA_WAIT_CONNECT:
        _timer.stop();

        if(_debug) {
            printf("MBED_IPSTACK_ \r\n");
        }
        if(_state_sta != S_STA_CONNECTED){
            error_code = _interface->bringup(_dhcp,
                        _ip_address[0] ? _ip_address : 0,
                        _netmask[0] ? _netmask : 0,
                        _gateway[0] ? _gateway : 0,
                        DEFAULT_STACK);
       }

`

in OdinWifiInterface.cpp and test once again it may solve problem, please give feedback i would generate PR after that

kimlep01 commented 5 years ago

Retested with latest mbed-os master with this change:

+++ w/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp
@@ -1305,13 +1305,13 @@ void OdinWiFiInterface::handle_wlan_status_connected(wlan_status_connected_s *wl
             if(_debug) {
                 printf("MBED_IPSTACK_ \r\n");
             }
-
-            error_code = _interface->bringup(_dhcp,
+            if(_state_sta != S_STA_CONNECTED){
+                error_code = _interface->bringup(_dhcp,
                         _ip_address[0] ? _ip_address : 0,
                         _netmask[0] ? _netmask : 0,
                         _gateway[0] ? _gateway : 0,
                         DEFAULT_STACK);
-
+            }

I first made sure I could reproduce the issue still with clean master. I couldn't do that on my desk for some reason so I ran the test in the lab where our nightly tests run also. Test crashed on clean master and with the above change also. The line number was the same in both cases.

Error message:

[1567762678.59][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567762678.73][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;0;20}}, queued...
[1567762678.87][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 0 passed, 20 failed with reason 'Test Cases Failed'
[1567762678.87][CONN][RXD]
[1567762678.87][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1567762678.89][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762704.00][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762704.18][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762729.13][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762729.13][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762754.25][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762754.25][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762779.41][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762779.41][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1567762804.26][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1567762804.26][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1567762834.14][CONN][RXD]
[1567762834.28][CONN][RXD]
[1567762834.28][CONN][RXD] ++ MbedOS Error Info ++
[1567762834.28][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1567762834.38][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1567762834.38][CONN][RXD] Location: 0x0
[1567762834.49][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1834
[1567762834.54][CONN][RXD] Error Value: 0x0
[1567762834.67][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x200112A8 Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x200112F8 SP: 0x20012190
[1567762834.83][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1567762834.83][CONN][RXD] -- MbedOS Error Info --
[1567762834.83][CONN][RXD]
[1567762834.83][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
aqib-ublox commented 5 years ago

Hi @kimlep01 i am sharing private branch for testing please verify above issue: https://github.com/u-blox/mbed-os/tree/ublox_odin_driver_os_5_v3.7.1_rc2

kimlep01 commented 5 years ago

Can be reproduced with the above mentioned branch:

[1568712135.32][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1568712135.32][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712161.18][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712161.18][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712185.93][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712185.93][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712210.70][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712210.70][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1568712235.32][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1568712235.32][CONN][RXD] :40::FAIL: Expected 0 Was -3004
[1568712276.31][CONN][RXD]
[1568712276.31][CONN][RXD]
[1568712276.53][CONN][RXD] ++ MbedOS Error Info ++
[1568712276.53][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1568712276.53][CONN][RXD] Error Message: Assertion failed: msg != NULL
[1568712276.53][CONN][RXD] Location: 0x0
[1568712276.70][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+1834
[1568712276.70][CONN][RXD] Error Value: 0x0
[1568712276.90][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200112A8 Entry: 0x80DFAD5 StackSize: 0x1000 StackMem: 0x200112F8 SP: 0x20012190
[1568712277.13][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1568712277.13][CONN][RXD] -- MbedOS Error Info --
[1568712277.13][CONN][RXD]
[1568712277.13][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit 9fbe9716b8ac26df73d990db70c252da446f8355 (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc2, origin/ublox_odin_driver_os_5_v3.7.1_rc2)
Author: aqin <aqib.nasim@u-blox.com>
Date:   Thu Sep 12 09:53:38 2019 +0500

    Update driver to avoid locking issue
aqib-ublox commented 5 years ago

@kimlep01 please see logs below, i am not getting any crash tested it more than 10 times, can u please see if J21 is short and secondly try with replacing other odin_w2 board. log_try_1.txt

kimlep01 commented 5 years ago

We have 4 boards running in the lab. I tested all of them: Board 1: asserted at line 1834 Board 2: asserted at line 761 Board 3: asserted at line 1834 Board 4: asserted at line 1834

The boards have jumper J21 shorted. I made a request to the infrastructure team to replace them just to be sure which will take a few days as I don't have access there.

kimlep01 commented 5 years ago

Changing the J21 jumpers did not help and I still could reproduce this both in CI and locally on my desk using the private branch.

I ran the same test (20x connect-disconnect-repeat) with pyOCD debugger attached but all I could see is that on line 758 of OdinWiFiInterface.cpp cbWLAN_disconnect(handle) returned cbSTATUS_ERROR. No visibility beyond that.

[1569409542.72][CONN][RXD] 
[1569409542.74][CONN][RXD] ++ MbedOS Error Info ++
[1569409542.79][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1569409542.85][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1569409542.87][CONN][RXD] Location: 0x80272E7
[1569409543.03][CONN][RXD] File: ./mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+762
[1569409543.05][CONN][RXD] Error Value: 0x0
[1569409543.18][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200064F4 Entry: 0x802B1E9 StackSize: 0x1000 StackMem: 0x20012D40 SP: 0x20013C68 
[1569409543.27][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1569409543.29][CONN][RXD] -- MbedOS Error Info --
[1569409543.30][CONN][RXD] 
[1569409543.34][CONN][RXD] = System will be rebooted due to a fatal error =

This might feel tricky to reproduce sometimes but here is what I suggest:

The other way would be to provide a debug version of the driver for us with enough traces so we could help more.

(I will be out of office until 7.10. Team ARMmbed/mbed-os-wan can assist until then if required)

hmxmghl commented 5 years ago

Hi @kimlep01

Thank you for writing the detailed procedure to reproduce the issue. We were in fact able to reproduce it and have hopefully fixed it.

Hereby sharing our private branch with you: ublox_odin_driver_os_5_v3.7.1_rc3

Can you kindly test and let us know about your test results?

kimlep01 commented 5 years ago

Hello again,

I retested with the new branch and unfortunately I could still reproduce the issue. I tested that I could reproduce the issue with all compilers and using two different boards. I don't get any test case errors prior to assert - it passes just fine until mbed assert happens.

IAR8:

[1570702801.50][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570702801.50][CONN][RXD]
[1570702801.56][CONN][RXD] >>> Running case #8: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570702801.62][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570702805.34][CONN][RXD]
[1570702805.34][CONN][RXD]
[1570702805.36][CONN][RXD] ++ MbedOS Error Info ++
[1570702805.41][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570702805.47][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570702805.48][CONN][RXD] Location: 0x0
[1570702805.64][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+782
[1570702805.65][CONN][RXD] Error Value: 0x0
[1570702805.79][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20008A60 Entry: 0x80DFADD StackSize: 0x1000 StackMem: 0x20012310 SP: 0x20013280
[1570702805.87][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570702805.91][CONN][RXD] -- MbedOS Error Info --
[1570702805.91][CONN][RXD]
[1570702805.96][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix

ARMC6:

[1570704403.09][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570704403.09][CONN][RXD]
[1570704403.15][CONN][RXD] >>> Running case #20: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570704403.20][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570704466.18][CONN][RXD]
[1570704466.18][CONN][RXD]
[1570704466.20][CONN][RXD] ++ MbedOS Error Info ++
[1570704466.26][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570704466.32][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570704466.34][CONN][RXD] Location: 0x80363F1
[1570704466.48][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1570704466.50][CONN][RXD] Error Value: 0x0
[1570704466.64][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x200042B8 Entry: 0x800FE99 StackSize: 0x1000 StackMem: 0x20011AF0 SP: 0x20012A28
[1570704466.73][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570704466.75][CONN][RXD] -- MbedOS Error Info --
[1570704466.75][CONN][RXD]
[1570704466.80][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix

GCC_ARM8:

[1570708858.78][CONN][RXD] >>> Running case #2: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570708858.83][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570708882.66][CONN][INF] found KV pair in stream: {{__testcase_finish;WIFI-CONNECT-DISCONNECT-REPEAT;1;0}}, queued...
[1570708882.72][CONN][RXD] >>> 'WIFI-CONNECT-DISCONNECT-REPEAT': 1 passed, 0 failed
[1570708882.72][CONN][RXD]
[1570708882.77][CONN][RXD] >>> Running case #3: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1570708882.83][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1570708950.71][CONN][RXD]
[1570708950.71][CONN][RXD]
[1570708950.74][CONN][RXD] ++ MbedOS Error Info ++
[1570708950.79][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1570708950.85][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1570708950.87][CONN][RXD] Location: 0x8020365
[1570708951.02][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1570708951.04][CONN][RXD] Error Value: 0x0
[1570708951.18][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x20006408 Entry: 0x80239E9 StackSize: 0x1000 StackMem: 0x20012A40 SP: 0x20013968
[1570708951.26][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1570708951.29][CONN][RXD] -- MbedOS Error Info --
[1570708951.29][CONN][RXD]
[1570708951.34][CONN][RXD] = System will be rebooted due to a fatal error =
^Cmbedgt: ctrl+c keyboard interrupt!
kimlep01@e109827-lin:~/dev/git_repos/U-BLOX/mbed-os$ git log -1
commit cecde0ec81394d5bf8a058b23175f521a9642c8f (HEAD -> ublox_odin_driver_os_5_v3.7.1_rc3, origin/ublox_odin_driver_os_5_v3.7.1_rc3)
Author: Hamza Rizwan <hamza.rizwan66@gmail.com>
Date:   Thu Oct 3 15:15:55 2019 +0500

    Handle Syncronization between EMAC & WLAN Driver + Message Pool Overflow fix
hmxmghl commented 5 years ago

Hi @kimlep01

That is strange. After the fix, we ran the test overnight and there was no assert.

Can you please tell us how many times you tries running it?

Can you again try by building clean, hard resetting the board, shorting the J21 jumper, hard resetting the access point (if possible)?

kimlep01 commented 5 years ago

I hit the issue on the first or second run. Now it seems to assert reliably on line 761. I did a clean clone and boards are powered off before test runs. I tested locally with two boards (W260 & W262), changed different jumpers to J21, restarted my phone (S10e) before tests which acted as an unsecure AP.

I also tested inside a RF-BOX with the same outcome (couldn't restart AP there). The mbed_app.json used can be found from my fourth comment (just change unsecure AP name).

Here is the modified test case content, I seem to have increased the timeout also:

diff --git i/TESTS/network/wifi/main.cpp w/TESTS/network/wifi/main.cpp
index 73859eb..73d5a9f 100644
--- i/TESTS/network/wifi/main.cpp
+++ w/TESTS/network/wifi/main.cpp
@@ -54,36 +54,55 @@ utest::v1::status_t test_setup(const size_t number_of_cases)
 #ifdef MBED_GREENTEA_TEST_WIFI_TIMEOUT_S
     GREENTEA_SETUP(MBED_GREENTEA_TEST_WIFI_TIMEOUT_S, "default_auto");
 #else
-    GREENTEA_SETUP(360, "default_auto");
+    GREENTEA_SETUP(4200, "default_auto");
 #endif
     return verbose_test_setup_handler(number_of_cases);
 }

 // Test cases
 Case cases[] = {
-    Case("WIFI-CONSTRUCTOR", wifi_constructor),
-    Case("WIFI-CONNECT-NOCREDENTIALS", wifi_connect_nocredentials),
-    Case("WIFI-SET-CREDENTIAL", wifi_set_credential),
-    Case("WIFI-SET-CHANNEL", wifi_set_channel),
-    Case("WIFI-CONNECT-PARAMS-NULL", wifi_connect_params_null),
-    Case("WIFI-SCAN-NULL", wifi_scan_null),
+    //Case("WIFI-CONSTRUCTOR", wifi_constructor),
+    //Case("WIFI-CONNECT-NOCREDENTIALS", wifi_connect_nocredentials),
+    //Case("WIFI-SET-CREDENTIAL", wifi_set_credential),
+    //Case("WIFI-SET-CHANNEL", wifi_set_channel),
+    //Case("WIFI-CONNECT-PARAMS-NULL", wifi_connect_params_null),
+    //Case("WIFI-SCAN-NULL", wifi_scan_null),
 #if defined(MBED_CONF_APP_WIFI_SECURE_SSID) || defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)
-    Case("WIFI-SCAN", wifi_scan),
+    //Case("WIFI-SCAN", wifi_scan),
 #endif
 #if defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)
-    Case("WIFI-GET-RSSI", wifi_get_rssi),
-    Case("WIFI-CONNECT-PARAMS-VALID-UNSECURE", wifi_connect_params_valid_unsecure),
-    Case("WIFI-CONNECT", wifi_connect),
+    //Case("WIFI-GET-RSSI", wifi_get_rssi),
+    //Case("WIFI-CONNECT-PARAMS-VALID-UNSECURE", wifi_connect_params_valid_unsecure),
+    //Case("WIFI-CONNECT", wifi_connect),
     //Most boards are not passing this test, but they should if they support non-blocking API.
     //Case("WIFI-CONNECT-NONBLOCKING", wifi_connect_nonblock),
     Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
+    Case("WIFI-CONNECT-DISCONNECT-REPEAT", wifi_connect_disconnect_repeat),
 #endif
 #if defined(MBED_CONF_APP_WIFI_SECURE_SSID)
-    Case("WIFI-CONNECT-PARAMS-VALID-SECURE", wifi_connect_params_valid_secure),
-    Case("WIFI-CONNECT-PARAMS-CHANNEL", wifi_connect_params_channel),
-    Case("WIFI-CONNECT-PARAMS-CHANNEL-FAIL", wifi_connect_params_channel_fail),
-    Case("WIFI-CONNECT-SECURE", wifi_connect_secure),
-    Case("WIFI-CONNECT-SECURE-FAIL", wifi_connect_secure_fail),
+    //Case("WIFI-CONNECT-PARAMS-VALID-SECURE", wifi_connect_params_valid_secure),
+    //Case("WIFI-CONNECT-PARAMS-CHANNEL", wifi_connect_params_channel),
+    //Case("WIFI-CONNECT-PARAMS-CHANNEL-FAIL", wifi_connect_params_channel_fail),
+    //Case("WIFI-CONNECT-SECURE", wifi_connect_secure),
+    //Case("WIFI-CONNECT-SECURE-FAIL", wifi_connect_secure_fail),
 #endif
 };
hmxmghl commented 5 years ago

Thank you for these insights.

If you hit this on first or second run, that's worrisome. We are barely able to reproduce it anymore. And even if we get the assert, it is after 700-800 connect/disconnect sequences.

We are trying to create the scenario here which causes this assert to occur (like breaking the link in between handshake). So far we have observed that if the Bluetooth of the phone is turned on as well while phone is acting as an AP, it becomes unstable and there are a lot more failed connect attempts (not asserts).

Can you help us identify a low-level scenario in which this can be reproduced? Like sharing the filtered Wireshark logs of packets to and from ODIN on the channel it tries to connect/disconnect.

If you have any other observations that can help us reproduce this will be helpful.

aqib-ublox commented 5 years ago

@kimlep01 most likely issue doesn't seem to belong driver side. Actually after frequent connection it got failed in LWIP layer particularly in bringup function acquiring lock. It inquire us either last bring down was not successfully terminated which is causing this scenario taking place.

if (!LWIP::get_ip_addr(true, &netif)) { if (osSemaphoreAcquire(has_any_addr, DHCP_TIMEOUT * 1000) != osOK) { return NSAPI_ERROR_DHCP_FAILURE; } }

hmxmghl commented 5 years ago

So we tried disabling the DHCP and use the static IP. There have been no asserts even after 2000 connect/disconnect sequence.

You can test with following test application:

#include "mbed.h"
#include "greentea-client/test_env.h"
#include "unity.h"
#include "utest.h"
#include "wifi_tests.h"

using namespace utest::v1;

#if defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)

// Replace the following params as per your AP's configuration
#define WIFI_IP_ADDR                         "172.20.10.6"
#define WIFI_DEFAULT_GATEWAY         "172.20.10.1"
#define WIFI_NETMASK                         "255.255.255.240"

void wifi_connect_disconnect_repeat(void)
{
    WiFiInterface *wifi = get_interface();
    nsapi_error_t error;

    error = wifi->set_credentials(MBED_CONF_APP_WIFI_UNSECURE_SSID, NULL);
    TEST_ASSERT_EQUAL(NSAPI_ERROR_OK, error);

    error = wifi->set_network(WIFI_IP_ADDR, WIFI_NETMASK, WIFI_DEFAULT_GATEWAY);
    TEST_ASSERT_EQUAL(NSAPI_ERROR_OK, error);

    error = wifi->set_dhcp(false);
    TEST_ASSERT_EQUAL(NSAPI_ERROR_OK, error);

    for (int i = 0; i < 2000; i++) {
        error = wifi->connect();
    TEST_ASSERT_EQUAL(NSAPI_ERROR_OK, error);
        error = wifi->disconnect();
    TEST_ASSERT_EQUAL(NSAPI_ERROR_OK, error);
    }
}

#endif // defined(MBED_CONF_APP_WIFI_UNSECURE_SSID)

It seems like issue is in LWIP stack since it is handling DHCP negotiation.

kimlep01 commented 4 years ago

Hi @hamza-ubx @aqib-ublox and sorry about the long silence from my end.

I tested the above provided test case over night for 16 hours without any asserts happening.

Then I tested again the normal test case with 2000 set in the for loop. I could reproduce the issue pretty soon. Assert happens after 1080s:

[1573485020.50][CONN][RXD]
[1573485020.50][CONN][INF] found KV pair in stream: {{__testcase_name;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1573485020.55][CONN][RXD] >>> Running case #1: 'WIFI-CONNECT-DISCONNECT-REPEAT'...
[1573485020.62][CONN][INF] found KV pair in stream: {{__testcase_start;WIFI-CONNECT-DISCONNECT-REPEAT}}, queued...
[1573486099.90][CONN][RXD]
[1573486099.90][CONN][RXD]
[1573486099.94][CONN][RXD] ++ MbedOS Error Info ++
[1573486099.98][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1573486100.04][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1573486100.06][CONN][RXD] Location: 0x80364A1
[1573486100.21][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+761
[1573486100.23][CONN][RXD] Error Value: 0x0
[1573486100.36][CONN][RXD] Current Thread: application_unnamed_thread  Id: 0x200042B8 Entry: 0x800FF49 StackSize: 0x1000 StackMem: 0x20011C40 SP: 0x20012B78
[1573486100.46][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&tgt=UBLOX_EVK_ODIN_W2
[1573486100.48][CONN][RXD] -- MbedOS Error Info --
[1573486100.48][CONN][RXD]
[1573486100.53][CONN][RXD] = System will be rebooted due to a fatal error =

See attachment for Wireshark log. I ended the capture as soon as I saw the assert happening. ODIN-W2-assert-pcap.zip

hmxmghl commented 4 years ago

Hi @kimlep01

We experience exactly the same thing. If we disable the DHCP (as it is is disabled in application I pasted above) there are no further crashes even if it keeps running connect/disconnect sequence for hours.

But if we enable the DHCP, we observe the assert at around 700-800 connect/disconnect sequence.

As there are no crashes when DHCP is disabled, we therefore think that the problem is in LWIP stack, as it is handling the DHCP negotiations.

kimlep01 commented 4 years ago

Hi @hamza-ubx

I would agree pointing the finger towards LWIP but then again we haven't seen issues like this with other Wifi modules. I will need to prove this by running the same test with other modules which I plan to do and report back.

In any case I would like see your fixes added to mbed-os as it will clearly improve the current situation where we are seeing asserts during every night. We can work forward from there.

AmmadRehmatCUS commented 4 years ago

@kimlep01 have you tried other modules with DHCP mode on?

kimlep01 commented 4 years ago

@AmmadRehmat Yes we run these tests always with DHCP on. Now running overnight test with a K64F+ESP8266 shield and I will have results tomorrow.

kimlep01 commented 4 years ago

Testing took a little bit longer than expected. I first tested K64F+ESP8266 one loop of 2000 connect-disconnect but that was painfully slow and took over 24 hours to finish and did not work properly - mostly just failing to connect in the end.

Then I switched to use ST board DISCO_L475VG_IOT01A with wifi-ism43362 driver. This driver worked more like the ODIN driver (even a bit faster). At first try it ran 14 loops of 2000 connect-disconnect successfully but then failed quickly the remaining 6 loops like this:

mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.65][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1574101037.68][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.71][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1574101037.74][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.79][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1574101037.82][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.85][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1574101037.88][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.92][CONN][RXD] :38::FAIL: Expected 0 Was -3004
mbedgt: :40::FAIL: Expected 0 Was -3004
[1574101037.95][CONN][RXD] :40::FAIL: Expected 0 Was -3004
mbedgt: :38::FAIL: Expected 0 Was -3004
[1574101037.98][CONN][RXD] :38::FAIL: Expected 0 Was -3004

I ran the test again and this time the board finished 20 loops of 2000 connect-disconnect without any failures. Run time was 19216 seconds.

I don't know if this symptom of starting to fail all rest of the connection attempts quickly is related to this issue.

All tests were done against mbed-os master with only increasing the amount of connect-disconnects in the test case from 10 to 2000 and then running just the connect-disconnect test case 20 times. I didn't touch any DHCP settings.

hmxmghl commented 4 years ago

Hi @kimlep01

Thank you for sharing the detailed analysis.

This is the function in LWIP Interface where ODIN W2 fails to get the IP and hence asserts.

If other targets you tested are making use of the same LWIP Interface, may be you can debug from this point to see if the failures in your recent tests on other targets are caused here.

Also, our PR for the optimizations made during debugging this issue is on the way.

kimlep01 commented 4 years ago

Thank you @hamza-ubx I will continue debugging the issue further. Nice to hear the PR is on the way.

hmxmghl commented 4 years ago

@kimlep01 PR Created with the updates made to fix this issue

michalpasztamobica commented 4 years ago

Hi @hamza-ubx . Thanks a lot for your efforts so far and the helpful PR, which improved the situation. @kimlep01 was however still able to spot the assertions (although less often) and I was just checking your suggestion about the LWIP's failure.

I added some more logging, modified the wifi-connect-disconnect-repeat test as Kimmo suggested (2000 loops) and used the unreliable non-RF-BOX RAAS environment to reproduce the issue.

From time to time connect fails with AUTH_FAILURE. I can see this log line in the logs like this:

[1577774895.36][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20200 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774895.46][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774895.56][CONN][RXD] :50::FAIL: Expected 0 Was -3004
[1577774916.30][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20100 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774916.30][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774916.40][CONN][RXD] :50::FAIL: Expected 0 Was -3004
[1577774940.59][CONN][RXD] TIMEOUT: 20000 ms, ACTUAL 20011 ms
mbedgt: :48::FAIL: Expected 0 Was -3011
[1577774940.59][CONN][RXD] :48::FAIL: Expected 0 Was -3011
mbedgt: :50::FAIL: Expected 0 Was -3004
[1577774940.59][CONN][RXD] :50::FAIL: Expected 0 Was -3004

-3011 is NSAPI_ERROR_AUTH_FAILURE, so I would say it is worth checking what the actual problem is: timeout or authentication failure. In a polluted network both are credible. But I cannot look into the driver to see what exactly happened and they both originate from the driver: the error reason is set here and the timeout callback obviously comes from driver. I would suspect that the translation to mbed's error code is incorrect at some stage. On each timeout entry_connect_fail_wait_disconnect is entered, but the cbWLAN_disconnect() always returns cbSTATUS_OK and assertion does not trigger. After that wifi->disconnect() fails with NSAPI_ERROR_NO_CONNECTION, which is reasonable. This may happen a few times in a row with no assertion, and then the test may continue successfully for many loops more.

Usually however, the next time we try to connect, after previous timeout/authentication error the connect goes into the assertion in entry_connect_fail_wait_disconnect() (entered from the timeout handler, like previously). I do not see why this time cbWLAN_disconnect() returns cbSTATUS_ERROR and I need to seek your help @hamza-ubx to figure this out (or I would need access to driver's code).

To sum up: 1) I can't see why cbWLAN_disconnect() usually (not always) returns cbSTATUS_ERROR leading to assertion in entry_connect_fail_wait_disconnect(). 2) I also don't see why we can see timeout callback and NSAPI_ERROR_AUTH_FAILURE being reported at the same time. They both somehow originate from driver's internals, so your help would be much appreciated.

I don't think it is LWIP not being able to set ip address that is the root cause. It is OK for the module to sometimes fail to connect to AP in a busy network and this should not lead to assertion.

hmxmghl commented 4 years ago

Hi @michalpasztamobica

Thank you for the detailed analysis. We are continuing the investigation at our end and will share any information that we discover.

hmxmghl commented 4 years ago

Hi @michalpasztamobica

So we have investigated the issue in more detail. For the first part where you mentioned that may be translation from driver to Mbed's error code is not correct. When driver returns one of the following, only then it is translated to NSAPI_ERROR_AUTH_FAILURE:

For the connection timeout, its not handled at driver level instead it is being handled at interface level. See here. Hence, we can correct this error translation at interface level.

For the second part where you mentioned that assert is still reproducible but relatively rare now and it mostly occurs when the previous iteration of connect/disconnect fails with -3011. We tried to reproduce this scenario but were unable to replicate. Whenever -3011 occurred, next iteration succeeded usually.

However, assert did occur sometimes around consecutive 800 to 1200 connect/disconnect cycles. But the previous iteration after which the assert occurred was always successful connection/disconnection.

For assert part, we still haven't found a concrete scenario under which it is absolutely reproducible. It just occurs randomly sometimes in 100's and sometimes in 1000's connect/disconnect cycle and that too with only some of the APs (mobile phone access points mostly). Other APs work just fine and no assert have been observed upto 10,000 connect/disconnect cycles. Have you found the concrete scenario under which it can be reproduced?

michalpasztamobica commented 4 years ago

Hi @hamza-ubx . Not sure I got it right, but indeed, sounds like only cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE should be translated to NSAPI_ERROR_AUTH_FAILURE in here.

What about the two other errors: cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE and cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE? Should they be translated to NSAPI_ERROR_TIMEOUT? I don't see any documentation on what they mean... ASSOC could stand for "assosciation" as "access point could not have been associated in time", but the MIC doesn't ring any bell to me... If you have an idea on how to best translate them, would you be able to prepare a PR with a fix? If not, can you please help me understand what these to errors mean and I'll find a NSAPI_ERROR_X code for them.

Regarding the original crash issue... I haven't tried any more since my last comment. Back then I could reproduce the issue within about an hour using the crowded RAAS lab. It also took me about a 1000 loops (sometimes less sometimes more). I could not crash the board when using my phone as an AP (Samsung Galaxy S7). The issue somewhat resembles this one: https://github.com/ARMmbed/mbed-os/issues/9621. @aqib-ublox managed to reproduce that issue by rapidly switching his AP on and off. Perhaps that helps to simulate poor connectivity/crowded network?

hmxmghl commented 4 years ago

I guess we can correct the error translation after we come up with appropriate meaning for each error code and find relevant NSAPI_ERROR_X

For cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE, I think MIC stands for Message Integrity Check. So that is something that we can not categorize in NSAPI_ERROR_AUTH_FAILURE or NSAPI_ERROR_CONNECTION_TIMEOUT, in my opinion. What do you think the correct NSAPI_ERROR_X translation for this would be?

For cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE, in connection procedure, as association is done after the authentication step is complete, I don't think cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE should be categorized under NSAPI_ERROR_AUTH_FAILURE. But we can argue to put it under NSAPI_ERROR_CONNECTION_TIMEOUT as during association procedure, IP address is obtained so if it fails to obtain an IP within specified time, association failure will occur. What do you think?

For cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE, it can be put under NSAPI_ERROR_AUTH_FAILURE. Even though, timeout can occur during authentication procedure as well, I think it is still better to categorize it under NSAPI_ERROR_AUTH_FAILURE.

Regarding original crash issue, we are trying different things to reproduce the crash including the one mentioned in #9621. Hopefully this translation correction will also help analyze it better.

@michalpasztamobica

michalpasztamobica commented 4 years ago

Hi @hamza-ubx . I fully agree with these: cbWLAN_STATUS_DISCONNECTED_ASSOC_FAILURE -> NSAPI_ERROR_TIMEOUT cbWLAN_STATUS_DISCONNECTED_AUTH_FAILURE -> NSAPI_ERROR_AUTH_FAILURE

Regarding cbWLAN_STATUS_DISCONNECTED_MIC_FAILURE (Message Integrity Check). Normally if the code doesn't fit anything known, we would translate it to NSAPI_ERROR_DEVICE_ERROR and just leave it to the application to decide how to handle it. But if this issue is not that critical and if it only occurs on (dis)connection, as the handler's name suggests, we could translate it to NSAPI_ERROR_NO_CONNECTION (same as cbWLAN_STATUS_DISCONNECTED_UNKNOWN).

While we are at this point, I am also thinking if cbWLAN_STATUS_DISCONNECTED_NO_BSSID_FOUND should not be translated to NSAPI_ERROR_NO_SSID?

It's also wise to ask @kjbracey-arm for an opinion to make sure we don't get something horribly wrong. Kevin, would you please let us know if these error code translations mentioned above make sense to you?

@hamza-ubx , fingers crossed for the crash fix. If there is anything we can help with - let us know. For the reproduction scenario, I guess we are on the same page - the busier network, the weaker signal and the more extreme its handling and location, the more likely the crash becomes. I guess once per hour is not that bad...

hmxmghl commented 4 years ago

Great! We will look forward to hear from Kevin and then make the corrections to the translation as discussed.

About the cbWLAN_STATUS_DISCONNECTED_NO_BSSID_FOUND, I think user will see it as misleading if it is translated to NSAPI_ERROR_NO_SSID because it kind of depicts that SSID wasn't found in scan. Right now it is mapped to NSAPI_ERROR_NO_CONNECTION, which in my opinion is more appropriate.

And yeah I guess we are on the same page for the crash issue reproduction. Hoping to find concrete scenario to reproduce and then find a fix.

michalpasztamobica commented 4 years ago

Hi @hamza-ubx , any success regarding the crash issue? @kimlep01 , I looked briefly at the nightly results, but couldn't see the crash in there. Have you applied some workaround? Is the issue still visible?

kimlep01 commented 4 years ago

Hi, the original issue is still visible in our CI. Happened twice last night in GCC_ARM-tests-netsocket-dns and in GCC_ARM tests-network-wifi.

[1584754808.89][CONN][RXD] 
[1584754808.99][CONN][RXD] ++ MbedOS Error Info ++
[1584754808.99][CONN][RXD] Error Status: 0x80FF0144 Code: 324 Module: 255
[1584754809.07][CONN][RXD] Error Message: Assertion failed: error_code == cbSTATUS_OK
[1584754809.07][CONN][RXD] Location: 0x802268D
[1584754809.23][CONN][RXD] File: ./targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/TARGET_MODULE_UBLOX_ODIN_W2/sdk/ublox-odin-w2-drivers/OdinWiFiInterface.cpp+733
[1584754809.27][CONN][RXD] Error Value: 0x0
[1584754809.39][CONN][RXD] Current Thread: application_unnamed_thread Id: 0x200065D4 Entry: 0x8026509 StackSize: 0x1000 StackMem: 0x200122F8 SP: 0x20013240 
[1584754809.45][CONN][RXD] For more info, visit: https://mbed.com/s/error?error=0x80FF0144&osver=999999&core=0x410FC241&comp=2&ver=90200&tgt=UBLOX_EVK_OD...
[1584754809.45][CONN][RXD] 
[1584754809.45][CONN][RXD] -- MbedOS Error Info --
[1584754809.45][CONN][RXD] 
[1584754809.62][CONN][RXD] = System will be rebooted due to a fatal error =
hmxmghl commented 4 years ago

Hi @michalpasztamobica

We are seeing this issue occur relatively less frequently now since the last optimizations have been applied. However, issue is still there and we would like to properly fix it. This is a tricky one and requires dedicated effort. We are prioritizing it and will keep you posted as soon as we have a fix.