espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.62k stars 7.28k forks source link

espNOW causes esp_transport_connect to fail (IDFGH-8292) #9775

Closed nicolas-Philips closed 6 months ago

nicolas-Philips commented 2 years ago

Answers checklist.

IDF version.

v4.4.2

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

No response

Development Kit.

ESP-WROVER-KIT

Power Supply used.

USB

What is the expected behavior?

Hello,

I have a project with:

I created a new minimum code project (https://github.com/nicolas-Philips/espNowIssue) with:

When calling esp_now_send, data is still received on the lwip socket by azureIot, but not all data, causing azureIot to timeout. The expected behaviour is that esp_now_send does not influence lwip sockets.

What is the actual behavior?

Once the function esp_now_send is called, the esp32 always times out when trying to connect to azureIot. The output with embedtls debugging info enabled when the connection attempt times out:

I (5031) connTest: Start connTest example
I (5071) mbedtls: ssl_tls.c:5904 => handshake

I (5071) mbedtls: ssl_cli.c:4483 client state: 0

I (5071) mbedtls: ssl_msg.c:2102 => flush output

I (5081) mbedtls: ssl_msg.c:2114 <= flush output

I (5081) mbedtls: ssl_cli.c:4483 client state: 1

I (5091) mbedtls: ssl_msg.c:2102 => flush output

I (5091) mbedtls: ssl_msg.c:2114 <= flush output

I (5101) mbedtls: ssl_cli.c:999 => write client hello

I (5111) mbedtls: ssl_msg.c:2542 => write handshake message

I (5111) mbedtls: ssl_msg.c:2701 => write record

I (5121) mbedtls: ssl_msg.c:2102 => flush output

I (5121) mbedtls: ssl_msg.c:2122 message length: 295, out_left: 295

I (5131) mbedtls: ssl_msg.c:2127 ssl->f_send() returned 295 (-0xfffffed9)

I (5141) mbedtls: ssl_msg.c:2155 <= flush output

...... a lote more output .....

I (7091) mbedtls: ssl_tls.c:3642 <= write finished

I (7101) mbedtls: ssl_cli.c:4483 client state: 12

I (7111) mbedtls: ssl_msg.c:2102 => flush output

I (7111) mbedtls: ssl_msg.c:2114 <= flush output

I (7121) mbedtls: ssl_msg.c:5035 => parse change cipher spec

I (7121) mbedtls: ssl_msg.c:3941 => read record

I (7131) mbedtls: ssl_msg.c:1886 => fetch input

I (7131) mbedtls: ssl_msg.c:2043 in_left: 0, nb_want: 5

I (12141) mbedtls: ssl_msg.c:2068 in_left: 0, nb_want: 5

I (12141) mbedtls: ssl_tls.c:5915 <= handshake

W (12141) esp-tls: Failed to open new connection in specified timeout
E (12141) TRANSPORT_BASE: Failed to open a new connection
E (12151) connTest: Failed establishing TLS connection (esp_transport_connect failed)
I (12161) connTest: Finished connTest example

=> ssl_msg.c:2068 in_left: 0, nb_want: 5 in client state 12.

When we do not call esp_now_send, we can connect and disconnect thousands of times without a single error/timeout.

Steps to reproduce.

  1. git clone https://github.com/nicolas-Philips/espNowIssue.git
  2. In the sdkconfig: configure the following:
  1. build and run the code, you should get timeouts trying to connect to azure after the example_espnow_task is running (look for I (6605) espnow_example: Start sending broadcast data in the monitor.
  2. comment lines 131-135 in espnow_example_main.c
  3. Test that the example now works

Debug Logs.

Without any changes to the repository (and mbedtls info debug enabled):

I (5031) connTest: Start connTest example
I (5071) mbedtls: ssl_tls.c:5904 => handshake

I (5071) mbedtls: ssl_cli.c:4483 client state: 0

I (5071) mbedtls: ssl_msg.c:2102 => flush output

I (5081) mbedtls: ssl_msg.c:2114 <= flush output

I (5081) mbedtls: ssl_cli.c:4483 client state: 1

I (5091) mbedtls: ssl_msg.c:2102 => flush output

I (5091) mbedtls: ssl_msg.c:2114 <= flush output

I (5101) mbedtls: ssl_cli.c:999 => write client hello

I (5111) mbedtls: ssl_msg.c:2542 => write handshake message

I (5111) mbedtls: ssl_msg.c:2701 => write record

I (5121) mbedtls: ssl_msg.c:2102 => flush output

I (5121) mbedtls: ssl_msg.c:2122 message length: 295, out_left: 295

I (5131) mbedtls: ssl_msg.c:2127 ssl->f_send() returned 295 (-0xfffffed9)

I (5141) mbedtls: ssl_msg.c:2155 <= flush output

...... a lote more output .....

I (7091) mbedtls: ssl_tls.c:3642 <= write finished

I (7101) mbedtls: ssl_cli.c:4483 client state: 12

I (7111) mbedtls: ssl_msg.c:2102 => flush output

I (7111) mbedtls: ssl_msg.c:2114 <= flush output

I (7121) mbedtls: ssl_msg.c:5035 => parse change cipher spec

I (7121) mbedtls: ssl_msg.c:3941 => read record

I (7131) mbedtls: ssl_msg.c:1886 => fetch input

I (7131) mbedtls: ssl_msg.c:2043 in_left: 0, nb_want: 5

I (12141) mbedtls: ssl_msg.c:2068 in_left: 0, nb_want: 5

I (12141) mbedtls: ssl_tls.c:5915 <= handshake

W (12141) esp-tls: Failed to open new connection in specified timeout
E (12141) TRANSPORT_BASE: Failed to open a new connection
E (12151) connTest: Failed establishing TLS connection (esp_transport_connect failed)
I (12161) connTest: Finished connTest example

With lines 131-135 in espnow_example_main.c (esp_now_send) commented out:

I (19845) connTest: Connected to server
I (19855) connTest: Finished connTest example
I (20855) connTest: Start connTest example
I (22495) connTest: Connected to server
I (22505) connTest: Finished connTest example
I (23505) connTest: Start connTest example
I (25055) connTest: Connected to server
I (25065) connTest: Finished connTest example
I (26065) connTest: Start connTest example
I (27635) connTest: Connected to server
I (27645) connTest: Finished connTest example
I (28645) connTest: Start connTest example
I (30275) connTest: Connected to server
I (30275) connTest: Finished connTest example
I (31275) connTest: Start connTest example
I (32935) connTest: Connected to server
I (32945) connTest: Finished connTest example
I (33945) connTest: Start connTest example
I (35505) connTest: Connected to server
I (35505) connTest: Finished connTest example
I (36505) connTest: Start connTest example
I (38155) connTest: Connected to server
I (38165) connTest: Finished connTest example
I (39165) connTest: Start connTest example
I (40815) connTest: Connected to server
I (40825) connTest: Finished connTest example
I (41825) connTest: Start connTest example
I (43475) connTest: Connected to server
I (43485) connTest: Finished connTest example
I (44485) connTest: Start connTest example
I (46135) connTest: Connected to server
I (46145) connTest: Finished connTest example
I (47145) connTest: Start connTest example
I (48795) connTest: Connected to server
I (48805) connTest: Finished connTest example
I (49805) connTest: Start connTest example
I (51465) connTest: Connected to server
I (51475) connTest: Finished connTest example
I (52475) connTest: Start connTest example
I (54125) connTest: Connected to server
I (54125) connTest: Finished connTest example
I (55125) connTest: Start connTest example
I (56785) connTest: Connected to server
I (56795) connTest: Finished connTest example
I (57795) connTest: Start connTest example
I (59445) connTest: Connected to server
I (59455) connTest: Finished connTest example
I (60455) connTest: Start connTest example
I (62105) connTest: Connected to server
I (62115) connTest: Finished connTest example
I (63115) connTest: Start connTest example
....

### More Information.

The code that we use to connect to azureIot, that works without issues without calling esp_now_send and stops working when esp_now_send gets called:

static const char *TAG = "connTest";

static esp_transport_list_handle_t transportList;

static void conntest_task(void *pvparameters) { while(1) { ESP_LOGI(TAG, "Start connTest example");

    esp_transport_handle_t xTransport = esp_transport_list_get_transport(transportList, "_conntest");

    esp_transport_ssl_set_cert_data_der( xTransport, ( const char * ) DEF_ROOT_CA, sizeof(DEF_ROOT_CA));
    if(esp_transport_connect( xTransport, "jaga-dev-iothubdeviceprovisioning.azure-devices-provisioning.net", 8883, 5000 ) < 0 )
    {
        ESP_LOGE( TAG, "Failed establishing TLS connection (esp_transport_connect failed)" );
    }
    else
        ESP_LOGI( TAG, "Connected to server");

    esp_transport_close( xTransport );

    ESP_LOGI(TAG, "Finished connTest example");

    vTaskDelay(1000 / portTICK_PERIOD_MS);
}

}

void startConnTest() {

transportList = esp_transport_list_init();

/* create new transport to the transport list */
esp_transport_handle_t ssl = esp_transport_ssl_init();
if(ssl == NULL)
{
    esp_restart();
}
esp_transport_list_add(transportList, ssl, "_conntest");

xTaskCreate(&conntest_task, "test_task", 8192, NULL, 5, NULL);

}

nicolas-Philips commented 2 years ago

In short: using espNOW send causes LWIP socket to stop functioning correctly. It is not just connecting connecting as mentioned in the example code that starts failing, but everything using LWIP sockets.

zhangyanjiaoesp commented 2 years ago

@nicolas-Philips Have you enable the SPIRAM option in the menuconfig? Can you provide some sniffer captures ?

nicolas-Philips commented 2 years ago

@zhangyanjiaoesp Since anything not specifically reconfigured in the SDK is the default config (see step 2 of Steps to reproduce), the SPIRAM option was not set.

I added two wireshark captures, one without espNow, where the connetion just keeps being created and closed (rather short capture). And one where an ESP packet is sent after about 16s (in the wireshark capture time). From then on, the ESP does not send any packet anymore and azure sends multiple [FIN, ACK] requests, trying to close the connection, without a response of the ESP. From then on the ESP just does not send out any packet anymore.

wireshareCaptures.zip

zhangyanjiaoesp commented 2 years ago

@nicolas-Philips I have tested locally using your example, I can reproduce the problem on v4.4.2 branch, but when I change to master branch and v5.0 branch, the test pass.

I (674) wifi:Init data frame dynamic rx buffer num: 32 I (684) wifi:Init management frame dynamic rx buffer num: 32 I (684) wifi:Init management short buffer num: 32 I (694) wifi:Init dynamic tx buffer num: 32 I (694) wifi:Init static rx buffer size: 1600 I (704) wifi:Init static rx buffer num: 10 I (704) wifi:Init dynamic rx buffer num: 32 I (704) wifi_init: rx ba win: 6 I (714) wifi_init: tcpip mbox: 32 I (714) wifi_init: udp mbox: 6 I (714) wifi_init: tcp mbox: 6 I (724) wifi_init: tcp tx win: 5744 I (724) wifi_init: tcp rx win: 5744 I (734) wifi_init: tcp mss: 1440 I (734) wifi_init: WiFi IRAM OP enabled I (744) wifi_init: WiFi RX IRAM OP enabled I (744) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07 I (844) wifi:mode : sta (30:ae:a4:80:15:c8) I (844) wifi:enable tsf I (844) wifi station: wifi_init_sta finished. I (864) wifi:new:<6,0>, old:<1,0>, ap:<255,255>, sta:<6,0>, prof:1 I (864) wifi:state: init -> auth (b0) I (874) wifi:state: auth -> assoc (0) I (884) wifi:state: assoc -> run (10) I (904) wifi:connected with test_zyj, aid = 2, channel 6, BW20, bssid = ac:84:c6:7e:31:2b I (904) wifi:security: WPA2-PSK, phy: bgn, rssi: -22 I (904) wifi:pm start, type: 1

W (914) wifi:idx:0 (ifx:0, ac:84:c6:7e:31:2b), tid:0, ssn:0, winSize:64 I (924) wifi:AP's beacon interval = 102400 us, DTIM period = 1 I (1634) esp_netif_handlers: sta ip: 192.168.0.191, mask: 255.255.255.0, gw: 192.168.0.1 I (1634) wifi station: got ip:192.168.0.191 I (1634) wifi station: connected to ap SSID:test_zyj password:12345678 I (1644) ESPNOW: espnow [version: 1.0] init I (1644) connTest: Start connTest example W (2094) wifi:idx:1 (ifx:0, ac:84:c6:7e:31:2b), tid:1, ssn:0, winSize:64 I (3924) connTest: Connected to server I (3924) connTest: Finished connTest example I (4934) connTest: Start connTest example I (6644) espnow_example: Start sending broadcast data I (6644) espnow_example: Receive 2th broadcast data from: 30:ae:a4:0c:34:ec, len: 10 I (6644) espnow_example: Receive 6th broadcast data from: 30:ae:a4:0c:34:ec, len: 10 I (7114) connTest: Connected to server I (7114) connTest: Finished connTest example I (7664) espnow_example: send data to ff:ff:ff:ff:ff:ff I (7664) espnow_example: Receive 7th broadcast data from: 30:ae:a4:0c:34:ec, len: 10 I (8114) connTest: Start connTest example I (8694) espnow_example: send data to ff:ff:ff:ff:ff:ff I (9694) espnow_example: send data to ff:ff:ff:ff:ff:ff I (9694) espnow_example: Receive 9th broadcast data from: 30:ae:a4:0c:34:ec, len: 10 I (9694) espnow_example: Start sending unicast data I (9694) espnow_example: send data to 30:ae:a4:0c:34:ec I (10154) connTest: Connected to server I (10154) connTest: Finished connTest example I (10754) espnow_example: send data to 30:ae:a4:0c:34:ec I (11164) connTest: Start connTest example I (11754) espnow_example: send data to 30:ae:a4:0c:34:ec I (12824) espnow_example: send data to 30:ae:a4:0c:34:ec I (13444) connTest: Connected to server I (13444) connTest: Finished connTest example I (13934) espnow_example: send data to 30:ae:a4:0c:34:ec I (14444) connTest: Start connTest example I (15024) espnow_example: send data to 30:ae:a4:0c:34:ec I (16044) espnow_example: send data to 30:ae:a4:0c:34:ec I (16504) connTest: Connected to server I (16514) connTest: Finished connTest example

zhangyanjiaoesp commented 7 months ago

@nicolas-Philips Can you update your IDF branch and have a try?

Sherry616 commented 6 months ago

Thanks for reporting, will close due to short of feedback, feel free to reopen with more updates.