espressif / esp-aws-iot

AWS IoT SDK for ESP32 based chipsets
Apache License 2.0
256 stars 153 forks source link

mbedtls_net_connect() sometimes fails (CA-298) #187

Closed sdw-jro closed 9 months ago

sdw-jro commented 1 year ago

Hello,

We are having trouble with the library since upgrading to ESP-IDF v5.0. During the upgrade, we checked out the latest commit of the release/v3.1.x branch.

In about ~50% of connection process, the mbedtls_net_connect() function call fails. When the connection is successful, we can subscribe and post to MQTT topics.

Here is a sample failure log with lwIP socket logging enabled:

W (56) boot.esp32c3: eFuse virtual mode is enabled. If Secure boot or Flash encryption is enabled then it does not provide any security. FOR TESTING ONLY!
I (72) boot: ESP-IDF v5.0.2-dirty 2nd stage bootloader
I (72) boot: compile time 14:50:07
I (72) boot: chip revision: v0.3
I (75) boot.esp32c3: SPI Speed      : 40MHz
I (79) boot.esp32c3: SPI Mode       : DIO
I (84) boot.esp32c3: SPI Flash Size : 4MB
I (89) boot: Enabling RNG early entropy source...
W (195) efuse: Loading virtual efuse blocks from flash
EFUSE_BLKx:
0) 0x01800301 0x000f0602 0x49c40000 0x00100000 0x00000022 0x00000000 
1) 0xa1dad838 0x00007cdf 0x00000000 0x010c0000 0x5073e2e0 0x000642d2 
2) 0x9f805765 0xf2d59cd5 0x1927fc75 0xf7105ccb 0x5705ea41 0xfa6e1d81 0x1e220222 0x00000009
3) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 
4) 0x42239e4b 0xd31e8a7e 0x735b7740 0xa5c4beb6 0xfc0b58e3 0xd358bc4e 0x030117a9 0x1f743592
5) 0x2297aabf 0xbb9db5cd 0xd46ef7e1 0x5bd7e0d0 0xf6441d4f 0x4b7df83f 0xafeffba6 0x561b48d0 
6) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 
7) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 
8) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
9) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
10) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
I (287) boot: 11 efuse_em         efuse            01 05 003fb000 00002000
I (294) boot: End of partition table
I (298) BOOTLOADER START : boot: current boot index OTA_0
I (304) esp_image: segment 0: paddr=00020020 vaddr=3c100020 size=371b8h (225720) map
I (360) esp_image: segment 1: paddr=000571e0 vaddr=3fc95e00 size=0477ch ( 18300) 
I (364) esp_image: segment 2: paddr=0005b964 vaddr=40380000 size=046b4h ( 18100) 
I (369) esp_image: segment 3: paddr=00060020 vaddr=42000020 size=f6824h (1009700) map
I (585) esp_image: segment 4: paddr=0015684c vaddr=403846b4 size=11714h ( 71444) 
I (600) esp_image: segment 5: paddr=00167f68 vaddr=00000000 size=08068h ( 32872) 
I (607) esp_image: Verifying image signature...
I (608) secure_boot_v2: Verifying with RSA-PSS...
I (611) secure_boot_v2: Signature verified successfully!
I (614) esp_image: image_digest = 0xDF2E... verified_digest=0x2eDF...
I (621) BOOTLOADER START : atecc: Image metadata len 151000 @ addr 20000 !
I (662) CRYPTO DEVICE APP: initialization success
I (735) SB MEMORY: memory_params->start_address 0x20000
I (735) SB MEMORY: memory_params->memory_size  0x151000
I (2844) BOOTLOADER START : boot: app image verified. Jump to app
I (2845) esp_image: segment 0: paddr=00020020 vaddr=3c100020 size=371b8h (225720) map
I (2896) esp_image: segment 1: paddr=000571e0 vaddr=3fc95e00 size=0477ch ( 18300) load
I (2900) esp_image: segment 2: paddr=0005b964 vaddr=40380000 size=046b4h ( 18100) load
I (2906) esp_image: segment 3: paddr=00060020 vaddr=42000020 size=f6824h (1009700) map
I (3121) esp_image: segment 4: paddr=0015684c vaddr=403846b4 size=11714h ( 71444) load
I (3139) esp_image: segment 5: paddr=00167f68 vaddr=00000000 size=08068h ( 32872) 
I (3146) esp_image: Verifying image signature...
I (3146) secure_boot_v2: Verifying with RSA-PSS...
I (3150) secure_boot_v2: Signature verified successfully!
I (3153) esp_image: image_digest = 0xDF2E... verified_digest=0x2eDF...
I (3166) boot: Loaded app from partition at offset 0x20000
I (3167) secure_boot_v2: enabling secure boot v2...
I (3172) secure_boot_v2: secure boot v2 is already enabled, continuing..
I (3179) boot: Checking flash encryption...
I (3184) flash_encrypt: flash encryption is enabled (1 plaintext flashes left)
I (3192) boot: Disabling RNG early entropy source...
I (3209) cpu_start: Pro cpu up.
I (3220) cpu_start: Pro cpu start user code
I (3220) cpu_start: cpu freq: 80000000 Hz
I (3220) cpu_start: Application information:
I (3223) cpu_start: Project name:     xxxx
I (3229) cpu_start: App version:      982e988-dirty
I (3234) cpu_start: Compile time:     Jun  7 2023 13:45:16
I (3240) cpu_start: ELF file SHA256:  ba557063b243fc47...
I (3246) cpu_start: ESP-IDF:          v5.0.2-dirty
I (3252) cpu_start: Min chip rev:     v0.3
I (3257) cpu_start: Max chip rev:     v0.99 
I (3261) cpu_start: Chip rev:         v0.3
I (3266) heap_init: Initializing. RAM available for dynamic allocation:
I (3274) heap_init: At 3FCA9480 len 00033290 (204 KiB): DRAM
I (3280) heap_init: At 3FCDC710 len 00002950 (10 KiB): STACK/DRAM
I (3287) heap_init: At 50000020 len 00001FE0 (7 KiB): RTCRAM
I (3295) spi_flash: detected chip: generic
I (3298) spi_flash: flash io: dio
W (3302) cpu_start: eFuse virtual mode is enabled. If Secure boot or Flash encryption is enabled then it does not provide any security. FOR TESTING ONLY!
W (3318) efuse: Loading virtual efuse blocks from flash
EFUSE_BLKx:
0) 0x01800301 0x000f0602 0x49c40000 0x00100000 0x00000022 0x00000000
1) 0xa1dad838 0x00007cdf 0x00000000 0x010c0000 0x5073e2e0 0x000642d2 
2) 0x9f805765 0xf2d59cd5 0x1927fc75 0xf7105ccb 0x5705ea41 0xfa6e1d81 0x1e220222 0x00000009 
3) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
4) 0x42239e4b 0xd31e8a7e 0x735b7740 0xa5c4beb6 0xfc0b58e3 0xd358bc4e 0x030117a9 0x1f743592
5) 0x2297aabf 0xbb9db5cd 0xd46ef7e1 0x5bd7e0d0 0xf6441d4f 0x4b7df83f 0xafeffba6 0x561b48d0 
6) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 
7) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 
8) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
9) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
10) 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
W (3409) flash_encrypt: Flash encryption mode is DEVELOPMENT (not secure)
I (3418) coexist: coexist rom version 9387209
I (3422) cpu_start: Starting scheduler.
I (3716) gpio: GPIO[1]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 0| Pulldown: 0| Intr:0 
I (3716) atca_basic: init start
I (3786) atca_basic: init end success
I (3786) ATECC CTRL: init done
I (3846) COMMUNICATION APP: Serial number:
I (3846) COMMUNICATION APP: 01 23 01 cf 6c 04 a0 5b ee 
I (3846) BTDM_INIT: BT controller compile version [80abacd]
I (3846) phy_init: phy_version 950,11a46e9,Oct 21 2022,08:56:12
I (3896) system_api: Base MAC address is not set
I (3896) system_api: read default base MAC address from EFUSE
I (3896) BTDM_INIT: Bluetooth MAC: 7c:df:a1:da:d8:3a
I (4136) pp: pp rom version: 9387209
I (4136) net80211: net80211 rom version: 9387209
I (4186) wifi:wifi driver task: 3fcc5328, prio:23, stack:6656, core=0
I (4196) wifi:wifi firmware version: 57982fe
I (4196) wifi:wifi certification version: v7.0
I (4196) wifi:config NVS flash: enabled
I (4206) wifi:config nano formating: disabled
I (4206) wifi:Init data frame dynamic rx buffer num: 32
I (4216) wifi:Init management frame dynamic rx buffer num: 32
I (4216) wifi:Init management short buffer num: 32
I (4226) wifi:Init dynamic tx buffer num: 32
I (4226) wifi:Init static tx FG buffer num: 2
I (4236) wifi:Init static rx buffer size: 1600
I (4236) wifi:Init static rx buffer num: 10
I (4236) wifi:Init dynamic rx buffer num: 32
I (4266) gpio: GPIO[5]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (4276) main: all app stared successfully
I (4276) wifi_init: rx ba win: 6
I (4286) wifi_init: tcpip mbox: 32
I (4286) wifi_init: udp mbox: 32
I (4296) wifi_init: tcp mbox: 12
I (4296) wifi_init: tcp tx win: 5744
I (4296) wifi_init: tcp rx win: 5744
I (4306) wifi_init: tcp mss: 1440
I (4306) wifi_init: WiFi IRAM OP enabled
I (4316) wifi_init: WiFi RX IRAM OP enabled
I (4316) wifi:mode : sta (7c:df:a1:da:d8:38)
I (4326) wifi:enable tsf
I (4336) wifi:set country: cc=JPI schan=1 nchan=14 policy=0
I (4346) wifi:Set ps type: 1
I (8216) wifi:new:<13,0>, old:<1,0>, ap:<255,255>, sta:<13,0>, prof:1
I (9516) wifi:state: init -> auth (b0)
I (9526) wifi:state: auth -> assoc (0)
I (9536) wifi:state: assoc -> run (10)
I (9976) wifi:<ba-add>idx:0 (ifx:0, 50:eb:f6:63:6f:78), tid:0, ssn:28, winSize:64
I (9996) wifi:connected with ASUS_AC750, aid = 2, channel 13, BW20, bssid = 50:eb:f6:63:6f:78
I (9996) wifi:security: WPA2-PSK, phy: bgn, rssi: -32
I (9996) wifi:pm start, type: 1
I (10006) wifi:set rx beacon pti, rx_bcn_pti: 14, bcn_timeout: 14, mt_pti: 25000, mt_time: 10000
I (10056) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (11016) esp_netif_handlers: sta ip: 192.168.1.241, mask: 255.255.255.0, gw: 192.168.1.1
I (11016) WIFI CTRL: IP_EVENT_STA_GOT_IP
I (11016) WIFI CTRL: Initializing SNTP
I (14016) WIFI CTRL: Received timestamp from NTP server 1686141239 sec, 2023-06-07 12:33:59
I (14016) IOT AWS CTRL: Connecting...
I (14016) IOT AWS CTRL: Connection attempt 1...
lwip_socket(PF_INET, SOCK_STREAM, 6) = 48
lwip_connect(48, addr=34.246.98.169 port=8883)
lwip_connect(48) failed, err=-13
lwip_close(48)
E (32306) aws_iot: failed! mbedtls_net_connect returned -0x44
I (32306) IOT AWS CTRL: aws_iot_mqtt_connect() returned -24
I (32306) IOT AWS CTRL: Connection attempt 1 failed (ret=-24)
I (33316) IOT AWS CTRL: Connection attempt 2...
lwip_socket(PF_INET, SOCK_STREAM, 6) = 48
lwip_connect(48, addr=34.246.98.169 port=8883)
lwip_connect(48) failed, err=-13
lwip_close(48)
E (51836) aws_iot: failed! mbedtls_net_connect returned -0x44
I (51836) IOT AWS CTRL: aws_iot_mqtt_connect() returned -24
I (51836) IOT AWS CTRL: Connection attempt 2 failed (ret=-24)
I (52846) IOT AWS CTRL: Connection attempt 3...
lwip_socket(PF_INET, SOCK_STREAM, 6) = 48
lwip_connect(48, addr=34.246.98.169 port=8883)

How do you recommend we proceed to debug this? Thank you.

avsheth commented 1 year ago

Hi @sdw-jro We tried to reproduce this issue with the example app. But it got connected all the time. Can you also try to check with the example app and let know if that's working. If the example app works okay, could you print available memory when the connection fails ? Another thing to try would be to check with different router and see if same is observed?

sdw-jro commented 11 months ago

Hello @avsheth , When trying to replicate this issue again today, I noticed it was basically gone. We were able to connect to AWS consistently. The only thing that I can think of being different from a few weeks back is my development environment: I downgraded my Python from 3.11 to 3.8 (it broke compatibility with IDF v4.4 on my system) and reinstalled ESP IDF v5.

If this could be interesting for you to investigate, I can share more information. Otherwise, I will close the issue.