eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

sample_azure_iot_embedded_sdk_with_retry.c returns MQTT CONNECT FAIL status:65541 #86

Closed: dsilva-vd closed this issue 2 years ago

dsilva-vd commented 2 years ago

NetX Duo 6.1.7, tied to the latest NetX Duo distributed with STM32CubeIDE.

Using sample_azure_iot_embedded_sdk_with_retry.c, I consistently get MQTT CONNECT FAIL status: 65541 (0x10005) when trying to connect to my IoT Hub. The IoT Hub itself is available, since other clients (the Python Azure SDK) can connect to it. Is there a way to turn on more debugging output, or any suggestion on where to start debugging this?

```
Nx_SNTP_Client application started..
STM32 IpAddress: XXX.XXX.XXX.XXX
SNTP client connected to NTP server : < ca.pool.ntp.org >

SNTP update : Mar 23, 2022 22:8:18.859 UTC

23-03-2022 / 22:08:18
[INFO] Azure IoT Security Module has been enabled, status=0
IoTHub Host Name: Myazurehub.azure-devices.net; Device ID: mydevice.
[ERROR] IoTHub client connect fail: MQTT CONNECT FAIL status: 65541
Disconnected from IoTHub!: error code = 0x00010005
Failed on nx_azure_iot_hub_client_connect!
reconnecting iothub, after backoff
[ERROR] IoTHub client connect fail: MQTT CONNECT FAIL status: 65541
Disconnected from IoTHub!: error code = 0x00010005
reconnecting iothub, after backoff
[ERROR] IoTHub client connect fail: MQTT CONNECT FAIL status: 65541
Disconnected from IoTHub!: error code = 0x00010005
reconnecting iothub, after backoff
```

dsilva-vd commented 2 years ago

I have also tried the secondary SAS key and the results are the same.

dsilva-vd commented 2 years ago

app_netxduo_c.zip: I've included my app_netxduo.c file, which contains tx_application_define().

dsilva-vd commented 2 years ago

Forgot to attach the console output

```
Nx_SNTP_Client application started..
STM32 IpAddress: XXX.XXX.X.XXX
SNTP client connected to NTP server : < fr.pool.ntp.org >

SNTP update : Mar 24, 2022 15:29:10.374 UTC

24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10
[INFO] Azure IoT Security Module has been enabled, status=0
IoTHub Host Name: myhub.azure-devices.net; Device ID: stm32F429.
24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:10 24-03-2022 / 15:29:12 24-03-2022 / 15:29:12 24-03-2022 / 15:29:12 24-03-2022 / 15:29:13 24-03-2022 / 15:29:13 24-03-2022 / 15:29:13 24-03-2022 / 15:29:14 24-03-2022 / 15:29:15 24-03-2022 / 15:29:16 24-03-2022 / 15:29:16 24-03-2022 / 15:29:16 24-03-2022 / 15:29:17 24-03-2022 / 15:29:18 24-03-2022 / 15:29:19 24-03-2022 / 15:29:19 24-03-2022 / 15:29:19 24-03-2022 / 15:29:19 24-03-2022 / 15:29:20 24-03-2022 / 15:29:21
[ERROR] IoTHub client connect fail: MQTT CONNECT FAIL status: 65541
Disconnected from IoTHub!: error code = 0x00010005
Failed on nx_azure_iot_hub_client_connect!
reconnecting iothub, after backoff
24-03-2022 / 15:29:22 24-03-2022 / 15:29:22 24-03-2022 / 15:29:22 24-03-2022 / 15:29:23 24-03-2022 / 15:29:24 24-03-2022 / 15:29:25 24-03-2022 / 15:29:25 24-03-2022 / 15:29:25 24-03-2022 / 15:29:25 24-03-2022 / 15:29:25 24-03-2022 / 15:29:26 24-03-2022 / 15:29:28 24-03-2022 / 15:29:28 24-03-2022 / 15:29:28 24-03-2022 / 15:29:28 24-03-2022 / 15:29:29 24-03-2022 / 15:29:30 24-03-2022 / 15:29:31 24-03-2022 / 15:29:31 24-03-2022 / 15:29:31 24-03-2022 / 15:29:32 24-03-2022 / 15:29:33 24-03-2022 / 15:29:33 24-03-2022 / 15:29:34 24-03-2022 / 15:29:34 24-03-2022 / 15:29:34 24-03-2022 / 15:29:35
```

dsilva-vd commented 2 years ago

The call to nx_secure_tls_session_start() (line 3939 of nxd_mqtt_client.c) returns 56 (0x38). I cannot find a #define that says what this value means.

TiejunMS commented 2 years ago

Hi @dsilva-vd, I saw there are only 10 packets in the pool. What is the RX BD (receive buffer descriptor) count? One simple test you can try is to increase the packet pool count from 10 to 30 and see if that solves the problem.

dsilva-vd commented 2 years ago

Hi @TiejunMS, I increased NX_APP_MEM_POOL_SIZE to 70 KB and the packet pool count to 30. This didn't change the behavior.

Just to confirm, these are the changes I made:

```c
#define PAYLOAD_SIZE               1536
#define NX_PACKET_POOL_NUM_PACKETS 30

/* Each packet needs room for its payload plus the NX_PACKET header. */
#define NX_PACKET_POOL_SIZE ((PAYLOAD_SIZE + sizeof(NX_PACKET)) * NX_PACKET_POOL_NUM_PACKETS)
```

NX_PACKET_POOL_SIZE is used to allocate the packet pool.

What is RX BD? Can you point me to a file or the name of a define I can search for? Thanks

TiejunMS commented 2 years ago

If you are using the official driver from ST, the value is defined here. Since increasing the packet pool did not solve the problem, it is not caused by a lack of packets. Are you able to capture a packet trace with Wireshark and share it with us? It would help us understand what happens during the connection.

dsilva-vd commented 2 years ago

Hi, I will try to get a Wireshark capture, but since the device is wired I need to find a way to listen in as a man in the middle.

The RX BD defines are as follows:

```c
#define ETH_MAX_PACKET_SIZE 1524U               /*!< ETH_HEADER + ETH_EXTRA + ETH_VLAN_TAG + ETH_MAX_ETH_PAYLOAD + ETH_CRC */
#define ETH_RX_BUF_SIZE     ETH_MAX_PACKET_SIZE /* buffer size for receive */
#define ETH_TX_BUF_SIZE     ETH_MAX_PACKET_SIZE /* buffer size for transmit */
#define ETH_RXBUFNB         4U                  /* 4 Rx buffers of size ETH_RX_BUF_SIZE */
#define ETH_TXBUFNB         4U                  /* 4 Tx buffers of size ETH_TX_BUF_SIZE */
```

dsilva-vd commented 2 years ago

ST32 Ethernet - bridge mode adapter.zip: Hi @TiejunMS, here is a Wireshark capture. The STM32 device has address 192.168.0.126. The network connection is bridged through 192.168.0.143, which was running Wireshark.

I don't see any packets from the STM32 device indicating that it is starting the MQTT connection to the Azure IoT Hub. I look forward to your thoughts on the Wireshark capture. Thanks

TiejunMS commented 2 years ago

@dsilva-vd , it does not look like the bridge is working. Could you double check the bridge configurations?

Btw, if you would like to share your sample_config.h with us for checking, feel free to send it to azure-rtos-support@microsoft.com

dsilva-vd commented 2 years ago

@TiejunMS Here are the files for my IoT client. I believe I renamed sample_config.h to azure_iot_client_config.h.
IoTClient.zip

I will look into the bridging configuration again tomorrow. Thanks

TiejunMS commented 2 years ago

@dsilva-vd, could you fill in a valid hostname, device ID, and symmetric key in the config.h? I just want to double-check that the configuration is correct. You can delete the device once it is verified. To avoid leaking credentials publicly, you can send it through the support email listed above.

dsilva-vd commented 2 years ago

@TiejunMS I sent the config file with credentials to the support email today. I'll keep working on the Wireshark capture.

TiejunMS commented 2 years ago

@dsilva-vd, I have verified that the credentials you shared work with our official sample. If you are using the same sample, the issue might be related to the Ethernet driver. Did you port the driver yourself, or is it provided by ST? Either way, a packet trace would help show us what happened during the connection.

dsilva-vd commented 2 years ago

@TiejunMS The Ethernet driver is provided by ST in an example that sets up a simple SNTP client over Ethernet. I took that example and, after successfully connecting to the SNTP server and getting a time update, I start the Azure IoT embedded client thread. I can provide the full project via GitHub or as a zip, but I imagine you would need an STM32F429 Nucleo-144 evaluation board to run the code. I'll try to collect a better Wireshark capture; it is a bit tricky with Ethernet.

dsilva-vd commented 2 years ago

@TiejunMS I'm closing this issue. STM has released an Azure RTOS version of their cellular FW package, and my end goal is an Azure connection over a cellular PPP interface. The Ethernet example was only for temporary development. I will re-integrate the Azure IoT embedded client with the cellular FW package and open a new ticket if I encounter any issues.

Thanks

TiejunMS commented 2 years ago

Thanks for letting me know. Feel free to reopen the issue if needed.

dsilva-vd commented 2 years ago

@TiejunMS I need to reopen this issue.

I have been able to integrate the Azure IoT client (sample_azure_iot_embedded_sdk_with_retry, NetX Duo 6.1.7 release) with the Azure RTOS STM32 Cellular package.

The IoT client behaves exactly the same as when I was running it over Ethernet: the same function call returns the same error value. The last time you helped troubleshoot this issue, you wanted to see a Wireshark capture. That won't be simple with the cellular modem, so is there another way to troubleshoot this?

I could purchase the same reference kit used in the NetX example code, i.e. the B-L475E-IOT01A2 (B-L475E-IOT01A1), and run the same example to get a Wireshark capture, but since that example works, I'm not sure what it would tell us.

If it is helpful, I've attached a screenshot of the ThreadX thread list at the moment the MQTT connect call is made. Perhaps there is a thread priority issue that you can spot.

dsilva-vd commented 2 years ago

Forgot to reopen the issue.

TiejunMS commented 2 years ago

@dsilva-vd, since you were able to reproduce the issue on both the Ethernet and cellular interfaces, I think the issue is related to the application. Could you capture the packets over Ethernet, which should be much simpler?

dsilva-vd commented 2 years ago

Hi @TiejunMS, I took a Wireshark capture using a passive network tap. I'm not sure I have the correct Wireshark settings, but here is the captured log. The device's IP is 192.168.0.138.

dsilva-vd commented 2 years ago

Updated capture: the first one was not captured properly, so I deleted it. I can see from the logs that there is a TLS failure (Fatal Alert). Ethernet Azure IoT Hub capture.zip

TiejunMS commented 2 years ago

From the packet trace (only data from the device), there is a TLS error for an unknown CA, which means certificate chain validation fails. There are three CA certificates used by IoT Hub; see the example here. Could you check whether all of them are added in your current example?

dsilva-vd commented 2 years ago

Hi @TiejunMS ,

I had removed the first certificate because I saw the comment that Microsoft was going to transition to DigiCert. After adding the certificate back, the Ethernet example worked. I also found that the Ethernet sample needed a larger packet pool for NetX.

I made the same change (adding root cert 1 back) to my cellular example and connected successfully there as well.

Thank you for your help!

TiejunMS commented 2 years ago

Good to know it is working!

bo-ms commented 2 years ago

Hi @dsilva-vd, the start date of the Azure IoT root certificate migration has been moved from June 1st, 2022 to no earlier than Feb 15th, 2023. You can check the details here: https://techcommunity.microsoft.com/t5/internet-of-things-blog/azure-iot-tls-critical-changes-are-almost-here-and-why-you/ba-p/2393169