eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

Clients disconnecting from IoT Hub when publishing telemetry #104

Closed dsilva-vd closed 2 years ago

dsilva-vd commented 2 years ago

I am seeing client disconnects when publishing telemetry from 2 clients (1 device and 1 module). I have increased the packet pool by 20% (40 -> 48 packets) and increased NX_AZURE_IOT_TLS_PACKET_BUFFER_SIZE from 7kB to 10kB, but the problem persists.

My application:

- NetX Duo v6.1.7
- STM32F429 host
- Cellular network connection - x-cube-cellular 7.0.0
- sample_azure_iot_embedded_sdk_with_retry.c - edited to connect 1 module and 1 device client

The console output is attached, and a snippet is provided below (Base is the device IoT client, VTL-1 is the module IoT client):

26-05-2022 / 16:37:40
Connected to IoTHub. Client name: Base 
IoTHub Host Name: development1.azure-devices.net; Device ID: stm32F429_engg1.
Module ID: VTL-1.
26-05-2022 / 16:37:41
26-05-2022 / 16:37:41
26-05-2022 / 16:37:41
26-05-2022 / 16:37:42
26-05-2022 / 16:37:42
26-05-2022 / 16:37:43
26-05-2022 / 16:37:43
26-05-2022 / 16:37:43
26-05-2022 / 16:37:44
[INFO] Azure IoT Security Module message is empty
Connected to IoTHub. Client name: VTL-1 
Disable Pre-emption for monitoring!: Thread Name: Demo
26-05-2022 / 16:37:45
26-05-2022 / 16:37:46
26-05-2022 / 16:37:46
26-05-2022 / 16:37:47
26-05-2022 / 16:37:48
Telemetry message send: Client: VTL-1, msg: {"Message ID":0}.
26-05-2022 / 16:37:49
26-05-2022 / 16:37:50
26-05-2022 / 16:37:51
26-05-2022 / 16:37:52
26-05-2022 / 16:37:53
Telemetry message send: Client: Base, msg: {"Message ID":1}.
Disconnected from IoTHub!: Client: Base, error code = 0x0002000c
Restoring Pre-emption for monitoring!: Thread Name: Demo
reconnecting iothub, after backoff
26-05-2022 / 16:37:54
26-05-2022 / 16:37:55
26-05-2022 / 16:37:56
26-05-2022 / 16:37:56
26-05-2022 / 16:37:57
26-05-2022 / 16:37:57
26-05-2022 / 16:37:58
Telemetry message send: Client: VTL-1, msg: {"Message ID":2}.
26-05-2022 / 16:37:58
26-05-2022 / 16:37:58
26-05-2022 / 16:37:59
26-05-2022 / 16:38:00
26-05-2022 / 16:38:01
Connected to IoTHub. Client name: Base 
Disable Pre-emption for monitoring!: Thread Name: Demo
26-05-2022 / 16:38:02
Telemetry message send: Client: Base, msg: {"Message ID":3}.
26-05-2022 / 16:38:03
26-05-2022 / 16:38:04
26-05-2022 / 16:38:05
26-05-2022 / 16:38:05
26-05-2022 / 16:38:06
VTL-1 not connected, skipping telemetry
26-05-2022 / 16:38:07
26-05-2022 / 16:38:08
26-05-2022 / 16:38:09
26-05-2022 / 16:38:10
26-05-2022 / 16:38:11
26-05-2022 / 16:38:12
26-05-2022 / 16:38:13
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
26-05-2022 / 16:38:14
Disconnected from IoTHub!: Client: VTL-1, error code = 0x0002000c
Restoring Pre-emption for monitoring!: Thread Name: Demo
reconnecting iothub, after backoff

I have confirmed the following:

  1. If I disable the telemetry thread and only connect the 2 clients (device and module), I do not see any disconnects.
  2. If I leave telemetry enabled and only enable the device client, I do not see any disconnects.

This testing points to insufficient memory allocated for some resource, but I'm not sure which one to adjust. I reviewed the stack usage and didn't see anything out of the ordinary.

I have redacted the name of my IoT hub, but it can be provided by some other secure means for verification.

Thanks

Attachments: 2 clients - frequent disconnects.txt, IoT Client source.zip

dsilva-vd commented 2 years ago

I should mention that the following 2 lines in sample_config.h must be commented out for the code to match my output log:

#define DISABLE_MODULE_CLIENTS
#define DISABLE_MODULE1

bo-ms commented 2 years ago

Hi @dsilva-vd, I used your sample and was able to reproduce this issue. After debugging, I found the cause: you used the same metadata buffer for both connections. I changed the code (see attachment) to use a different metadata buffer for each connection, and the sample works. Could you try it on your side?

Attachment: sample_azure_iot_embedded_sdk_with_retry.zip
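As an illustration of the fix described above, here is a minimal sketch, assuming the "metadata buffer" is the scratch memory each hub client hands to its TLS session at initialization. The type `client_ctx_t`, the size `TLS_METADATA_SIZE`, and every function name below are hypothetical stand-ins, not NetX Duo APIs; the attached sample remains the reference for the actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size; the real value depends on the TLS configuration. */
#define TLS_METADATA_SIZE 4096u

/* Hypothetical stand-in for a hub client context (not the NetX Duo type). */
typedef struct {
    const char *name;
    uint8_t    *tls_metadata;      /* scratch memory owned by this client's TLS session */
    size_t      tls_metadata_size;
} client_ctx_t;

/* One buffer per connection. If both clients pointed at the same array,
 * one TLS session could overwrite the other's state. */
static uint8_t device_tls_metadata[TLS_METADATA_SIZE];
static uint8_t module_tls_metadata[TLS_METADATA_SIZE];

/* Bind a client context to the metadata memory it will own. */
void client_ctx_init(client_ctx_t *ctx, const char *name,
                     uint8_t *metadata, size_t size)
{
    ctx->name = name;
    ctx->tls_metadata = metadata;
    ctx->tls_metadata_size = size;
}

/* Returns 1 if the two clients' metadata regions overlap, 0 otherwise. */
int clients_share_metadata(const client_ctx_t *x, const client_ctx_t *y)
{
    uintptr_t x0 = (uintptr_t)x->tls_metadata;
    uintptr_t y0 = (uintptr_t)y->tls_metadata;
    return (x0 < y0 + y->tls_metadata_size) &&
           (y0 < x0 + x->tls_metadata_size);
}

/* Set up the device and module clients with separate buffers. */
void setup_clients(client_ctx_t *device, client_ctx_t *module)
{
    client_ctx_init(device, "Base", device_tls_metadata,
                    sizeof(device_tls_metadata));
    client_ctx_init(module, "VTL-1", module_tls_metadata,
                    sizeof(module_tls_metadata));
}
```

A check such as `assert(!clients_share_metadata(&device, &module));` after setup would catch this kind of copy-paste sharing early, before either client connects.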

dsilva-vd commented 2 years ago

@bo-ms Thank you for the help. That did solve it, and I also found a copy-paste error in my telemetry client when publishing data for the module client.

Just to confirm: you created a second IoT stack variable but never used it in the code. I didn't miss anything, correct?

bo-ms commented 2 years ago

@dsilva-vd Good to know it works. I created a second stack and used it as shown below for a quick test.

[image]

bo-ms commented 2 years ago

Closing. @dsilva-vd, feel free to reopen if you have any further issues.