Azure / azure-iot-middleware-freertos

Azure IoT Middleware for FreeRTOS
https://azure.github.io/azure-iot-middleware-freertos/
MIT License

No way to get MQTT error code after calling AzureIoTHubClient_Connect() #320

Open rtheil-growlink opened 11 months ago

rtheil-growlink commented 11 months ago

Version

1.4.0

Description of the issue

We need to know the MQTT error status when an attempt to establish the MQTT connection fails, so that we can control reconnection behavior based on the error code returned for the connection.

In AzureIoTHubClient_Connect(), there is AzureIoTMQTTResult_t xMQTTResult. There doesn't seem to be a way to have that value returned, or even stored in pxAzureIoTHubClient._internal.MQTTContext.

After a network outage affecting our IoT Hub yesterday, we found that our ESP devices and our IoT Edge devices all came back online at exactly the same time. The resulting flood of reported-property writes caused throttling, which left thousands of devices unable to come back online for quite some time. It would be nice to control a backoff algorithm based on that specific error (eAzureIoTMQTTNoDataAvailable), so that we can ensure some level of randomness when this type of event happens in the future. We've also seen the need to handle other MQTT errors differently, such as eAzureIoTMQTTServerRefused, which can occur when there is a time difference of more than 5 minutes between client and server. eAzureIoTMQTTNoMemory could signal the need to reboot, or to raise an alert about a memory leak.
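The per-error handling described above could look roughly like the following. This is only a sketch under stated assumptions: SketchMQTTResult_t is a hypothetical stand-in for the middleware's AzureIoTMQTTResult_t (the middleware does not currently expose that value from AzureIoTHubClient_Connect(), which is the point of this issue), the action names are invented for illustration, and the caller is assumed to supply a random number from whatever source the platform provides.

```c
#include <stdint.h>

/* Hypothetical stand-in for AzureIoTMQTTResult_t. Only the errors named in
 * this issue are listed, and the numeric values are arbitrary. */
typedef enum
{
    eSketchMQTTSuccess = 0,
    eSketchMQTTNoMemory,
    eSketchMQTTServerRefused,
    eSketchMQTTNoDataAvailable,
    eSketchMQTTOther
} SketchMQTTResult_t;

/* Hypothetical actions an application might take after a failed connect. */
typedef enum
{
    eSketchRetryWithBackoff,     /* transient failure: wait a random time, retry */
    eSketchResyncClockThenRetry, /* refusal may be caused by clock skew */
    eSketchRebootOrAlert         /* possible memory leak: escalate */
} SketchRetryAction_t;

/* Map an MQTT error to a reconnection action, following the cases in the issue. */
SketchRetryAction_t xSketchClassify( SketchMQTTResult_t xResult )
{
    switch( xResult )
    {
        case eSketchMQTTServerRefused:
            return eSketchResyncClockThenRetry;

        case eSketchMQTTNoMemory:
            return eSketchRebootOrAlert;

        default:
            return eSketchRetryWithBackoff;
    }
}

/* Exponential backoff with full jitter: returns a delay in the range
 * [0, min( ulCapMs, ulBaseMs * 2^ulAttempt )]. ulRandom is a caller-supplied
 * random number, so devices that fail together do not retry together. */
uint32_t ulSketchBackoffMs( uint32_t ulAttempt,
                            uint32_t ulBaseMs,
                            uint32_t ulCapMs,
                            uint32_t ulRandom )
{
    uint32_t ulCeiling = ( ulBaseMs > ulCapMs ) ? ulCapMs : ulBaseMs;

    if( ulBaseMs == 0U )
    {
        return 0U;
    }

    /* Double the ceiling once per attempt, saturating at the cap. */
    while( ( ulAttempt > 0U ) && ( ulCeiling < ulCapMs ) )
    {
        ulCeiling = ( ulCeiling > ( ulCapMs / 2U ) ) ? ulCapMs : ( ulCeiling * 2U );
        ulAttempt--;
    }

    /* Avoid a modulo by zero when the ceiling is the largest uint32_t value. */
    if( ulCeiling == UINT32_MAX )
    {
        return ulRandom;
    }

    return ulRandom % ( ulCeiling + 1U );
}
```

Drawing the delay from the whole window, rather than adding a small random offset to a fixed delay, spreads a fleet's reconnection attempts across the window, which is the randomness asked for above.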

Hopefully this data is available somewhere and I'm just not seeing it; otherwise, I believe this would be a good feature to add to this library. I would argue that it would also be useful when calling _ProcessLoop(), as some MQTT errors warrant an immediate retry, especially when connectivity is poor. We have a number of customers with incredibly poor internet connections, including point-to-point, satellite, and rural cellular.

As always, thanks for everything you do on this project!

ericwolz commented 9 months ago

Thanks for your feedback here. We have these items in our backlog. Currently we don't do connection management, but it is a highly requested feature that we will try to address in a future release.

The current expectation is that the application implements a backoff algorithm. https://github.com/Azure-Samples/iot-middleware-freertos-samples/blob/794c4936e6573baa636f44f78d8e3878cd49fdfa/demos/sample_azure_iot/sample_azure_iot.c#L615
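For readers who want a starting point, an application-level retry loop might be shaped like the sketch below. It is not the code from the linked sample. The callback types and every name prefixed with Sketch are hypothetical: in a real application the connect callback would wrap AzureIoTHubClient_Connect(), the delay callback would wrap a task delay, and the random callback would wrap the platform's random number source. The stubs at the bottom exist only so the loop can be tried on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types supplied by the application. */
typedef bool ( * SketchConnectFn_t )( void * pvContext );
typedef void ( * SketchDelayFn_t )( uint32_t ulDelayMs );
typedef uint32_t ( * SketchRandomFn_t )( void );

/* Try to connect up to ulMaxAttempts times. After each failure except the
 * last, sleep for a random time inside a window that doubles per failure
 * and saturates at ulMaxMs. Returns true once a connect attempt succeeds. */
bool xSketchConnectWithBackoff( SketchConnectFn_t xConnect,
                                void * pvContext,
                                SketchDelayFn_t xDelay,
                                SketchRandomFn_t xRandom,
                                uint32_t ulBaseMs,
                                uint32_t ulMaxMs,
                                uint32_t ulMaxAttempts )
{
    uint32_t ulWindowMs = ulBaseMs;
    uint32_t ulAttempt;

    for( ulAttempt = 0U; ulAttempt < ulMaxAttempts; ulAttempt++ )
    {
        if( xConnect( pvContext ) )
        {
            return true;
        }

        /* There is no point in sleeping after the final failed attempt. */
        if( ( ulAttempt + 1U ) < ulMaxAttempts )
        {
            if( ulWindowMs > ulMaxMs )
            {
                ulWindowMs = ulMaxMs;
            }

            /* Random delay in [0, ulWindowMs]; guard against modulo by zero. */
            xDelay( ( ulWindowMs == UINT32_MAX ) ? xRandom()
                                                 : ( xRandom() % ( ulWindowMs + 1U ) ) );

            /* Double the window for the next failure, saturating at ulMaxMs. */
            ulWindowMs = ( ulWindowMs > ( ulMaxMs / 2U ) ) ? ulMaxMs : ( ulWindowMs * 2U );
        }
    }

    return false;
}

/* ---- Host-side stubs for trying the loop; not part of any library ---- */

static uint32_t ulSketchDelayCalls = 0U;
static uint32_t ulSketchLastDelayMs = 0U;

/* Fails until the failure counter behind pvContext reaches zero. */
static bool prvSketchFakeConnect( void * pvContext )
{
    uint32_t * pulFailuresLeft = ( uint32_t * ) pvContext;

    if( *pulFailuresLeft == 0U )
    {
        return true;
    }

    ( *pulFailuresLeft )--;
    return false;
}

/* Records the requested delay instead of sleeping. */
static void prvSketchFakeDelay( uint32_t ulDelayMs )
{
    ulSketchDelayCalls++;
    ulSketchLastDelayMs = ulDelayMs;
}

/* A fixed value keeps the host-side run deterministic. */
static uint32_t prvSketchFakeRandom( void )
{
    return 1234567U;
}
```

With a failure counter of 2 and five allowed attempts, the loop calls the delay stub twice and then reports success on the third connect attempt. On a device, the random callback should return a fresh value on every call so that devices do not retry in lockstep.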