Azure / azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services

Getting Connection Closed on New Connection Error when it was previously not generated #3454

ctruo closed this issue 3 months ago

ctruo commented 5 months ago

Context

Description of the issue

We have an application that streams data to IoT Hub at a scheduled frequency. It also has an update data functionality that lets us send backfilled data for a specified time frame. With Microsoft.Azure.Devices.Client 1.42.0, the update data functionality suddenly fails every time we call it, and we see the following two exceptions:

_Type: IotHubException,
_Message: error(condition:com.microsoft:connection-closed-on-new-connection,description:,info:[]),
_Source: 'Microsoft.Azure.Devices.Client',

_Type: AmqpException,
_Message: ,
_Source: '',

Our current update data functionality creates a new connection while the scheduled stream's connection is still open. This may be the root of our issue, but we are surprised because everything worked fine when we last tested on 3/13/2024, and no changes have been made since. Strangely, this behavior is not consistent across our IoT Hubs. Below is the behavior for each of our IoT Hubs:

Tier      Creation date  Behavior
Standard  11/21/2023     Fails
Standard  01/03/2024     Fails
Standard  03/06/2024     Fails
Standard  04/11/2024     Works
Standard  04/11/2024     Works
Basic     10/25/2023     Works

The two Standard tier IoT Hubs created on 4/11/2024 were set up for testing after we discovered the issue in our application. What puzzles us is that these two new IoT Hubs and our older Basic IoT Hub work fine, while our existing Standard tier IoT Hubs run into this issue.

For now we have reverted Microsoft.Azure.Devices.Client to 1.18.1, which does not cause this issue, but it comes with side-effect errors that I discuss below.

Something else we're noticing in both 1.42.0 and 1.18.1: when we connect using a connection string that includes a device (HostName=xxxxx.azure-devices.net;DeviceId=xxxxx;SharedAccessKey=xxxxx), our application eventually starts generating exceptions after an unpredictable amount of time and stops publishing data to our "older" Standard tier targets. For some reason, we are using MQTT WebSocket Only for this specific connection case. I suspect this is because we have multiple device connections open at the same time, and I believe MQTT is only supposed to have one. Any clarification is appreciated.
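The condition name `com.microsoft:connection-closed-on-new-connection` suggests that IoT Hub dropped an existing connection because another connection was opened for the same device identity. IoT Hub documents a one-active-connection-per-device rule for MQTT, and this error suggests something similar may be happening over AMQP when the update data functionality opens its second connection. If that hypothesis holds, one mitigation is to let the scheduled stream and the backfill share a single client instead of each opening its own. This is a minimal sketch, assuming a hypothetical `SharedDeviceConnection<TClient>` helper that is not part of the SDK; the client type is left generic so the pattern stands on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the Azure IoT SDK): lazily creates ONE client
// for a device identity and routes every caller (e.g. the scheduled stream and
// the backfill job) through it, instead of each caller opening its own connection.
public sealed class SharedDeviceConnection<TClient> where TClient : class
{
    private readonly Func<TClient> _factory;
    private readonly SemaphoreSlim _lock = new SemaphoreSlim(1, 1);
    private TClient _client;

    public SharedDeviceConnection(Func<TClient> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // How many clients the factory has produced; should stay at 1.
    public int ClientsCreated { get; private set; }

    // Runs one send operation at a time against the single shared client,
    // creating that client on first use.
    public async Task SendAsync(Func<TClient, Task> send)
    {
        await _lock.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_client == null)
            {
                _client = _factory();
                ClientsCreated++;
            }
            await send(_client).ConfigureAwait(false);
        }
        finally
        {
            _lock.Release();
        }
    }
}
```

With the SDK, construction might look like `new SharedDeviceConnection<DeviceClient>(() => DeviceClient.CreateFromConnectionString(deviceConnectionString, TransportType.Amqp))`, with each caller sending via `shared.SendAsync(c => c.SendEventAsync(new Message(payload)))`, where `deviceConnectionString` and `payload` (a `byte[]`) are placeholders. Serializing sends through one `SemaphoreSlim` is a simplifying choice for the sketch, not an SDK requirement.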

The exceptions generated differ between these two versions.

v1.42.0:

_Type: IotHubCommunicationException,
_Message: Transient network error occurred, please retry.,
_Source: 'Microsoft.Azure.Devices.Client',

_Type: SocketException,
_Message: An existing connection was forcibly closed by the remote host,
_Source: 'mscorlib',

v1.18.1

_Type: TimeoutException,
_Message: Operation timeout expired.,
_Source: 'Microsoft.Azure.Devices.Client',

Any information and help would be greatly appreciated. Thanks!

Console log of the issue

Logs were collected while invoking the update data functionality that triggered the exception. Comparing the logs, v1.18.1 shows no exceptions.

Logs for v1.42.0 (exceptions are near the end): v1.42.0.txt
Logs for v1.18.1: v1.18.1.txt

andyk-ms commented 5 months ago

Confirming the versions you used:
1.18.1 -> https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/2018-10-9
1.42.0 -> https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/2023-05-01
These are the versions in question, correct?

In the log for 1.42.0, the error listed is an AMQP error, but you seem to be describing an MQTT error. The AMQP error in the log indicates a retryable error; do you have retry logic to handle it? Please elaborate on your error handling.

2024-04-16T02:03:39.7255428 [Microsoft-Azure-Devices-Device-Client-Enter] (RetryStrategyAdapter#31617841, RetryStrategyAdapter.ShouldRetry, (0, Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException: Exception of type 'Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException' was thrown.)).

_Type: AmqpException,
_Message: ,
_Source: '',
_HResult: [ -2146233088 ]

Would you be able to update the package to the latest version, 1.42.3, which includes fixes, and re-test? https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/2024-03-28

ctruo commented 5 months ago

Hi Andy. Thanks for your reply. Yes, those are the versions.

We currently do not have retry logic to handle this; we only have a simple try-catch block. I can try upgrading to 1.42.3 and testing, but we have some constraints when it comes to upgrades (sorry, I cannot elaborate further). Ideally we would avoid an upgrade, but if it is necessary we will consider it.
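Since the 1.42.0 log shows a retryable `IotHubCommunicationException`, a thin retry layer around the backfill sends is one option on top of a plain try-catch. This is a minimal sketch, assuming a hypothetical `TransientRetry.RunAsync` helper with illustrative defaults (5 attempts, a 1-second initial delay that doubles after each failure); the transient check is passed in as a predicate so the helper does not depend on the SDK.

```csharp
using System;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Hypothetical helper: retries `operation` with exponential backoff while
    // `isTransient` returns true; non-transient errors, or the error from the
    // final attempt, propagate to the caller unchanged.
    public static async Task RunAsync(
        Func<Task> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation().ConfigureAwait(false);
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait, then double the delay before the next attempt.
                await Task.Delay(delay).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With the SDK, the predicate could be `ex => ex is IotHubException iot && iot.IsTransient` (from `Microsoft.Azure.Devices.Client.Exceptions`). `DeviceClient` also applies its own internal retry policy, which the `RetryStrategyAdapter.ShouldRetry` log line appears to reflect and which can be tuned with `DeviceClient.SetRetryPolicy`, so an outer loop like this mainly covers errors that surface after the SDK gives up. Retrying a send can also deliver a duplicate message if the first attempt actually reached the hub.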

MQTT is only used when we connect using a connection string that contains a DeviceId (HostName=xxxxx.azure-devices.net;DeviceId=xxxxx;SharedAccessKey=xxxxx). If we use a shared access policy connection string without a DeviceId, AMQP is used, and that is what the logs you are looking at show. We've switched the MQTT transport to AMQP for the time being.
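Because the transport ends up depending on which kind of connection string is in use, it can help to detect the kind explicitly and pin one transport for both paths. This is a minimal sketch, assuming a hypothetical `ConnectionStringInfo` helper that only inspects the `Key=Value;` pairs and is not an SDK API.

```csharp
using System;
using System.Collections.Generic;

public static class ConnectionStringInfo
{
    // Hypothetical helper: splits "Key=Value;Key=Value" into a case-insensitive
    // dictionary. Base64 keys can end in '=', so split on the first '=' only.
    public static Dictionary<string, string> Parse(string connectionString)
    {
        var parts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string segment in connectionString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = segment.IndexOf('=');
            if (eq <= 0) continue; // skip malformed segments
            parts[segment.Substring(0, eq).Trim()] = segment.Substring(eq + 1).Trim();
        }
        return parts;
    }

    // True for a device connection string (HostName=...;DeviceId=...;SharedAccessKey=...),
    // false for a shared access policy connection string with no DeviceId.
    public static bool IsDeviceConnectionString(string connectionString) =>
        Parse(connectionString).TryGetValue("DeviceId", out string id) && id.Length > 0;
}
```

A caller could then use `DeviceClient.CreateFromConnectionString(cs, TransportType.Amqp)` for device connection strings and the `DeviceClient.CreateFromConnectionString(cs, deviceId, TransportType.Amqp)` overload for policy connection strings, so both paths use the same transport.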