Azure / azure-iot-sdk-python

A Python SDK for connecting devices to Microsoft Azure IoT services
MIT License
423 stars 377 forks

Client stays stuck after IoT Hub failover using private link #1197

Closed gabrielSoudry closed 6 days ago

gabrielSoudry commented 1 week ago

Context

Description of the issue

We are using Azure IoT Hub with a failover setup between two regions, France Central and France South, both configured with Private Links. When we initiate a failover from France Central to France South, the IoT Hub successfully fails over, and DNS resolves to the new IP as expected. However, the Azure SDK client does not reconnect automatically after the failover, even though the DNS resolves correctly.

We would expect the SDK to handle the reconnection automatically when the DNS updates, but this does not happen.

Restarting the Python service works, but this defeats the purpose of failover resilience.
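To confirm what the running Python process itself resolves (rather than what `nslookup` or `dig` report on the host), a small check like the following can help. This is a minimal sketch using only the standard library; the function name `resolve_ips` and the hostname in the comment are placeholders, not part of the SDK.

```python
import socket


def resolve_ips(hostname, port=8883):
    """Return the sorted, de-duplicated IP addresses this process resolves for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example (hypothetical hub name, replace with your own):
#   print(resolve_ips("my-hub.azure-devices.net"))
```

Calling this before and after the failover, from inside the affected service, shows whether the process sees the new private endpoint IP or a stale one.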

Steps to Reproduce:

Expected Behavior: The Azure SDK client should automatically reconnect to the IoT Hub after failover, when DNS has updated to the new region's IP address.

Actual Behavior: The Azure SDK client fails to reconnect to the IoT Hub after the failover, even though DNS resolves to the correct IP address. Instead of retrying the connection, it remains stuck.


Oct 14 11:43:03 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.aio.async_clients: Sending message to Hub...
Oct 14 11:43:03 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: publishing on devices/unipi-m203-741/messages/events/
Oct 14 11:43:04 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: payload published for 6
Oct 14 11:43:04 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.aio.async_clients: Successfully sent message to Hub
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 7

==============FAILOVER==================
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Forcing paho disconnect to prevent it from automatically reconnecting
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called: The connection was lost.
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:52:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage(ConnectOperation): Connection watchdog expired.  Cancelling op
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnecting MQTT client
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 0
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.pipeline.pipeline_exceptions.OperationTimeout: Transport timeout on connection operation\n']
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:53:18 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:53:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage(ConnectOperation): Connection watchdog expired.  Cancelling op
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnecting MQTT client
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 0
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.pipeline.pipeline_exceptions.OperationTimeout: Transport timeout on connection operation\n']
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:28 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Connect using port 8883 (TCP)
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: connected with result code: 5
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: disconnected with result code: 5
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_connection_failure called: Connection Refused: not authorised.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.mqtt_transport: Forcing paho disconnect to prevent it from automatically reconnecting
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: _on_mqtt_disconnect called: The connection was refused.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.UnauthorizedError: Connection Refused: not authorised.\n']
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.pipeline.pipeline_stages_base: ConnectionStateStage: DisconnectEvent received while in unexpected state - ConnectionState.DISCONNECTED
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.common.pipeline.pipeline_stages_mqtt: MQTTTransportStage: Unexpected disconnect (no pending connection op)
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Connection State - Disconnected
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: INFO azure.iot.device.iothub.abstract_clients: Cleared all pending method requests due to disconnect
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: Exception caught in background thread.  Unable to handle.
Oct 14 11:54:38 unipi-m203-732 gotomate-orchestrator[891]: WARNING azure.iot.device.common.handle_exceptions: ['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']

=> STUCK
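
In the log above, connection attempts appear at 11:52:18, 11:53:28 and 11:54:38, and nothing follows the last one. One way an application could notice this state is to track how long the client has been continuously disconnected. The sketch below is an illustration only: `DisconnectWatchdog` and its parameters are hypothetical names, and it assumes the application polls the client's `connected` property periodically.

```python
import time


class DisconnectWatchdog:
    """Tracks how long a client has been continuously disconnected."""

    def __init__(self, max_down_seconds, clock=time.monotonic):
        self.max_down_seconds = max_down_seconds
        self._clock = clock
        self._down_since = None  # None while the client is connected

    def update(self, connected):
        """Record the current state; return True once the outage has lasted too long."""
        if connected:
            self._down_since = None
            return False
        now = self._clock()
        if self._down_since is None:
            self._down_since = now  # first observation of this outage
        return now - self._down_since >= self.max_down_seconds


# Example polling step (client is an already created device client):
#   if watchdog.update(client.connected):
#       ...  # escalate: rebuild the client, or exit and let systemd restart the service
```

The threshold is a judgment call; it should be longer than the SDK's own retry cycle so that the watchdog only fires when the built-in reconnection has clearly stalled.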

olivakar commented 6 days ago

The device client is created from a connection string, which belongs to a specific hub. Unless the device client is created again with a different connection string, reconnecting to a different hub on the fly is not possible.

This would be a significant change or addition to functionality. We are not delivering new features at this time and are focusing on security and stability.

Since this is a very specific failover scenario, one approach could be to create the device client again using a different connection string in application-level code.
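
A rough sketch of what that application-level approach might look like with the async client follows. It is not an SDK feature: `connect_with_recreate` and `make_client` are hypothetical names, and the helper only assumes that the object returned by the factory exposes `connect()` and `shutdown()` coroutines, as the SDK's async `IoTHubDeviceClient` does. Whether a fresh client actually recovers after a failover is not confirmed in this thread beyond the observation that restarting the service works.

```python
import asyncio


async def connect_with_recreate(make_client, attempts=3, backoff=5.0):
    """Connect using a brand-new client per attempt; return the connected client.

    make_client is a zero-argument callable (hypothetical) that builds a client,
    for example from a connection string or an X.509 certificate.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        client = make_client()  # fresh client, so no state carried over from the old one
        try:
            await client.connect()
            return client
        except Exception as exc:  # the failed client is discarded below
            last_error = exc
            try:
                await client.shutdown()
            except Exception:
                pass  # best effort cleanup only
            if attempt < attempts:
                await asyncio.sleep(backoff)
    raise ConnectionError(f"could not connect after {attempts} attempts") from last_error


# Example factory (assumes the connection string is supplied by the application):
#   from azure.iot.device.aio import IoTHubDeviceClient
#   make_client = lambda: IoTHubDeviceClient.create_from_connection_string(conn_str)
#   client = await connect_with_recreate(make_client)
```

Passing a factory rather than a connection string keeps the helper independent of how the client is built, so the application can decide which hub or credentials to use on each attempt.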

gabrielSoudry commented 5 days ago

Thanks for your response. While I understand that the creation of the device client is tied to a specific connection string, in the case of an IoT Hub failing over to another region, the connection using the X.509 certificate remains valid. Since it is the same IoT Hub (just a failover instance in a different region), the certificate is still applicable.


gabrielSoudry commented 22 hours ago

@olivakar any news? Can you reopen the issue, please?