Azure / iotedge

The IoT Edge OSS project
MIT License
1.47k stars 462 forks source link

Module created with Python SDK fails to renew SAS Token #6715

Closed lucarv closed 2 years ago

lucarv commented 2 years ago

Expected Behavior

Edge Module written in python should be able to renew its SAS Token. This was previously closed as https://github.com/Azure/azure-iot-sdk-python/issues/993. There is plenty of information there, including logs.

Current Behavior

Module fails to authenticate.

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Create a module in VSC using the Python SDK 2.Deploy, connect 3.Wait a couple of hours
  2. Watch it as it crashes

Context (Environment)

IoT Edge 1.4 running on ubuntu 20.04, but it should be the same whatever the environment (above 1.1)

Output of iotedge check

Everything beautifully green

Click here ``` DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting renew_token in pipeline thread INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenStage: Renewing SAS Token... DEBUG:urllib3.connectionpool:http://localhost:None "POST /modules/sample/genid/638010619771297413/sign?api-version=2022-08-03 HTTP/1.1" 200 57 INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenStage: Starting reauthorization process for new SAS token DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionStateStage(ReauthorizeConnectionOperation): State changes CONNECTED -> REAUTHORIZING. Sending op down. DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ReauthorizeConnectionOperation): reauthorizing. Will issue disconnect and then a connect DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(DisconnectOperation): disconnecting INFO:azure.iot.device.common.mqtt_transport:disconnecting MQTT client DEBUG:paho:Sending DISCONNECT INFO:azure.iot.device.common.mqtt_transport:disconnected with result code: 0 DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_mqtt_disconnected in pipeline thread DEBUG:azure.iot.device.common.mqtt_transport:_mqtt_client.disconnect returned rc=0 DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token update alarm DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenStage: Scheduling automatic SAS Token renewal at epoch time: 1665475620 INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: _on_mqtt_disconnect called DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:ConnectionStateStage(DisconnectedEvent): Not attempting to reconnect (Reauthorization in progress) DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:PipelineRootStage: DisconnectedEvent received. Calling on_disconnected_handler DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting _on_disconnected in callback thread DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage: Expected disconnect - completing pending disconnect op DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:DisconnectOperation: completing without error DEBUG:azure.iot.device.common.pipeline.pipeline_ops_base:ReauthorizeConnectionOperation: creating worker op of type ConnectOperation DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): connecting DEBUG:azure.iot.device.common.pipeline.pipeline_stages_mqtt:MQTTTransportStage(ConnectOperation): Starting watchdog INFO:azure.iot.device.iothub.abstract_clients:Connection State - Disconnected INFO:azure.iot.device.iothub.abstract_clients:Cleared all pending method requests due to disconnect DEBUG:azure.iot.device.common.mqtt_transport:connecting to mqtt broker INFO:azure.iot.device.common.mqtt_transport:Connect using port 8883 (TCP) DEBUG:azure.iot.device.common.pipeline.pipeline_thread:Starting watchdog_function in pipeline thread INFO:azure.iot.device.common.mqtt_transport:Forcing paho disconnect to prevent it from automatically reconnecting DEBUG:azure.iot.device.common.mqtt_transport:Done forcing paho disconnect INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:transport.connect raised error INFO:azure.iot.device.common.pipeline.pipeline_stages_mqtt:Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 397, in connect host=self._hostname, port=8883, keepalive=self._keep_alive File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect return self.reconnect() File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1073, in reconnect sock.do_handshake() File "/usr/local/lib/python3.7/ssl.py", line 1139, in do_handshake self._sslobj.do_handshake() socket.timeout: _ssl.c:1074: The handshake operation timed out The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 206, in _run_op self.transport.connect(password=password) File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 419, in connect raise exceptions.ConnectionFailedError() from e azure.iot.device.common.transport_exceptions.ConnectionFailedError ```

Device Information

Runtime Versions

Logs

aziot-edged logs ``` ```
edge-agent logs ``` ```
edge-hub logs ``` <7> 2022-10-11 07:08:43.554 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceConnectivityManager] - IotHub call succeeded <7> 2022-10-11 07:08:43.554 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Operation SendEventAsync succeeded for 6f4d3d12106d43b7a6fa33e98c6df045/sample <7> 2022-10-11 07:08:43.554 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Sending message for 6f4d3d12106d43b7a6fa33e98c6df045/sample <6> 2022-10-11 07:09:00.732 +00:00 [INF] [EdgeHub] - "Closing connection for device: 6f4d3d12106d43b7a6fa33e98c6df045/sample, , " <6> 2022-10-11 07:09:00.733 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Disposing MessagingServiceClient for device Id 6f4d3d12106d43b7a6fa33e98c6df045/sample because of exception - <6> 2022-10-11 07:09:00.733 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceProxy] - Setting device proxy inactive for device Id 6f4d3d12106d43b7a6fa33e98c6df045/sample <6> 2022-10-11 07:09:00.733 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Removing device connection for device 6f4d3d12106d43b7a6fa33e98c6df045/sample with removeCloudConnection flag 'True'. <7> 2022-10-11 07:09:00.733 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 => [::ffff:40.113.176.164]:8883] WRITE: DisconnectPacket[Type=DISCONNECT, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]" <7> 2022-10-11 07:09:00.733 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 => [::ffff:40.113.176.164]:8883] FLUSH" <7> 2022-10-11 07:09:00.733 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Received connection status changed callback with connection status Disabled and reason Client_Close for 6f4d3d12106d43b7a6fa33e98c6df045/sample <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 => [::ffff:40.113.176.164]:8883] RECEIVED_COMPLETE" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 => [::ffff:40.113.176.164]:8883] READ" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 => [::ffff:40.113.176.164]:8883] READ" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 :> [::ffff:40.113.176.164]:8883] INACTIVE" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 :> [::ffff:40.113.176.164]:8883] CLOSE" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 :> [::ffff:40.113.176.164]:8883] UNREGISTERED" <7> 2022-10-11 07:09:00.738 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x74841ffb, [::ffff:172.18.0.4]:59684 :> [::ffff:40.113.176.164]:8883] HANDLER_REMOVED" <7> 2022-10-11 07:09:29.438 +00:00 [DBG] [DotNetty.Handlers.Logging.LoggingHandler] - "[id: 0x3d209f7e, [::ffff:172.18.0.4]:43198 => [::ffff:40.113.176.164]:8883] FLUSH" ```

Additional Information

Please provide any additional information that may be helpful in understanding the issue.

lucarv commented 2 years ago

This has been verified as NOT being a python SDK. It fails also for node.js using MQTT libraries. AMQPS theoretically will not have the problem since SAS renewal does not require session tear down.

nlcamp commented 2 years ago

@lucarv - Can you share the source code snippet for how you're connecting your python module to Edge Hub? Presumably you're not using IoTHubModuleClient.create_from_edge_env() because that seems to use AMQP. Are you using IoTHubModuleCLient.create_from_connection_string() or IoTHubDeviceClient.create_from_connection_string()?

nlcamp commented 2 years ago

@lucarv - I synced with @cartertinney and he informed me that create_from_edge_env() actually does use MQTT.

In an attempt to expedite a repro, I've setup my python module to renew the SAS token every 5 minutes using the following line: client = IoTHubModuleClient.create_from_edge_environment(sastoken_ttl=300)

With this setup, I'm not seeing any issues. Edge Hub logs show the Python module reauthenticating every 5 minutes:

$ sudo iotedge logs edgeHub | grep -i reauth
<6> 2022-10-13 18:01:04.990 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Starting timer to authenticate connections with a period of 300 seconds
<7> 2022-10-13 18:01:05.331 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - EdgeHub cloud connection established, refreshing device scope cache.
<6> 2022-10-13 18:01:05.340 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Service identity for temp-device/$edgeHub in device scope was updated.
<6> 2022-10-13 18:06:05.003 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Entering periodic task to reauthenticate connected clients
<7> 2022-10-13 18:06:05.011 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/PythonModule in device scope reauthenticated locally.
<7> 2022-10-13 18:06:05.011 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/PythonModule successfully
<7> 2022-10-13 18:06:05.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/SimulatedTemperatureSensor in device scope reauthenticated locally.
<7> 2022-10-13 18:06:05.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/SimulatedTemperatureSensor successfully
<6> 2022-10-13 18:11:04.994 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Entering periodic task to reauthenticate connected clients
<7> 2022-10-13 18:11:04.994 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/PythonModule in device scope reauthenticated locally.
<7> 2022-10-13 18:11:04.994 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/PythonModule successfully

Similarly, the python SDK shows successful SAS token renewal every 5 minutes:

$ sudo iotedge logs -f PythonModule | grep -i sas
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684246
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684426
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684606
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684786
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete

I'll remove sastoken_ttl=500 and let it use the default to see if it changes anything.

lucarv commented 2 years ago

We are using whatever comes out of VSC scaffolding. We also noticed that you need to have considerable load on edgeHub, if your device is idling about nothing bad happens , the mqqt connect will never time out.

Sent from Outlook for iOShttps://aka.ms/o0ukef


From: Noel Campbell @.> Sent: Thursday, October 13, 2022 8:47:46 PM To: Azure/iotedge @.> Cc: Tayo Carvalhal @.>; Mention @.> Subject: Re: [Azure/iotedge] Module created with Python SDK fails to renew SAS Token (Issue #6715)

@lucarvhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Flucarv&data=05%7C01%7CTayo.Carvalhal%40microsoft.com%7Cffed706a76b84c66d39308daad4b6beb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638012836711176011%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ncRDZMRAN8QNDmISyNWGv%2Fi6WH3c9H2d547EW0YFdko%3D&reserved=0 - I synced with @cartertinneyhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcartertinney&data=05%7C01%7CTayo.Carvalhal%40microsoft.com%7Cffed706a76b84c66d39308daad4b6beb%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638012836711176011%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=uSRm3HjlljO5FxLYisotysOrbKWrn6XTZT2EzT2d3r0%3D&reserved=0 and he informed me that create_from_edge_env() actually does use MQTT.

In an attempt to expedite a repro, I've setup my python module to renew the SAS token every 5 minutes using the following line: client = IoTHubModuleClient.create_from_edge_environment(sastoken_ttl=300)

With this setup, I'm not seeing any issues. Edge Hub logs show the Python module reauthenticating every 5 minutes:

$ sudo iotedge logs edgeHub | grep -i reauth

<6> 2022-10-13 18:01:04.990 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Starting timer to authenticate connections with a period of 300 seconds <7> 2022-10-13 18:01:05.331 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - EdgeHub cloud connection established, refreshing device scope cache. <6> 2022-10-13 18:01:05.340 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Service identity for temp-device/$edgeHub in device scope was updated. <6> 2022-10-13 18:06:05.003 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Entering periodic task to reauthenticate connected clients <7> 2022-10-13 18:06:05.011 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/PythonModule in device scope reauthenticated locally. <7> 2022-10-13 18:06:05.011 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/PythonModule successfully <7> 2022-10-13 18:06:05.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/SimulatedTemperatureSensor in device scope reauthenticated locally. <7> 2022-10-13 18:06:05.012 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/SimulatedTemperatureSensor successfully <6> 2022-10-13 18:11:04.994 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Entering periodic task to reauthenticate connected clients <7> 2022-10-13 18:11:04.994 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeAuthenticator] - Client temp-device/PythonModule in device scope reauthenticated locally. <7> 2022-10-13 18:11:04.994 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionReauthenticator] - Reauthenticated client temp-device/PythonModule successfully Similarly, the python SDK shows successful SAS token renewal every 5 minutes: $ sudo iotedge logs -f PythonModule | grep -i sas DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684246 INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token... DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684426 INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token... DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684606 INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token... DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665684786 INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete I'll remove sastoken_ttl=500 and let it use the default to see if it changes anything. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: ***@***.***>
nlcamp commented 2 years ago

I'll remove sastoken_ttl=500 and let it use the default to see if it changes anything.

I'm still not seeing any errors. As seen below, the Python SDK is successfully renewing the SAS token ever 58 minutes:

DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665734130
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665737610
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665741090
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:Renewing SAS Token...
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Cancelling SAS Token renewal alarm
DEBUG:azure.iot.device.common.pipeline.pipeline_stages_base:Scheduling SAS Token renewal at epoch time: 1665744570
INFO:azure.iot.device.common.pipeline.pipeline_stages_base:SasTokenRenewalStage(ReauthorizeConnectionOperation): reauthorize connection operation is complete

We also noticed that you need to have considerable load on edgeHub, if your device is idling about nothing bad happens , the mqqt connect will never time out.

@lucarv - Ok, I'll run some more tests with EH processing more messages to see if I can reproduce the error.

lucarv commented 2 years ago

@nlcamp this is verified as not being a python SDK issue, as it has been verified also on a node.js module using the mqtt library. We have done so more testing and verified that setting the upstream protocol between edge and iot hub makes the problem disappear

nlcamp commented 2 years ago

@lucarv - I saw in the logs in the SDK ticket you linked that when it fails for you, the SDK throws a ConnectionFailedError. I'm not suggesting that it's a bug in the SDK, but rather that looking at the SDK logs is a good way to see if the failure is occurring for me. Unfortunately, I'm still not seeing any failures in my repro attempts.

Also, in my setup I'm not setting the UPSTREAM_PROTOCOL variable on Edge Hub, but it still runs without the errors you're seeing.

I'll close this GH issue and continue working on a repro against the icm that was created for this issue. We'll update this issue with any findings once a root cause is determined in case other users run into the same problems.

lucarv commented 2 years ago

@nlcamp - You MUST set up upstream to MQTT for it to fail. If you let it be as default (AMQP) you will never have the error because using AMQP you will not tear down the existing connection. It is a bug in the platform and it should be taken care of. Please reopen.