Azure / azure-iot-sdk-java

A Java SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-java/
Other
198 stars 236 forks source link

[Bug Report] DeviceClient.close() halts when attempting to open connection using retry policy #1759

Closed gabrielandrade2 closed 10 months ago

gabrielandrade2 commented 10 months ago

Context

Description of the issue

I was reworking my device's code to remove an older reconnection method that we used in favor of the SDK's internal reconnection mechanism. To test everything out, I set up an environment where I can block the device's connection to the internet in order to test the reconnection mechanism.

I am currently using the default ExponentialBackoffWithJitter retry policy (however, I've already tested some variations of the parameters) and opening my connection using DeviceClient.open(true);.

(As context information, my device was a UI where the user have access to a toggle switch to enable or disable the IoT connection as he pleases)

While executing a test where I start the code already without internet access and attempt to open the connection, I noticed that the code is blocked in the open() call (probably because of the while(true) loop in IotHubTransport class). That is fine as I use a separate thread to open the connection.

However, I realized that if the user toggles the switch to close the connection while the connection hasn't been established, DeviceClient.close() also locks until either the retry policy expires or the OperationTimeout is reached. This appears to happen because open() and close() share the same stateLock in DeviceIO class.

This is a problem for me because, not only when the client attempts to close the connection it will be stuck, but it also delays the shutdown/restart of the IoT service within my device, as it attempts to close immediately all open connections/threads before finalizing itself.

I am not sure if this is an intended behavior on the SDK's part or, in fact, an actual bug. Still, I find weird not having an option to forcibly stop the connection attempt procedure and finalize all operations.

I would like to either have such an option for the SDK side or to know the proper way to handle this situation, if there is one. Currently, I was able to get around this issue by setting client.setOperationTimeout(TimeUnit.SECONDS.toMillis(1)); and then canceling the running task from my scheduler.

Thank you for your support.

Code sample exhibiting the issue

I've written a sample code that demonstrates the situation. I've used an operation timeout of 60 seconds and a delay before attempting to close the connections of 10 seconds, for sake of speed, but those values can be changed, and the behavior should be the same. In fact, I work with larger values in my actual device.

The code should be run without internet access to reproduce the same results.

import com.microsoft.azure.sdk.iot.device.DeviceClient;
import com.microsoft.azure.sdk.iot.device.IotHubClientProtocol;

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Test {

    public static void main(String[] args) throws Exception {
        String connString = "MY_CONNECTION_STRING";
        ScheduledExecutorService executor = new ScheduledThreadPoolExecutor(4);

        DeviceClient client = new DeviceClient(connString, IotHubClientProtocol.MQTT_WS);

        System.out.println("Successfully created an IoT Hub client.");

        client.setOperationTimeout(TimeUnit.SECONDS.toMillis(60));
        executor.execute(() ->
        {
            try
            {
                System.out.println("Attempting to open connection to IoT Hub.");
                client.open(true);
                System.out.println("Opened connection to IoT Hub.");
            }
            catch (Exception ignored)
            {
            }
        });

        Thread.sleep(TimeUnit.SECONDS.toMillis(10));
        System.out.println("Attempting to close connection to IoT Hub.");
        client.close();
        System.out.println("Closed connection to IoT Hub.");

        System.out.println("Finish");
        executor.shutdown();
    }
}

Console log of the issue

In the attached console output, it can be seen that the Device.close() call starts at 01:41:56,231, but effectively executes at 01:42:46,468, exactly 1 minute after the connection attempt and when the operation timeout is reached.

2023-10-28 01:41:46,084 INFO (main) [com.microsoft.azure.sdk.iot.device.transport.ExponentialBackoffWithJitter] - NOTE: A new instance of ExponentialBackoffWithJitter has been created with the following properties. Retry Count: 2147483647, Min Backoff Interval: 100, Max Backoff Interval: 10000, Max Time Between Retries: 100, Fast Retry Enabled: true
2023-10-28 01:41:46,222 DEBUG (main) [com.microsoft.azure.sdk.iot.device.ClientConfiguration] - Device configured to use software based SAS authentication provider
2023-10-28 01:41:46,229 DEBUG (main) [com.microsoft.azure.sdk.iot.device.DeviceClient] - Initialized a DeviceClient instance using SDK version 2.3.1
Successfully created an IoT Hub client.
Attempting to open connection to IoT Hub.
2023-10-28 01:41:46,271 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttIotHubConnection] - Opening MQTT connection...
2023-10-28 01:41:46,331 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt] - Sending MQTT CONNECT packet...
Attempting to close connection to IoT Hub.
2023-10-28 01:41:56,231 INFO (main) [com.microsoft.azure.sdk.iot.device.DeviceClient] - Closing device client...
2023-10-28 01:42:16,416 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt] - Exception encountered while sending MQTT CONNECT packet
org.eclipse.paho.client.mqttv3.MqttException: MqttException
    at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.SocketTimeoutException: Connect timed out
    at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]
    at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
    at java.net.Socket.connect(Socket.java:633) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299) ~[?:?]
    at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule.start(SSLNetworkModule.java:132) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.websocket.WebSocketSecureNetworkModule.start(WebSocketSecureNetworkModule.java:63) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
2023-10-28 01:42:16,427 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.IotHubTransport] - Encountered an exception while opening the client. Checking the configured retry policy to see if another attempt should be made.
com.microsoft.azure.sdk.iot.device.transport.ProtocolException: Unable to establish MQTT connection
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.exceptions.PahoExceptionTranslator.convertToMqttException(PahoExceptionTranslator.java:36) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt.connect(Mqtt.java:132) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttMessaging.start(MqttMessaging.java:62) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttIotHubConnection.open(MqttIotHubConnection.java:286) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.openConnection(IotHubTransport.java:1370) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.open(IotHubTransport.java:487) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.DeviceIO.open(DeviceIO.java:106) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.InternalClient.open(InternalClient.java:163) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.DeviceClient.open(DeviceClient.java:201) ~[classes/:?]
    at samples.com.microsoft.azure.sdk.iot.SendEvent.lambda$main$0(SendEvent.java:92) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.eclipse.paho.client.mqttv3.MqttException: MqttException
    at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
Caused by: java.net.SocketTimeoutException: Connect timed out
    at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]
    at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
    at java.net.Socket.connect(Socket.java:633) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299) ~[?:?]
    at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule.start(SSLNetworkModule.java:132) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.websocket.WebSocketSecureNetworkModule.start(WebSocketSecureNetworkModule.java:63) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
2023-10-28 01:42:16,429 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.RetryDecision] - NOTE: A new instance of RetryDecision has been created with retry enabled, the client will retry after 0 milliseconds
2023-10-28 01:42:16,429 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttIotHubConnection] - Opening MQTT connection...
2023-10-28 01:42:16,430 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt] - Sending MQTT CONNECT packet...

2023-10-28 01:42:46,463 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt] - Exception encountered while sending MQTT CONNECT packet
org.eclipse.paho.client.mqttv3.MqttException: MqttException
    at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.SocketTimeoutException: Connect timed out
    at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]
    at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
    at java.net.Socket.connect(Socket.java:633) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299) ~[?:?]
    at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule.start(SSLNetworkModule.java:132) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.websocket.WebSocketSecureNetworkModule.start(WebSocketSecureNetworkModule.java:63) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
2023-10-28 01:42:46,463 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.IotHubTransport] - Encountered an exception while opening the client. Checking the configured retry policy to see if another attempt should be made.
com.microsoft.azure.sdk.iot.device.transport.ProtocolException: Unable to establish MQTT connection
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.exceptions.PahoExceptionTranslator.convertToMqttException(PahoExceptionTranslator.java:36) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt.connect(Mqtt.java:132) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttMessaging.start(MqttMessaging.java:62) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttIotHubConnection.open(MqttIotHubConnection.java:286) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.openConnection(IotHubTransport.java:1370) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.open(IotHubTransport.java:487) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.DeviceIO.open(DeviceIO.java:106) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.InternalClient.open(InternalClient.java:163) ~[classes/:?]
    at com.microsoft.azure.sdk.iot.device.DeviceClient.open(DeviceClient.java:201) ~[classes/:?]
    at samples.com.microsoft.azure.sdk.iot.SendEvent.lambda$main$0(SendEvent.java:92) ~[classes/:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.eclipse.paho.client.mqttv3.MqttException: MqttException
    at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
Caused by: java.net.SocketTimeoutException: Connect timed out
    at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]
    at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
    at java.net.Socket.connect(Socket.java:633) ~[?:?]
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299) ~[?:?]
    at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule.start(SSLNetworkModule.java:132) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.websocket.WebSocketSecureNetworkModule.start(WebSocketSecureNetworkModule.java:63) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724) ~[org.eclipse.paho.client.mqttv3-1.2.5.jar:?]
    ... 1 more
2023-10-28 01:42:46,464 DEBUG (pool-1-thread-1) [com.microsoft.azure.sdk.iot.device.transport.RetryDecision] - NOTE: A new instance of RetryDecision has been created with retry enabled, the client will retry after 195 milliseconds
2023-10-28 01:42:46,468 INFO (main) [com.microsoft.azure.sdk.iot.device.DeviceClient] - Device client closed successfully
Closed connection to IoT Hub.
Finish

Process finished with exit code 0
timtay-microsoft commented 10 months ago

Good catch. #1760 should fix this for you

gabrielandrade2 commented 10 months ago

@timtay-microsoft , Thank you very much!

timtay-microsoft commented 10 months ago

Version 2.4.0 fixes this issue, so I'll close this thread