Azure / azure-iot-sdk-java

A Java SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-java/
Other
200 stars 237 forks source link

[Bug Report] IotHubConnectionStatusChangeCallback is not called if IotHubTransportConnection close throw an exception #1337

Closed DBY047 closed 3 years ago

DBY047 commented 3 years ago

Azure IoT hub sdk is used to connect large amount of embedded device to the cloud. The environment is not always good and we frequently loose internet connectivity. While our product tries to recover the hub connection, we experience issue closing the connection and handling changes in connectivity with IotHubConnectionStatusChangeCallback.

Description of the issue

Calling closeNow() on DeviceClient sometimes throw an ProtocolException: Unable to disconnect Caused by: Timed out waiting for a response from the server. When that happen, IotHubConnectionStatusChangeCallback is not called. See code below.

The root cause comes from the close method in IotHubTransport. if this.iotHubTransportConnection.close() throwing an exception. All lines after the close are not executed. DeviceIO catch the exception and set is state is to disconnected. Any logic that should have been handled in the callBack to handle the disconnection event is not executed.

The result is deviceClient is disconnected but our application is not notified by the sdk that the connection status changed. I guess this might leak the send end receive thread because notifyAll is also not called.

Of course, we could handled this issue in our code but the way the sdk handled this make IotHubConnectionStatusChangeCallback unreliable.

Code sample exhibiting the issue

//Codes_SRS_IOTHUBTRANSPORT_34_024: [This function shall close the connection.]
if (this.iotHubTransportConnection != null)
{
    this.iotHubTransportConnection.close();
}

//Codes_SRS_IOTHUBTRANSPORT_34_025: [This function shall invoke updateStatus with status DISCONNECTED and the
// supplied reason and cause.]
this.updateStatus(IotHubConnectionStatus.DISCONNECTED, reason, cause);

// Notify send thread to finish up so it doesn't survive this close
synchronized (this.sendThreadLock)
{
    this.sendThreadLock.notifyAll();
}

// Notify receive thread to finish up so it doesn't survive this close
synchronized (this.receiveThreadLock)
{
    this.receiveThreadLock.notifyAll();
}

log.info("Client connection closed successfully");

Console log of the issue


com.microsoft.azure.sdk.iot.device.exceptions.ProtocolException: Unable to disconnect
java.io.IOException: com.microsoft.azure.sdk.iot.device.exceptions.ProtocolException: Unable to disconnect
    at com.microsoft.azure.sdk.iot.device.DeviceIO.close(DeviceIO.java:239)
    at com.microsoft.azure.sdk.iot.device.InternalClient.closeNow(InternalClient.java:165)
    at com.microsoft.azure.sdk.iot.device.DeviceClient.closeNow(DeviceClient.java:466)
    at com.android.configuration.iot.IoTClient.closeNow(IoTClient.java:211)
    at com.android.configuration.iot.IoTClient.access$100(IoTClient.java:75)
    at com.android.configuration.iot.IoTClient$1.run(IoTClient.java:344)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:422)
    at java.util.concurrent.FutureTask.run(FutureTask.java:237)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:152)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:265)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
    at java.lang.Thread.run(Thread.java:841)
Caused by: com.microsoft.azure.sdk.iot.device.exceptions.ProtocolException: Unable to disconnect
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.exceptions.PahoExceptionTranslator.convertToMqttException(PahoExceptionTranslator.java:75)
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt.disconnect(Mqtt.java:166)
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttMessaging.stop(MqttMessaging.java:74)
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.MqttIotHubConnection.close(MqttIotHubConnection.java:264)
    at com.microsoft.azure.sdk.iot.device.transport.IotHubTransport.close(IotHubTransport.java:362)
    at com.microsoft.azure.sdk.iot.device.DeviceIO.close(DeviceIO.java:234)
    ... 12 more
Caused by: Timed out waiting for a response from the server (32000)
    at org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:112)
    at org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:67)
    at com.microsoft.azure.sdk.iot.device.transport.mqtt.Mqtt.disconnect(Mqtt.java:152)
    ... 16 more
timtay-microsoft commented 3 years ago

This is definitely a bug, yes. Internally, we are discussing if the deviceClient.closeNow(); call should ever even throw, though. #1338 is the current fix we have. It will make it so the deviceClient.closeNow(); never throws these exceptions in the first place, but still adds some safeguards to make sure that the connection status change callback fires when you'd expect

timtay-microsoft commented 3 years ago

Device client version 1.31.3 has been released and has the fix for this issue, so I'm closing this issue