eclipse-paho / paho.mqtt.java

Eclipse Paho Java MQTT client library. Paho is an Eclipse IoT project.
https://eclipse.org/paho
Other
2.15k stars 893 forks source link

Disconnecting/closing client not possible after connection failed #686

Open dguggemos opened 5 years ago

dguggemos commented 5 years ago

Hi, we are using the Paho MQTT Java client in the Eclispe Ditto project to allow a user to configure MQTT connections to external brokers. That means we do not know in advance if the connections are valid and the user may need some iterations to get all parameters right. The problem that we now face is that we have serious problems to cleanup the resources used by the Paho Client in case a connection failed. To reproduce the behavior I used the following snippet as "MQTT server":

public static void main(String... args) throws Exception {
    final ServerSocket serverSocket = new ServerSocket(4712);
    while (true) {
        final Socket clientSocket = serverSocket.accept();
        System.out.println("Connected: " + clientSocket.getRemoteSocketAddress());
    }
}

Then I tried to create a connection:

public static void main(String... args) throws Exception {
    MqttClient client = new MqttClient("tcp://localhost:4712", "test");
    client.setTimeToWait(5000);
    try {
        MqttConnectOptions connOpts = new MqttConnectOptions();
        connOpts.setConnectionTimeout(5);
        client.connect(connOpts);
    } catch (MqttException me) {
        me.printStackTrace();
        try {
            client.disconnect();
            // client.disconnectForcibly(5000); (1)
            // client.disconnectForcibly(5000, 5000); (2)
            // client.disconnectForcibly(5000, 5000, false); (3)
            client.close();
            // client.close(true); (4)
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The main thread is now waiting indefinitely for a response(?) in the disconnect :

   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000717c00728> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:502)
        at org.eclipse.paho.client.mqttv3.internal.Token.waitForResponse(Token.java:143)
        - locked <0x0000000717c00728> (a java.lang.Object)
        at org.eclipse.paho.client.mqttv3.internal.Token.waitForCompletion(Token.java:108)
        at org.eclipse.paho.client.mqttv3.MqttToken.waitForCompletion(MqttToken.java:63)
        at org.eclipse.paho.client.mqttv3.MqttClient.disconnect(MqttClient.java:349)
        at test.PahoTestClient.main(PahoTestClient.java:33)

I also tried (1), (2), (3), (4) with the result that the client hangs at a different line, but is also not terminating correctly. Skipping the disconnect and force closing the client does not work either, the client says Connect already in progress (32110) in this case.

Currently I see no way of cleaning up after a failed connection attempt. This leaves us with 5 lingering threads which can pile up quickly. Did I miss something or am I doing something wrong here?

dguggemos commented 5 years ago

see also eclipse/ditto#450

pstanton commented 2 years ago

this is a recurring issue on a small percentage of sites. seems impossible to recover from with the current paho code.

pstanton commented 2 years ago

@dguggemos is there any interest in fixing this? We often find clients stuck in "Connect already in progress" which because we re-use the same MqttAsyncClient object - can never recover.

dguggemos commented 2 years ago

No, we switched to another MQTT client implementation, as the problem described seriously affected the stability of our service.

oreillymj commented 2 years ago

@dguggemos - Care to state which Java client you are now using. This repo seems to be unmaintained for over 1 year. I've looked at moving to the HiveMQ client but I'm not a massive fan of it's API. A Paho ->HiveMQ wrapper would be nice for migration.

dguggemos commented 2 years ago

We switched to HiveMQ client (see eclipse/ditto#450).

I'm not a massive fan of it's API

I know what you mean... Apart from that, I can say that we did not run into big issues with HiveMQ since we switched and it works well for us.

pstanton commented 4 months ago

I've found that adding a call to disconnectForcibly() maybe resolves the issue. I'm now doing it whenever an exception is thrown (including timeout) during a connection attempt.

while (...) {
    try {
        log.info("connect...");
        pahoClient.connect(opts)
            .waitForCompletion(TimeUnit.SECONDS.toMillis(MQTT_CONNECTION_TIMEOUT + 5));

        if (!pahoClient.isConnected())
            throw new MqttException(MqttException.REASON_CODE_CLIENT_NOT_CONNECTED);
    }
    catch (Throwable e) {
        log.warn("mqtt connection failed to broker {} retry {}", pahoClient.getServerURI(), retry, e);

        try {
            pahoClient.disconnectForcibly();
            log.info("forced disconnect");
        }
        catch (MqttException e1) {
            log.warn("forced disconnect failed {}", e1.getMessage());
        }

        // sleep...
    }
}
pstanton commented 3 months ago

^ sorry, this does not help. the issue remains with disconnectForcibly

pstanton commented 3 months ago

I've resolved to throw out the client (and construct a new one) if this exception is observed. I'm not sure if this will cause any memory leaks but it is one way to recover from the issue. I've tested that this works by emulating the error.