eclipse-paho / paho.mqtt.java

Eclipse Paho Java MQTT client library. Paho is an Eclipse IoT project.
https://eclipse.org/paho

MQTT gets stuck when connecting to server #719

Closed muralimanohar0212 closed 4 years ago

muralimanohar0212 commented 5 years ago

Hi, I'm using Eclipse Paho for a fun project at home. I'm using Eclipse Paho 1.2.1 and I am facing an issue when there is a fluctuation in the network.

I am able to connect to the server, but if the network goes off and comes back on about 5-6 times with a gap of 8 seconds, the device no longer connects to the server. There is a log line that says "connecting to server", but the connection never goes through.

It reaches a deadlock in the connecting state. Has an issue been raised regarding this? If not, is there a workaround?
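
For context, a minimal sketch of the kind of client setup being described, assuming the Paho 1.2.x synchronous client with automatic reconnect enabled (the broker URL, client ID, and topic handling are placeholders, not the reporter's actual code):

```java
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallbackExtended;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class ReconnectDemo {
    public static void main(String[] args) throws Exception {
        // Broker URL and client ID are illustrative placeholders.
        MqttClient client = new MqttClient(
                "tcp://broker.example.com:1883", "demo-client", new MemoryPersistence());

        client.setCallback(new MqttCallbackExtended() {
            @Override
            public void connectComplete(boolean reconnect, String serverURI) {
                // Fires on the initial connect and on every automatic reconnect.
                System.out.println("Connected (reconnect=" + reconnect + ") to " + serverURI);
            }
            @Override
            public void connectionLost(Throwable cause) {
                System.out.println("Connection lost: " + cause);
            }
            @Override
            public void messageArrived(String topic, MqttMessage message) {}
            @Override
            public void deliveryComplete(IMqttDeliveryToken token) {}
        });

        MqttConnectOptions options = new MqttConnectOptions();
        options.setAutomaticReconnect(true); // retry with backoff after network drops
        client.connect(options);
    }
}
```

With setAutomaticReconnect(true), the client retries with backoff after each network drop and connectComplete fires on every successful (re)connect; the hang reported here would show up as a reconnect attempt that logs "connecting" but never completes.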

rdasgupt commented 4 years ago

@muralimanohar0212 The fix is not in master yet. Please try with the develop branch and check whether your problem is resolved.

marcostorto commented 4 years ago

We found a nasty side effect of this fix. In our environment we use clean_session=false, and the broker (VerneMQ) can take quite some time to respond to the CONNECT message when there are many (100K to 1M) messages waiting to be delivered. In this case the client never completes the connection, as MqttException.REASON_CODE_CONNECTION_LOST is raised before the broker responds to the CONNECT message. Retrying the connection does not help; we get into a sort of deadlock, as subsequent connection attempts fail in the same way.

I would strongly advise removing this fix until this side effect is resolved; I assume it can happen with other brokers as well.
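
One client-side mitigation that might be worth trying while this is investigated (untested here, and the 120-second timeout is purely an illustrative value) is to raise the CONNECT timeout so a slow CONNACK is not treated as a lost connection:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;

public class SlowConnackWorkaround {
    // Connects with a persistent session and a generous CONNACK timeout;
    // the 120-second value is illustrative, not a recommendation.
    static void connectWithLongTimeout(MqttClient client) throws MqttException {
        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(false);    // keep the persistent session described above
        options.setConnectionTimeout(120); // seconds to wait for CONNACK (Paho default: 30)
        client.connect(options);
    }
}
```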

rdasgupt commented 4 years ago

@marcostorto Thanks for your feedback. I will check the fix again by adding some artificial delay in my test broker. In our test environment, we have tested with almost 1 million clients and we didn't see any problem. I presume you are using the synchronous client?

marcostorto commented 4 years ago

We use the alpakka library, which uses the async client: https://github.com/akka/alpakka/blob/master/mqtt/src/main/scala/akka/stream/alpakka/mqtt/impl/MqttFlowStage.scala#L147
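
For reference, a rough sketch of the async-client connect path that alpakka wraps (the URL and client ID are placeholders; this is not alpakka's actual code):

```java
import org.eclipse.paho.client.mqttv3.IMqttActionListener;
import org.eclipse.paho.client.mqttv3.IMqttToken;
import org.eclipse.paho.client.mqttv3.MqttAsyncClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class AsyncConnectDemo {
    public static void main(String[] args) throws Exception {
        MqttAsyncClient client = new MqttAsyncClient(
                "tcp://broker.example.com:1883", "async-demo", new MemoryPersistence());

        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(false); // persistent session, as in the report above

        client.connect(options, null, new IMqttActionListener() {
            @Override
            public void onSuccess(IMqttToken token) {
                System.out.println("CONNACK received, session present: "
                        + token.getSessionPresent());
            }
            @Override
            public void onFailure(IMqttToken token, Throwable exception) {
                // Per the report above, this fires with REASON_CODE_CONNECTION_LOST
                // before the broker has answered the CONNECT.
                System.out.println("Connect failed: " + exception);
            }
        });
    }
}
```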

We can reproduce it pretty easily with the following sequence:

Removing this commit, the connection completes successfully; with this commit, the connection fails most of the time (although not always).

rdasgupt commented 4 years ago

@marcostorto I tried your use case using VerneMQ and a sample Paho MQTT client. The fix worked fine for me.

VerneMQ has some non-standard features. For example, multiple clients can connect with the same client ID. Are you using the same client ID for the publisher and the subscriber?

marcostorto commented 4 years ago

No, we're not using the same client ID. The issue only appears with many persisted messages in the queue. I'll try to prepare a test case to show the issue.
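
For illustration, a hypothetical shape such a test case might take (client IDs, topic, message count, and broker URL are all made up; this is not the reporter's actual code):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class QueuedMessagesRepro {
    public static void main(String[] args) throws Exception {
        String broker = "tcp://localhost:1883";
        MqttConnectOptions persistent = new MqttConnectOptions();
        persistent.setCleanSession(false);

        // 1. Register a persistent subscription, then go offline.
        MqttClient subscriber = new MqttClient(broker, "repro-sub", new MemoryPersistence());
        subscriber.connect(persistent);
        subscriber.subscribe("repro/topic", 1);
        subscriber.disconnect();

        // 2. Publish a large backlog while the subscriber is offline.
        MqttClient publisher = new MqttClient(broker, "repro-pub", new MemoryPersistence());
        publisher.connect();
        for (int i = 0; i < 100_000; i++) {
            MqttMessage message = new MqttMessage(("msg-" + i).getBytes());
            message.setQos(1);
            publisher.publish("repro/topic", message);
        }
        publisher.disconnect();

        // 3. Reconnect the subscriber. The reported failure mode is a
        //    connection-lost error before CONNACK while the broker works
        //    through the queued messages.
        subscriber.connect(persistent);
        System.out.println("Reconnected: " + subscriber.isConnected());
    }
}
```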

rdasgupt commented 4 years ago

@marcostorto Thanks. It will help in debugging and fixing the problem if you could create a test case. Note that we are planning a release by 10th April, so please try to send the test case ASAP.

rdasgupt commented 4 years ago

@marcostorto I am closing this issue, as we couldn't find any problem with the #719 fix in our scale tests. However, to keep track of your issue, I have opened another issue, #763. Please provide a simple test case, and use the latest develop branch to recreate your problem. Note that there are more fixes related to the client-hang case in the latest develop branch. It will also help if you provide a Paho log from your test environment.
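
For anyone producing the requested log: one way to surface the Paho client's internal trace, assuming it is routed through java.util.logging under the v3 client's package name, is a small setup like this (run before creating the client):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PahoTrace {
    // Raises the java.util.logging level for the Paho v3 client package
    // and attaches a console handler so the trace is actually emitted.
    public static void enable() {
        Logger paho = Logger.getLogger("org.eclipse.paho.client.mqttv3");
        paho.setLevel(Level.FINEST);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        paho.addHandler(handler);
    }
}
```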

ioolkos commented 4 years ago

@marcostorto Let me know when you want to look into this on the VerneMQ side. Most helpful in this case would be a Wireshark capture between the client and the broker (if that's possible). One thing to try server-side: we have seen that increasing the message inflight window in situations like this might help (from the default 20 to something much higher).
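
For reference, the setting being described should correspond to the following key in vernemq.conf (key name per VerneMQ's documentation as I understand it; the value 1000 is only an example):

```
# vernemq.conf: maximum number of QoS 1/2 messages allowed in flight
# per client (VerneMQ default is 20; 1000 is an illustrative value)
max_inflight_messages = 1000
```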

rdasgupt commented 4 years ago

@marcostorto @ioolkos I have opened another issue, #763, to track the problem seen with the alpakka library and the VerneMQ broker. Please start adding your comments, test results, etc. in #763.

@ioolkos I will copy your last comment into #763.