Closed muralimanohar0212 closed 4 years ago
@muralimanohar0212 The fix is not in master yet. Please try with the develop branch and check if your problem is resolved.
We found a nasty side effect of this fix. In our environment we use clean_session=false, and the broker (vernemq) can take quite some time to respond to the connect message when there are many (100K to 1M) messages waiting to be delivered. In this case the client never completes the connection, because MqttException.REASON_CODE_CONNECTION_LOST is raised before the broker responds to the connect message. Retrying the connection does not help: we end up in a sort of deadlock, as subsequent connection attempts fail in the same way.
I would strongly advise removing this fix until this side effect is resolved. I assume it can happen with other brokers as well.
@marcostorto Thanks for your feedback. I will check the fix again by adding some artificial delay in my test broker. In our test environment, we have checked with almost 1 million clients and we didn't see any problem. I presume you are using the synchronous client?
We use the alpakka library, which uses the async client [https://github.com/akka/alpakka/blob/master/mqtt/src/main/scala/akka/stream/alpakka/mqtt/impl/MqttFlowStage.scala#L147]
We can reproduce pretty easily by the following sequence:
With this commit removed, the connection completes successfully; with this commit, the connection fails most of the time (although not always).
@marcostorto I tried your use case using vernemq and the sample Paho MQTT client. The fix worked fine for me.
VerneMQ has some non-standard features. For example, multiple clients can connect with the same client ID. Are you using the same client ID for publisher and subscriber?
No, we're not using the same client ID. The issue only appears with many persisted messages in the queue. I'll try to prepare a test case to show the issue.
@marcostorto Thanks. A test case would help in debugging and fixing the problem. Note that we are planning a release by 10th April, so please try to send the test case ASAP.
@marcostorto I am closing this issue as we couldn't find any issue with the #719 fix in our scale tests. However, to keep track of your issue, I have opened another issue, #763. Please provide a simple test case there, and use the latest develop branch to recreate your problem. Note that there are more fixes related to the client hang case in the latest develop branch. It will also help if you provide the Paho log from your test env.
@marcostorto let me know when you want to look into this on the VerneMQ side. Most helpful in this case is a Wireshark capture between the client and broker (if that's possible). One thing to try server side: we have seen that increasing the message inflight window in situations like this might help (from the default 20 to something much higher).
@marcostorto @ioolkos I have opened another issue, #763, to track the problem seen with the alpakka library and the VerneMQ broker. Please start adding your comments, test results, etc. in #763.
@ioolkos I will copy your last comment to #763.
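For anyone who wants to try the server-side suggestion above, a minimal sketch of what it might look like in vernemq.conf. This assumes the inflight window is the `max_inflight_messages` setting; please check the option name against the docs for your VerneMQ version. The value 200 is arbitrary and only for illustration, not a recommendation.

```
# vernemq.conf (sketch; option name assumed, value is illustrative only)
# The default mentioned above is 20.
max_inflight_messages = 200
```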
Hi, I'm using Eclipse Paho 1.2.1 for a fun project at home, and I am facing an issue when the network fluctuates.
I am able to connect to the server. If the network goes off and comes back on about 5-6 times, with a gap of 8 seconds, the device no longer connects to the server. There is a log that says "connecting to server", but the connection never goes through.
It reaches a deadlock in the connecting state. Has an issue been raised regarding this? If not, is there a workaround?
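On the workaround question: one application-level guard, independent of any fix inside the client library, is to give every connect attempt a hard deadline and retry with a growing delay, so a single hung attempt cannot block the application forever. Below is a rough sketch in plain Java under those assumptions. `ReconnectSketch`, `connectWithBackoff` and `connectAttempt` are hypothetical names for illustration, not part of Paho; in a real application `connectAttempt` would wrap whatever blocking connect call you use. This does not address the underlying hang, it only bounds how long the application waits on it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ReconnectSketch {

    /**
     * Runs connectAttempt up to maxAttempts times. Each attempt is abandoned
     * after attemptTimeoutMs, and the pause between attempts doubles each time.
     * Returns the 1-based number of the attempt that succeeded, or -1 if none did.
     */
    static int connectWithBackoff(Callable<Boolean> connectAttempt,
                                  int maxAttempts,
                                  long attemptTimeoutMs,
                                  long initialDelayMs) throws InterruptedException {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Fresh worker thread per attempt, so a hung attempt cannot block the next one.
            ExecutorService executor = Executors.newSingleThreadExecutor();
            Future<Boolean> result = executor.submit(connectAttempt);
            try {
                if (Boolean.TRUE.equals(result.get(attemptTimeoutMs, TimeUnit.MILLISECONDS))) {
                    return attempt;
                }
            } catch (TimeoutException | ExecutionException e) {
                // The attempt hung past its deadline or threw: count it as failed.
                result.cancel(true);
            } finally {
                // Interrupts the worker if it is still running.
                executor.shutdownNow();
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs);
                delayMs *= 2; // exponential backoff
            }
        }
        return -1;
    }
}
```

A call such as `ReconnectSketch.connectWithBackoff(attempt, 5, 30_000, 1_000)` would try at most five times, allow 30 seconds per attempt, and wait 1, 2, 4 and 8 seconds between attempts. Note that cancelling relies on the wrapped call responding to thread interruption; a call that ignores interrupts would keep its worker thread alive in the background.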