Open tomekjarosik opened 5 years ago
I think I have the same problem. When using version 1.2.1 to connect to a broker with ws:// or wss://, the client always disconnects after 1 second. Version 1.2.0 works fine.
I tried to increase the SO_TIMEOUT to 5 seconds, but the client just disconnects after 5 seconds instead of 1. I think it has something to do with the way WebSocket(Secure)NetworkModule use Input/Output streams
We have the same issue with Kura: we configure WebSocket connection, it connects and than it disconnects. It worked fine in 1.2.0.
Any update on this? Did you find a workaround?
I think PR #655 should fix the problem
@kandrej I think we should clarify with @rehanshaukat the intention of adding the socket timeout. We have successfully used 1.2.0 in production without any of the issues mentioned in the commit log.
There is no socket read timeout set currently and default is to set to zero i.e. wait forever. Adding socket read timeout will help recovering from edge cases where packets are lost on the wire and
MqttInputStream.readMqttWireMessage()
fails to notifyClientState
.
The MQTT PING message is used exactly for this purpose and works as expected after reverting the commit.
In fact,
readMqttWireMessage
does handle the case for exception thrown due to socket read timeout and ignore it.
Do you mean this?
I'm the author of that change which was added as a result of #28.
@cdealti I agree we should clarify. Please consider adding a comment in the code in case the socket timeout is not needed as the history of the file shows it was added and removed at least 2 times.
However, note that in the current state readMqttWireMessage will not catch the socket read timeout because it reads from a PipedInputStream, which has been closed as the result of an exception described by @tomekjarosik
Sorry for replying very late on this issue.
I haven't used 1.2.1 in production yet. Bit of context: we use https://github.com/aws/aws-iot-device-sdk-java to connect to aws IoT services. AWS IoT SDK use old version of paho 1.1.0. Random IoT stopped working issues were noticed when using WebSocket connections. While reading paho codebase, I spotted that socket timeout was set to unlimited which did not look sensible and also the old commit where socket read timeout was removed did not explain why it was removed so I added the socket timeout back and raised a PR. We very recently started using paho 1.2.0 at a small level in production which is giving better results (not 100% confirmed that the issue we were seeing in 1.1.0 is fixed completely).
I think SocketTimeoutException should not make the thread stop. If it is stoping the receiver then #655 should fix this.
@kandrej the socket read timeout was added with e71753673c0f3dadbb298392b338e7452c91a35a to support SSL session resumption #28. This contribution took one year and a half to be merged in the codebase and released with v1.1.1. The contribution was meant for MQTT over TCP or SSL (not Websockets) and came with a test.
The socket timeout was removed with bbd6dc730cc88b09226086998735cfc5867582fc without any comment and, I think, defeating the whole purpose of the the above contribution. This was released with v1.2.0.
@rehanshaukat re-added the one second socket timeout causing this Websockets issue with v1.2.1 (so I think we need a regression test for this).
Finally #655 ignores the timeout exception. However, I think Websockets are now broken again as WebSocketFrame
expects to read the full frame from the socket without any timeout which might not be always the case with large payloads and unreliable links like cellular links. I think we need a test for this
Please fill out the form below before submitting, thank you!
Version 1.2.1 introduced this change: https://github.com/eclipse/paho.mqtt.java/commit/d5379c3477119f9bdd59069271ddd21efb8c9628#diff-4facee5fd67e0fc8ce39f657db10c617 @rehanshaukat
This means WebSocketReceiver.java will get java.net.SocketTimeoutException when trying to read https://github.com/eclipse/paho.mqtt.java/blob/57c2c1e9830d8a8036bbb2513e22738c121a254f/org.eclipse.paho.client.mqttv3/src/main/java/org/eclipse/paho/client/mqttv3/internal/websocket/WebSocketReceiver.java#L103.
The exception will be caught later and the thread will be forced to stop: https://github.com/eclipse/paho.mqtt.java/blob/57c2c1e9830d8a8036bbb2513e22738c121a254f/org.eclipse.paho.client.mqttv3/src/main/java/org/eclipse/paho/client/mqttv3/internal/websocket/WebSocketReceiver.java#L118
Because of that, the connection will often ends early, resulting with repeated connections attempts. And one client can create many more connection attempts than previous 1.2.0 version. The 1.2.1 puts much more pressure on the server-side. This can be reproduced with simple unit-test (pseudo-code):
From the client perspective I'd like to have a way to provide custom SO_TIMEOUT. Another way to fix this could be to make SO_TIMEOUT longer, let's say same as connection timeout by default. Or maybe even socket read timeout exception should be treated differently.
Could you please review the code in the above links and discuss what's the proper solution should be?