Closed ysi-camerona closed 7 months ago
Hi, thanks for the very detailed bug report and steps to reproduce! I wish more of our reports were so descriptive 😆
Does this behaviour cause real issues for you? My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases. Either way, I will look into it, but there are multiple high priority issues currently on my backlog, so I can't promise anything.
I looked at your implementation of GracefulTimeoutClient, but I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.
Hi Philip, thanks for your reply! Here are my responses:
Does this behaviour cause real issues for you?
Not often, but yes. I use Java-WebSocket on devices deployed in the field with cellular connections. The cellular quality can really vary, and I've seen this happen a number of times. When it happens, connectBlocking usually blocks for 10-20 minutes before returning.
My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases
I had looked at this too, but I think using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit)
returns, whereas when the no-arg connectBlocking()
returns false, the connection has been torn down. Does this understanding sound right?
I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.
Good question, I should have mentioned that the fix is unrelated to SSL parameters. I decided to hook my changes in there because I use WSS exclusively. It's a convenient callback that is invoked after the socket is created, but before the first read from the socket's InputStream. Yes, this also affects plain WS connections: I pointed my example project to ws://ws.ifelse.io/
and confirmed that the graceful client stalls out (makes sense since onSetSSLParameters
is not invoked).
using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit) returns, whereas when the no-arg connectBlocking() returns false, the connection has been torn down.
Looks like you're right, it is a flaw in the function design. The private method reset
seems like it would do what you expect, but it is only used in reconnect
currently. If you don't call reconnect, there is no socket cleanup as far as i can tell.
While I can implement the fix similar to your provided demo, I think the correct way would be making connectBlocking with timeout properly clean up the connection threads on failure. IMO it is not good idea to rely on socket timeouts, as they have a hacky implementation in the JDK on some platforms. With connect timeout+cleanup we could be sure there are no corner cases left.
Agree, having connectBlocking(long, TimeUnit)
handle the cleanup on timeout seems like a cleaner option. Would you like me to test that approach and put together a PR?
That would be great!
Great, I'll start on that in the next few days with the goal of having it done on Friday.
@PhilipRoman ping me if I should make a release for this
@PhilipRoman I spiked out a fix this afternoon but have not tested it extensively: https://github.com/ysi-camerona/Java-WebSocket/commit/f7d51a84cfaeb829d0edaa40c0aa0df1ad0bdf67
Can you give it a once-over and let me know what you think? I want to make sure we're happy with this approach before testing it thoroughly. The commit message has some additional context. Thanks!
I think it is good. I recommend adding null check (and also a comment) before socket.close, as while calling reconnect() on a failed-to-connect socket is questionable, I would still expect it to work.
Thanks, will do! I'll make those changes, add a unit test, and then aim to have a PR ready in the next day or so.
PR submitted: https://github.com/TooTallNate/Java-WebSocket/pull/1399
Describe the bug WebSocketClient takes in a connectTimeout parameter. This parameter is used for the initial socket connection, but not when setting up the websocket connection. Poor network connectivity may mean we can set up the socket connection but not the websocket, leading to scenarios where connectBlocking blocks indefinitely.
To Reproduce I ran this test on a Debian 10 system, using iptables to simulate the poor connectivity condition.
Example application to reproduce the issue https://github.com/ysi-camerona/java-websocket-bug-feb-9-2024
Expected behavior I would have expected that if
connectBlocking()
is called, and connectTimeout has elapsed beforeonOpen(ServerHandshake)
is called, connectBlocking would return false.Debug log Logs from the sample application:
Nothing else had been logged by 2024-02-09 17:10:00.
Environment:
Additional context GracefulTimeoutClient shows a potential fix for this: set the socket SO timeout to the user-defined value in
onSetSSLParameters
, then set it back to zero after the websocket has connected.