TooTallNate / Java-WebSocket

A barebones WebSocket client and server implementation written in 100% Java.
http://tootallnate.github.io/Java-WebSocket
MIT License
10.35k stars 2.56k forks source link

connectTimeout not fully respected #1397

Closed ysi-camerona closed 3 months ago

ysi-camerona commented 4 months ago

Describe the bug WebSocketClient takes in a connectTimeout parameter. This parameter is used for the initial socket connection, but not when setting up the websocket connection. Poor network connectivity may mean we can set up the socket connection but not the websocket, leading to scenarios where connectBlocking blocks indefinitely.

To Reproduce I ran this test on a Debian 10 system, using iptables to simulate the poor connectivity condition.

  1. Set up iptables firewall rules to begin dropping TCP traffic after six packets for a connection have been transferred:
    sudo iptables -F INPUT
    # Allow direct SSH access
    sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
    # Allow pings
    sudo iptables -A INPUT -p icmp -j ACCEPT
    # Allow all DNS lookups
    sudo iptables -A INPUT -p udp --source-port 53 -j ACCEPT
    # For all other connections, only allow the first six packets (both directions considered) to pass
    sudo iptables -A INPUT -m connbytes --connbytes-mode packets --connbytes-dir both ! --connbytes 6 -j ACCEPT
    # Drop all other traffic
    sudo iptables -A INPUT --in-interface eth0 -j DROP
  2. Compile and run this sample: https://github.com/ysi-camerona/java-websocket-bug-feb-9-2024
  3. Observe that the graceful client (client with a fix to respect connectTimeout for the websocket) fails after five seconds, whereas the default client blocks indefinitely

Example application to reproduce the issue https://github.com/ysi-camerona/java-websocket-bug-feb-9-2024

Expected behavior I would have expected that if connectBlocking() is called, and connectTimeout has elapsed before onOpen(ServerHandshake) is called, connectBlocking would return false.

Debug log Logs from the sample application:

INFO  | 2024-02-09 17:05:41 | main | WebsocketConnectionTest | 21 | Graceful client - connecting with 5000 ms timeout
INFO  | 2024-02-09 17:05:42 | WebSocketConnectReadThread-9 | GracefulTimeoutClient | 53 | Set socket timeout to 5000 ms
INFO  | 2024-02-09 17:05:47 | WebSocketConnectReadThread-9 | GracefulTimeoutClient | 41 | onError called
INFO  | 2024-02-09 17:05:47 | WebSocketWriteThread-10 | GracefulTimeoutClient | 36 | onClose called
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 23 | Graceful client - finished connecting, result: false
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 25 | Graceful client - closed connection
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 30 | Default client - connecting with 5000 ms timeout

Nothing else had been logged by 2024-02-09 17:10:00.

Environment:

Additional context GracefulTimeoutClient shows a potential fix for this: set the socket SO timeout to the user-defined value in onSetSSLParameters, then set it back to zero after the websocket has connected.

PhilipRoman commented 4 months ago

Hi, thanks for the very detailed bug report and steps to reproduce! I wish more of our reports were so descriptive 😆

Does this behaviour cause real issues for you? My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases. Either way, I will look into it, but there are multiple high priority issues currently on my backlog, so I can't promise anything.

I looked at your implementation of GracefulTimeoutClient, but I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.

ysi-camerona commented 4 months ago

Hi Philip, thanks for your reply! Here are my responses:

Does this behaviour cause real issues for you?

Not often, but yes. I use Java-WebSocket on devices deployed in the field with cellular connections. The cellular quality can really vary, and I've seen this happen a number of times. When it happens, connectBlocking usually blocks for 10-20 minutes before returning.

My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases

I had looked at this too, but I think using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit) returns, whereas when the no-arg connectBlocking() returns false, the connection has been torn down. Does this understanding sound right?

I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.

Good question, I should have mentioned that the fix is unrelated to SSL parameters. I decided to hook my changes in there because I use WSS exclusively. It's a convenient callback that is invoked after the socket is created, but before the first read from the socket's InputStream. Yes, this also affects plain WS connections: I pointed my example project to ws://ws.ifelse.io/ and confirmed that the graceful client stalls out (makes sense since onSetSSLParameters is not invoked).

PhilipRoman commented 4 months ago

using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit) returns, whereas when the no-arg connectBlocking() returns false, the connection has been torn down.

Looks like you're right, it is a flaw in the function design. The private method reset seems like it would do what you expect, but it is only used in reconnect currently. If you don't call reconnect, there is no socket cleanup as far as i can tell.

While I can implement the fix similar to your provided demo, I think the correct way would be making connectBlocking with timeout properly clean up the connection threads on failure. IMO it is not good idea to rely on socket timeouts, as they have a hacky implementation in the JDK on some platforms. With connect timeout+cleanup we could be sure there are no corner cases left.

ysi-camerona commented 4 months ago

Agree, having connectBlocking(long, TimeUnit) handle the cleanup on timeout seems like a cleaner option. Would you like me to test that approach and put together a PR?

PhilipRoman commented 4 months ago

That would be great!

ysi-camerona commented 4 months ago

Great, I'll start on that in the next few days with the goal of having it done on Friday.

marci4 commented 4 months ago

@PhilipRoman ping me if I should make a release for this

ysi-camerona commented 4 months ago

@PhilipRoman I spiked out a fix this afternoon but have not tested it extensively: https://github.com/ysi-camerona/Java-WebSocket/commit/f7d51a84cfaeb829d0edaa40c0aa0df1ad0bdf67

Can you give it a once-over and let me know what you think? I want to make sure we're happy with this approach before testing it thoroughly. The commit message has some additional context. Thanks!

PhilipRoman commented 4 months ago

I think it is good. I recommend adding null check (and also a comment) before socket.close, as while calling reconnect() on a failed-to-connect socket is questionable, I would still expect it to work.

ysi-camerona commented 3 months ago

Thanks, will do! I'll make those changes, add a unit test, and then aim to have a PR ready in the next day or so.

ysi-camerona commented 3 months ago

PR submitted: https://github.com/TooTallNate/Java-WebSocket/pull/1399