TooTallNate / Java-WebSocket

A barebones WebSocket client and server implementation written in 100% Java.
http://tootallnate.github.io/Java-WebSocket
MIT License
10.53k stars 2.58k forks source link

Race Conditions when closing server, port is still bound #879

Open Jonahss opened 5 years ago

Jonahss commented 5 years ago

Describe the bug

Hello! I'm not a Java expert, but was happy to find this easy websocket implementation. I've run into a race condition with closing a server, and immediately opening a new server on the same port. This came up as part of my unit tests. Each test passes in isolation, but when they run quickly one after another, well, they started to throw java.net.BindException: Address already in use exceptions.

I've built a really great repro case for you! I was really embarrassed to add a Thread.sleep() call after my server.close() call, so I created a repro.

The following code will run just fine. This is the naiive case of starting a server, closing it, and starting another server. The issue comes up in the particular way that my client is spamming the server to create new connections.

import org.java_websocket.WebSocket;
import org.java_websocket.handshake.ClientHandshake;
import org.java_websocket.server.WebSocketServer;
import org.junit.Test;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URISyntaxException;
import java.nio.ByteBuffer;

public class WebsocketServerTest {

    @Test
    public void QuickStopTest() throws IOException, InterruptedException, URISyntaxException {
        int port = 8765;

        class SimpleServer extends WebSocketServer {

            public SimpleServer(InetSocketAddress address) {
                super(address);
            }

            @Override
            public void onOpen(WebSocket conn, ClientHandshake handshake) {
                conn.send("Welcome to the server!"); //This method sends a message to the new client
                broadcast( "new connection: " + handshake.getResourceDescriptor() ); //This method sends a message to all clients connected
                System.out.println("new connection to " + conn.getRemoteSocketAddress());
            }

            @Override
            public void onClose(WebSocket conn, int code, String reason, boolean remote) {
                System.out.println("closed " + conn.getRemoteSocketAddress() + " with exit code " + code + " additional info: " + reason);
            }

            @Override
            public void onMessage(WebSocket conn, String message) {
                System.out.println("received message from " + conn.getRemoteSocketAddress() + ": " + message);
            }

            @Override
            public void onMessage( WebSocket conn, ByteBuffer message ) {
                System.out.println("received ByteBuffer from "  + conn.getRemoteSocketAddress());
            }

            @Override
            public void onError(WebSocket conn, Exception ex) {
                System.err.println("an error occured on connection " + conn.getRemoteSocketAddress()  + ":" + ex);
            }

            @Override
            public void onStart() {
                System.out.println("server started successfully");
            }
        }
        SimpleServer serverA = new SimpleServer(new InetSocketAddress("localhost", port));
        SimpleServer serverB = new SimpleServer(new InetSocketAddress("localhost", port));

        serverA.start();

        Thread.sleep(3000);

        serverA.stop();
        serverB.start();
    }
}

So that code passes.

BUT. When I have a server spamming connections, the code above will sometimes pass, sometimes fail with my "Address already in use error" and sometimes fail with a ClosedSelectorException.

Here's the code which spams the connection. I did my best to help here, but I had to pull the code out of a Python script which was running, I couldn't repro with a java client.

(use python3 and install the websockets module)

import asyncio
import websockets

# requirements.txt:
# websockets==6.0.0

async def websocket_loop():
    """
    Processes messages from self.queue until mitmproxy shuts us down.
    """
    while True:
        try:
            async with websockets.connect('ws://localhost:8765', max_size = None) as websocket:
                while True:
                    # Make sure connection is still live.
                    await websocket.ping()

        except websockets.exceptions.ConnectionClosed:
            # disconnected from server
            pass
        except BrokenPipeError:
            # Connect failed
            pass
        except IOError:
            # disconnected from server mis-transfer
            pass
        except:
            print("[mitmproxy-node plugin] Unexpected error:", sys.exc_info())
            traceback.print_exc(file=sys.stdout)

worker_event_loop = asyncio.new_event_loop()
worker_event_loop.run_until_complete(websocket_loop())

Expected behavior Well, my unit tests should be able to start servers on the same port as a previously stopped server without adding a 500ms sleep in between.

Debug log The the debug log (set the log level to TRACE) to help explain your problem. ^I could do this if you walk me through how.

Environment(please complete the following information):

PhilipRoman commented 5 years ago

Hello! This could be caused by TIME_WAIT state after closing the server. Have you tried using WebSocketServer.setReuseAddr(true)?

Jonahss commented 5 years ago

Thanks. I gave it a try. Same result.

NoahAndrews commented 5 years ago

I saw a maybe-related issue here: https://github.com/TooTallNate/Java-WebSocket/issues/877#issuecomment-481889358

NoahAndrews commented 5 years ago

Maybe this test is related as well: https://github.com/TooTallNate/Java-WebSocket/issues/877#issuecomment-481890925

marci4 commented 5 years ago

Maybe this test is related as well: #877 (comment)

Not really related, the way I did the test back then sucked and broke when I updated to use SLF4J.

marci4 commented 5 years ago

Was not able to reproduce with the websocket client...

Jonahss commented 5 years ago

so the code in the issue didn't reproduce for you? darn.

marci4 commented 5 years ago

@Jonahss I was simple not able to reproduce it with the java websocket client when I tried to reproduce it quickly. No need to close it yet ;)

Jonahss commented 5 years ago

Oh okay. it's not a terrible issue for me. Most people must not be encountering it. I was able to solve for my use case by adding a short wait. Although inelegant, it doesn't really harm the code, as it's a small project with little use so far.

chenshun00 commented 5 years ago

port is still bound, i have to wait for dozens of seconds. :)

marci4 commented 5 years ago

I tried again with a new pc and differend jre version (12) and still was not able to reproduce it. I however found a ConcurrentModificationException which I fixed with #902.

Best regards, Marcel

mneundorfer commented 5 years ago

Maybe my observation can help on this: I encounter the same problem as @Jonahss (at least I think so :-) ): If I shut down my application while there are still open connections to the server, the issue occurs. So

  1. Start app
  2. Connect client(s)
  3. Stop app
  4. Restart app --> java.net.BindException: Address already in use. Only works after waiting dozens of seconds of sometimes even minutes.

However,

  1. Start app
  2. Connect client(s)
  3. Disconnect client(s)
  4. Stop app
  5. Restart app --> works just fine

I am using the native JavaScript WebSocket client

marci4 commented 5 years ago

Would WebSocketServer.stop(timeout) be a better option for you ? What interval are we talking between stop app and restart? (Do you have used WebSocketServer.setReuseAddr(true) before starting the server?)

Do be honest, only a good project where I can reproduce it can help...

maiph commented 5 years ago

I wrote this test but didn't get it to fail.

public class Issue879Test {

    private static final int NUM_SERVERS_CREATED = 2;
    private static final int NUM_CLIENTS = 10;
    private static final AtomicInteger SERVERS_STARTED_COUNT = new AtomicInteger(0);

    @Test()
    public void testIssue() throws Exception {
        int port = SocketUtil.getAvailablePort();

        for (int i = 0; i < NUM_SERVERS_CREATED; i++) {
            WebSocketServer server = new Issue879Server(new InetSocketAddress(port));
            server.start();

            for (int j = 0; j < NUM_CLIENTS; j++) {
                WebSocketClient client = new Issue879Client(new URI("ws://localhost:" + port));
                client.connectBlocking();
                assert client.isOpen();
            }

            server.stop();
        }

        assert SERVERS_STARTED_COUNT.get() == NUM_SERVERS_CREATED;
    }

    private static class Issue879Server extends WebSocketServer {
        Issue879Server(InetSocketAddress address) {
            super(address);
        }

        @Override
        public void onOpen(WebSocket conn, ClientHandshake handshake) {

        }

        @Override
        public void onClose(WebSocket conn, int code, String reason, boolean remote) {

        }

        @Override
        public void onMessage(WebSocket conn, String message) {

        }

        @Override
        public void onError(WebSocket conn, Exception ex) {
            assert false;
        }

        @Override
        public void onStart() {
            SERVERS_STARTED_COUNT.incrementAndGet();
        }
    }

    private static class Issue879Client extends WebSocketClient {

        Issue879Client(URI serverUri) {
            super(serverUri);
        }

        @Override
        public void onOpen(ServerHandshake handshakedata) {

        }

        @Override
        public void onMessage(String message) {

        }

        @Override
        public void onClose(int code, String reason, boolean remote) {

        }

        @Override
        public void onError(Exception ex) {

        }
    }
}
mneundorfer commented 5 years ago

@marci4 Indeed, I have not. Also, it solves my issue.

To anyone stumbling accross this in the future: Here is the according wiki section which I did not see before... Sorry about that!

pcdv commented 5 years ago

I recently upgraded to 1.4.0 (from 1.3.4) and also witnessed that the websocket port remained in TIME_WAIT after a close(), preventing to reopen the port immediately.

I added a call to setReuseAddr(true) and that fixed my issue.

NB: the problem was occurring on Linux but not on Windows

jimirocks commented 4 years ago

Seems like I have the same issue, reproducible using the client and server from the same (this) library. See this test: https://github.com/Smarteon/loxone-java/blob/master/src/test/groovy/cz/smarteon/loxone/LoxoneWebSocketIT.groovy#L114

I tried also adding reuseAddr but no luck. The test behaves kind of different on different linuxes and JVMs - ie. on OpenJDK11 ubuntu 19.10 it fails 90% times.

marci4 commented 4 years ago

@jimirocks so it is os and Java version dependant? Sounds like an issue in the JVM?

jimirocks commented 4 years ago

@marci4 it happens on all linuxes and JVMs I use (raspbian, ubuntu, fedora, jvm8-11), what differs is the rate. So my guess it's not necessarily the JVM issue.

marci4 commented 4 years ago

@jimirocks was never able to reproduce it on windows. Could you invest some more time and try to understand the issue? Thx

jimirocks commented 4 years ago

@marci4 well, my reproducer is 100% successful using JDK11 on ubuntu. netstat doesn't show the port is in use, however until the JVM ends it's not possible to bind it again even from different process. That means the port remains bound while unused until the original JVM finishes. I haven't found such issue in JDK11 neither some similar complains on it. And if this was the JDK11 BUG I would guess somebody found it sooner than here. Debugging the JDK is beyond my knowledge...

victordiaz commented 4 years ago

I updated to 1.5.1 and got the same problem in Android. I went back to 1.3.x where problems disappears for me and don't have a big need to update for my needs.

Thanks for the library and your effort

pawankgupta-se commented 4 years ago

@marci4 I am also encountered this problem and I used setReuseAddr(true) to solve this issue by as it mentioned in this thread. Is this correct solution for the problem ?

dustContributor commented 4 years ago

Just a heads up. I had the same issue across JVM restarts: Create a WebSocketServer, connect from a browser client (localhost, Firefox), close the WebSocketServer and the whole JVM instance, and then starting it again said the address was already bound.

Using WebSocketServer::setReuseAddr(true) seems to fix it. Thanks!

chenshun00 commented 4 years ago

SO_REUSEADDR allows your server to bind to an address which is in a TIME_WAIT state. It does not allow more than one server to bind to the same address.

http://www.unixguide.net/network/socketfaq/4.11.shtml

xstremasoftware commented 3 years ago

I had the same issue with a server running on Android. setReuseAddr(true) solved the problem

rawnsley commented 3 years ago

I can confirm this worked for me on Android. setReuseAddr(true) appears to prevent the socket entering the problematic _TIMEWAIT state if the app process is killed, which happens as a matter of routine in Android.