EsotericSoftware / kryonet

TCP/UDP client/server library for Java, based on Kryo
BSD 3-Clause "New" or "Revised" License
1.82k stars 417 forks source link

Server.close() hangs for several seconds/minutes #142

Open GoJermangoGo opened 6 years ago

GoJermangoGo commented 6 years ago

When calling Server.close(), the thread will hang anywhere from a few seconds to 10+ minutes. On older, slower computers, the hang can last over half an hour. Client.close() has a similar problem but will only hang up to a few seconds. Occasionally, the hang ends with a ClosedSelectorException. This problem was introduced in a7e966136798b8e8cd8898bc461e014ed4029d09, which was a fix to #93. Tested on Linux (Lubuntu) using the ChatServer example.

NathanSweet commented 6 years ago

Can you provide example code that shows this problem?

GoJermangoGo commented 6 years ago

Simply running new ChatServer() (from the examples folder) and closing the window causes a long delay before the JVM finally exits. All 3 linux machines I tested it on had the same problem; not sure about Windows.

quarante-2 commented 6 years ago

I can confirm this problem. It also happens sometimes when trying to connect to the server. I tried to reproduce it. I'm not sure if it works on your machine.

// Client#connect calls Client#close() internally. This leads to the process getting stuck (sometimes).
// See Client#close(), selector.selectNow().

Server server = new Server();
server.start();
server.bind(54555, 54777);

// Required. Works otherwise.
new Thread(new Runnable() {
    public void run() {
        while (true) {
            System.out.println("work");
        }
    }
}).start();

// Wait for the worker to start.
Thread.sleep(1000);

Client client = new Client();
client.start();
// It may be necessary to increase the timeout.
client.connect(5000, "localhost", 54555, 54777);
GoJermangoGo commented 6 years ago

I don't recall ever having an issue with connecting (at least not to the same extreme as Server.close()) but I can see how that could be related to the same problem with the internal call to close() in connect().

payne911 commented 4 years ago

I've occasionally had minor issues while connecting, and definitely have had that issue while disconnecting.

I'm on Windows 10, using libGDX, Java 14 with feature preview, on IntelliJ 2020.1.2.

@NathanSweet any updates regarding the provided piece of code?

payne911 commented 4 years ago

@luisfonsivevo Do you happen to call server.dispose() on your server?

See Issue https://github.com/EsotericSoftware/kryonet/issues/82. When I commented the call to my client.dispose(), my problem disappeared.

With client.dispose()

Hanging thread, with non-responding libGDX app not actually closing after closing the window. However, I do get the following log:

00:10  INFO: [kryonet] Connection 10 disconnected.
Exception in thread "Client" java.nio.channels.ClosedSelectorException
    at java.base/sun.nio.ch.SelectorImpl.ensureOpen(SelectorImpl.java:80)
    at java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:123)
    at java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
    at com.esotericsoftware.kryonet.Client.update(Client.java:244)
    at com.esotericsoftware.kryonet.Client.run(Client.java:356)
    at java.base/java.lang.Thread.run(Thread.java:832)

Without client.dispose()

No Exception logged, but also nothing mentioning that the Client was disconnected.

Server-side

In both cases, while the logs differed on the client-side, the server-side always mentioned how the Client had disconnected.

I'm not sure what to think of this.


EDIT

I tested with a simple Listener and then with a ThreadedListener. I get the same behavior in terms of logging, however, the simple Listener hangs for much less time.

        this.client = new Client();
        Register.registerClasses(client);
        this.addr = isLocalServer
                ? InetAddress.getLocalHost()
                : InetAddress.getByName(REMOTE_SERVER);

//        client.addListener(new Listener.ThreadedListener(onReceiveCallback));
        client.addListener(onReceiveCallback);
        client.start();
        client.connect(TIMEOUT, addr, PORT);
Hotshot5000 commented 3 years ago

It looks like it's related to commit a7e9661 Fixed deadlock in EndPoint#close(). When I rolled that back it got working immediately. No more delays.

doriancransac commented 3 years ago

+1

I've been experiencing this on the client side too (I don't close the server nearly as often nor the same way, I just kill it, so I don't know if the same problem happens server side).

dmccabe commented 3 months ago

I have yet to experience the hang on close issue, but I was seeing exceptions on Client.dispose, so I took a look at why that was happening.

It looks like there's some gaps in the synchronization logic being used. dispose calls selector.close, which will cause any further operations on selector to generate a ClosedSelectorException. If the update thread is still running, it will throw an exception when it tries to use the selector.

Calling stop will shutdown the update thread, which can avoid this problem on dispose. However, it's not immediate, and for good reason - there's no safe way to shutdown a thread immediately in Java. If there is an update in progress when we invoke shutdown, it will throw the exception.

Order of events that would create the exception:

  1. Update iteration starts on update thread
  2. selector.select is invoked with standard 250ms timeout
  3. dispose is called while still blocked on selector.select

To avoid the exception, we need to following order of operations:

  1. Client.stop is invoked
  2. Wait until update thread is done to ensure no concurrency issues
  3. Client.dispose can now be safely invoked

This approach should mostly get rid of the exception:

client.stop();
client.getUpdateThread().join(1000L);
client.dispose();

Some changes would need to be made to the synchronization logic to avoid this issue, but I think this workaround mostly does the job. It may not do anything for the hang on shutdown though, it's not unlikely that's a separate locking issue.

The changes in commit https://github.com/EsotericSoftware/kryonet/commit/a7e966136798b8e8cd8898bc461e014ed4029d09 do look suspect, as it pulls the selector usage out of the critical section, so that could definitely be related to the hang reported.