eclipse-ee4j / tyrus

Tyrus
Other
115 stars 37 forks source link

ClientManager/BaseContainer shutdown condition not thread safe => RejectedExecutionException #640

Open nuessgens opened 6 years ago

nuessgens commented 6 years ago

In rare cases the ClientManager (or rather BaseContainer) shuts down the executor-service although there are active clients (activeClientCounter != 0)

In my test-scenario the application re-uses the ClientManger instance. The test tries to connect to a non existing server (which - of course - fails). As soon as the connect fails a new connect is made by calling ClientManager.asyncConnectToServer() again. There is no managed executor-service so the BaseContainer maintains it's own and tries to stop it if all sessions/clients where disconnected.

In some situations I get an RejectedExecutionException when calling ClientManager.asyncConnectToServer(). I noticed huge differences regarding the frequency (/reproducibility) of this behavior between different machines (e.g. Windows 10 desktop pc Intel Core i7 80 2.9 GHz Oracle JDK 1.8.0_121 and windows 10 notebook Intel Core i7-7700HQ 2.8Ghz Oracle JDK 1.8.0_131. It is easier to reproduce on the notebook.

I added some log-statements (async-logging) to the following classes. BaseContainer

protected void shutdown(ShutDownCondition shutDownCondition) {
        synchronized (EXECUTORS_CLEAN_UP_LOCK) {
            log.debug("Pre shutdown-evaluation ? " + shutDownCondition.evaluate());
            if (shutDownCondition.evaluate()) {
                log.debug("Going to shutdown " + executorService);
                shutdown();
                log.debug("Post shutdown-evaluation ? " + shutDownCondition.evaluate());
            }
        }
}

ClientManager

Future<Session> connectToServer(final Object o, final ClientEndpointConfig configuration, final String url,
                                    boolean synchronous) throws DeploymentException {

        final Map<String, Object> copiedProperties = new HashMap<String, Object>(properties);

        /* Client activity listener must be called before an executor service is obtained from the container, otherwise
         the counter of active client won't be increased and the executor service might be shut down */
        log.debug("onConnectionInitiated...");
        clientActivityListener.onConnectionInitiated();
        log.debug("...onConnectionInitiated");

        final ExecutorService executorService;
        if (synchronous) {
            executorService = new SameThreadExecutorService();
        } else {
            log.debug("Going to acquire exec");
            executorService = getExecutorService();
            log.debug("Got exec " + executorService);
        }

    [...]
}

In the rare cases mentioned above the following output is made:

2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Pre shutdown-evaluation ? true
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - onConnectionInitiated...
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - ...onConnectionInitiated
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - Going to acquire exec
2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Going to shutdown java.util.concurrent.ThreadPoolExecutor@617da596[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
2018-08-17 20:18:21,498 DEBUG [][connect-t2] org.glassfish.tyrus.client.ClientManager - Got exec java.util.concurrent.ThreadPoolExecutor@617da596[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
2018-08-17 20:18:21,498 DEBUG [][tyrus-1-thread-537] org.glassfish.tyrus.core.BaseContainer - Post shutdown-evaluation ? false

Here is the corresponding exception:

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6532da1d rejected from java.util.concurrent.ThreadPoolExecutor@617da596[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
    at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:488)
    at org.glassfish.tyrus.client.ClientManager.asyncConnectToServer(ClientManager.java:417)

IMHO the construct (shutdown evaluation criteria and locking) is not consistent. After T1 evaluated the shutdown-criteria (activeClientCounter.get()==0) another thread T2 could call activeClientCounter.incrementAndGet()/onConnectionInitiated() and get the executor-service by calling getExecutorService(). Because the executorService is not yet null no lock is acquired and the terminating executor-service is returned. After that T1 awakes and shuts down the executor and sets it to null. After that T2 kicks in again and tries to submit the actual connect-task.

nuessgens commented 6 years ago

To simulate the Thread Scheduling madness one can add Thread.sleep() or Thread.yield() between shutting down the executor service and setting it to null: BaseContainer

public void shutdown() {
        if (executorService != null) {
            executorService.shutdown();
            for (int i = 0; i < 100; i++) {
                Thread.yield();
            }
            executorService = null;
        }

        if (scheduledExecutorService != null) {
            scheduledExecutorService.shutdownNow();
            scheduledExecutorService = null;
        }
}

Neither sleep nor yield do release the EXECUTORS_CLEAN_UP_LOCK (acquired in shutdown(ShutDownCondition)) but should simulate that the other thread was granted processing time and the current thread was put on hold.

Here is my test program (.txt -> .java): TyrusRejectBug.txt

nuessgens commented 6 years ago

Furthermore due the RejectedExecutionException activeClientCounter is not decremented / onConnectionTerminated is never called. Therefore activeClientCounter remains > 0 forever and the auto-cleanup mechanism does not work anymore / exec will never be stopped