Closed by codingchili 5 years ago
Fixed on master, waiting for tests to pass.
Fixed. Unfortunately, there is no way in vert.x to gracefully shut down the blocking pool.
If we change to a custom pool, we lose metrics. If we don't, blocking tasks will be interrupted on JVM shutdown, which is dangerous for threads that are writing to disk, etc.
It's actually possible to get the pool, at least for shared workers:
WorkerPool pool = ((WorkerExecutorInternal) vertx.createSharedWorkerExecutor("wowza")).getPool();
Let's see if we can make it work.
For ordered tasks a TaskQueue is used. There is a small window between the current task completing on the executor and the next task being scheduled. If we try to shut down the executor while ordered blocking tasks are still queued, they will fail on submission to the executor. This is at least better than interrupting them, but we can't have errors thrown because the task queue tries to schedule the next task after shutdown of the executor has started. Ideally we would wait for the queue to clear during the shutdown timeout, in case some shutdown activities are using ordered blocking tasks.
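A minimal sketch of that rejection behaviour, using a plain java.util.concurrent executor as a stand-in for the vert.x worker pool (an assumption, the real pool and TaskQueue are not used here; the class name ShutdownRaceSketch is hypothetical). After shutdown() the running task is not interrupted, but a task submitted afterwards is rejected:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the blocking pool; not vert.x code.
final class ShutdownRaceSketch {

    /** Returns {runningTaskCompleted, lateTaskRejected, terminatedInTime}. */
    static boolean[] run() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean completed = new AtomicBoolean(false);

        // The "current" ordered task, still running when shutdown begins.
        executor.execute(() -> {
            started.countDown();
            try {
                release.await();
                completed.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            started.await();

            // Graceful shutdown: running tasks continue, new ones are refused.
            executor.shutdown();

            // The "next" task, as a queue would schedule it after shutdown started.
            boolean rejected = false;
            try {
                executor.execute(() -> { });
            } catch (RejectedExecutionException e) {
                rejected = true;
            }

            release.countDown();
            boolean terminated = executor.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] {completed.get(), rejected, terminated};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

This suggests the ordered queue would need to be drained before shutdown() is called on the executor, rather than relying on the executor to accept late submissions.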
This needs to be tested.
Only log as an error if the shutdown is not graceful. If a graceful shutdown is caused by an ERROR, the service/listener must still shut down gracefully. If the shutdown is not graceful, the original cause must be logged with ERROR.
Waiting for tests.
Tests passed. Waiting to verify stability before releasing.
The current implementation of shutdown notifies all listeners/handlers/services, waits for [shutdown-timeout] seconds and then forcibly shuts down.
We should run all exit handlers, ShutdownListeners and deployables. Undeploy everything to prevent new blocking tasks - and finally wait for any blocking pools/thread pools to shut down.
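The proposed order could be sketched roughly as follows. This is only an illustration under assumptions: listeners and the undeploy step are modelled as plain Runnables, the blocking pool as a java.util.concurrent ExecutorService, and the names GracefulShutdownSketch and shutdown are hypothetical, not part of the project or vert.x.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed shutdown order; not the project's code.
final class GracefulShutdownSketch {

    /**
     * Runs listeners, undeploys, then waits for the pool to drain.
     * Returns true if the pool terminated within the timeout (graceful),
     * false if it had to be forcibly shut down.
     */
    static boolean shutdown(List<Runnable> listeners, Runnable undeployAll,
                            ExecutorService blockingPool, long timeoutMs) {
        // 1. Notify exit handlers / listeners; one failure must not stop the rest.
        for (Runnable listener : listeners) {
            try {
                listener.run();
            } catch (RuntimeException e) {
                // A real implementation would log this; ignored in the sketch.
            }
        }

        // 2. Undeploy everything so no new blocking tasks are submitted.
        undeployAll.run();

        // 3. Stop accepting work and wait only as long as tasks actually need.
        blockingPool.shutdown();
        try {
            if (blockingPool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // 4. Timeout exceeded or interrupted: fall back to a forced shutdown.
        blockingPool.shutdownNow();
        return false;
    }
}
```

Because awaitTermination returns as soon as the pool is idle, the full timeout is only spent when tasks are still running.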
With this we could use less durable storage solutions and rely on backups for crash recovery instead of sacrificing runtime performance. We don't want to optimize for a once per year event.
Finally, server restarts will be faster as we don't need to wait for the full timeout at all times. The default timeout should still exist, but the default value needs to be increased.
Implementation should be in the System context as the shutdown hook is currently only set when using the Launcher. The changes also need to apply to test cases and be testable.