codingchili / chili-core

Reactive framework for creating transport & storage-transparent microservices with Vert.x
https://codingchili.github.io/chili-core/
MIT License

Graceful shutdown of pools #246

Closed codingchili closed 5 years ago

codingchili commented 5 years ago

The current implementation of shutdown notifies all listeners/handlers/services, waits for [shutdown-timeout] seconds and then forcibly shuts down.

We should run all exit handlers, ShutdownListeners and deployables. Undeploy everything to prevent new blocking tasks - and finally wait for any blocking pools/thread pools to shut down.

With this we could use less durable storage solutions and rely on backups for crash recovery instead of sacrificing runtime performance. We don't want to optimize for a once-per-year event.

Finally, server restarts will be faster as we don't need to wait for the full timeout every time. The default timeout should still exist, but its default value needs to be increased.

The implementation should be in the System context, as the shutdown hook is currently only set when using the Launcher. The changes also need to apply to test cases and be testable.
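
The sequence described above could look roughly like the sketch below. It uses a plain `java.util.concurrent.ExecutorService` as a stand-in for the blocking pool and a list of `Runnable`s as a stand-in for exit handlers and ShutdownListeners. The class and method names (`GracefulShutdown`, `shutdown`) are hypothetical and not part of chili-core or Vert.x.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the chili-core implementation.
class GracefulShutdown {

    /**
     * Runs the exit handlers, stops accepting new work and waits for running tasks.
     * Returns true if the pool drained within the timeout, false if it had to be forced.
     */
    static boolean shutdown(List<Runnable> exitHandlers, ExecutorService pool,
                            long timeout, TimeUnit unit) throws InterruptedException {
        // 1. Notify handlers/listeners; one failing handler must not block the rest.
        for (Runnable handler : exitHandlers) {
            try {
                handler.run();
            } catch (RuntimeException e) {
                System.err.println("exit handler failed: " + e);
            }
        }
        // 2. Stand-in for undeploying: reject new tasks, let queued tasks finish.
        pool.shutdown();
        // 3. Returns as soon as the pool is idle, so the full timeout is only
        //    spent when tasks are actually still running.
        if (pool.awaitTermination(timeout, unit)) {
            return true;
        }
        // 4. Timeout exceeded: fall back to interrupting the remaining tasks.
        pool.shutdownNow();
        return false;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> sleep(100)); // simulated blocking task, e.g. a disk write

        List<Runnable> handlers = new ArrayList<>();
        handlers.add(() -> System.out.println("listener notified"));

        boolean graceful = shutdown(handlers, pool, 5, TimeUnit.SECONDS);
        System.out.println("graceful=" + graceful);
    }
}
```

Because `awaitTermination` returns as soon as the pool is idle, a larger default timeout does not slow down restarts in the normal case.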

codingchili commented 5 years ago

Fixed on master, waiting for tests to pass.

codingchili commented 5 years ago

Fixed, unfortunately there is no way in vert.x to gracefully shut down the blocking pool.

If changed to use a custom pool, we would lose metrics. If not changed, blocking tasks will be interrupted on JVM shutdown, which is dangerous for threads that are writing to disk etc.

codingchili commented 5 years ago

It's actually possible to get the pool, at least for shared workers:

WorkerPool pool = ((WorkerExecutorInternal) vertx.createSharedWorkerExecutor("wowza")).getPool();

Let's see if we can make it work.

codingchili commented 5 years ago

Ordered tasks go through a TaskQueue, and there is a small window between the current task completing on the executor and the next task being scheduled. If we try to shut down the executor while ordered blocking tasks are still queued, they will fail on submission to the executor. This is at least better than interrupting them, but we can't have errors thrown because the task queue tries to schedule the next task after shutdown of the executor has started. Ideally we would wait for the queue to clear during the shutdown timeout, in case some shutdown activities use ordered blocking tasks.

This needs to be tested.
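
A minimal sketch of the "wait for the queue to clear" idea, using only the JDK. `OrderedQueue` here is a simplified, hypothetical stand-in that runs one task at a time and schedules the next one when the current task completes. It is not Vert.x's TaskQueue, and `drainThenShutdown` is an assumed helper name.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; OrderedQueue is a simplified stand-in, not Vert.x's TaskQueue.
class OrderedQueueDrain {

    /** Runs tasks in submission order, at most one on the executor at a time. */
    static class OrderedQueue {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor executor;
        private boolean scheduled = false;

        OrderedQueue(Executor executor) {
            this.executor = executor;
        }

        synchronized void execute(Runnable task) {
            tasks.add(task);
            scheduleNext();
        }

        private synchronized void scheduleNext() {
            if (scheduled) {
                return; // a task is in flight; it schedules the next one on completion
            }
            Runnable next = tasks.poll();
            if (next == null) {
                notifyAll(); // queue is empty and idle: wake up awaitEmpty
                return;
            }
            scheduled = true;
            // Throws RejectedExecutionException if the executor is already shut down.
            executor.execute(() -> {
                try {
                    next.run();
                } finally {
                    completed();
                }
            });
        }

        private synchronized void completed() {
            scheduled = false;
            scheduleNext();
        }

        /** Waits until no task is queued or running; false if the timeout expired. */
        synchronized boolean awaitEmpty(long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (scheduled || !tasks.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        }
    }

    /** Drains the ordered queue first, so no task is scheduled after shutdown starts. */
    static boolean drainThenShutdown(OrderedQueue queue, ExecutorService pool,
                                     long timeoutMillis) throws InterruptedException {
        boolean drained = queue.awaitEmpty(timeoutMillis);
        pool.shutdown();
        return drained && pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        OrderedQueue queue = new OrderedQueue(pool);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 3; i++) {
            queue.execute(done::incrementAndGet);
        }
        boolean graceful = drainThenShutdown(queue, pool, 5000);
        System.out.println("graceful=" + graceful + ", completed=" + done.get());
    }
}
```

Calling `pool.shutdown()` before `awaitEmpty` would reproduce the problem described above: the next queued task would be rejected when the in-flight task tries to schedule it.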

codingchili commented 5 years ago

Only log as an error if the shutdown is not graceful. If a graceful shutdown is caused by an ERROR, the service/listener must still shut down gracefully. The original cause, if the shutdown is not graceful, must be logged with ERROR.

codingchili commented 5 years ago

Waiting for tests.

codingchili commented 5 years ago

Tests passed. Waiting to verify stability before releasing.