ebean-orm / ebean

Ebean ORM
https://ebean.io
Apache License 2.0
1.46k stars 260 forks source link

No delay on SIGTERM before shutting down future count ThreadPoolExecutor #3420

Closed AntoineDuComptoirDesPharmacies closed 2 months ago

AntoineDuComptoirDesPharmacies commented 2 months ago

To be honest, I do not know if this issue should be created here or on the following project : https://github.com/playframework/play-ebean

Context

We are running Play Framework App with Ebean embedded on AWS Fargate using a mix of SPOT provider and standard provider to reach HA. This morning, as often, we had a preemption of our SPOT instance, so receiving a SIGTERM on our app. This is a normal behavior as this node will then relaunch using standard provider.

Expected behavior

When application receive SIGTERM, ThreadPoolExecutor used by Ebean to get futureCount should wait that all current HTTP request have been processed before shutting down.

Actual behavior

When application receive SIGTERM, ThreadPoolExecutor used by Ebean to get futureCount shutdown immediately, so that if a request is being processed in the same lapse of time, it will be rejected by ThreadPoolExecutor.

Steps to reproduce

  1. Run Play App with following config, allowing Play to wait (maximum) 30 seconds and answer current requests before shutting down.
    
    server {
    pekko {
      terminationTimeout = 30 seconds
    }
    }
    }

pekko.coordinated-shutdown.phases.service-unbind.timeout = 35

2. Send at near exactly the same time : 
   - A GET request that will use Ebean method `getFutureCount` after few milliseconds
   - A SIGTERM to the app
3. When SIGTERM will be processed, Play framework will continue processing the request until finished before shutting down. 
When the controller associated to the GET request route will call `futureCount` there is high probability that ThreadPoolExecutor is already shutdown and submit of the task will be rejected

```console
2024-06-13T07:44:28.623415591Z   lcdp-monolith-service 1104     ... 38 common frames omitted
--
2024-06-13T07:44:28.623351536Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.ConstraintLogic.lambda$restrict$5(ConstraintLogic.java:141)
2024-06-13T07:44:28.623276135Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.ConstraintLogic.pass(ConstraintLogic.java:475)
2024-06-13T07:44:28.623234579Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.AbstractDeadboltAction.authorizeAndExecute(AbstractDeadboltAction.java:280)
2024-06-13T07:44:28.623153969Z   lcdp-monolith-service 1104     at   actions.SentryOn5XXAction.call(SentryOn5XXAction.java:36)
2024-06-13T07:44:28.623093501Z   lcdp-monolith-service 1104     at   play.http.DefaultActionCreator$1.call(DefaultActionCreator.java:31)
2024-06-13T07:44:28.623015851Z   lcdp-monolith-service 1104     at   play.core.j.JavaAction$$anon$1.call(JavaAction.scala:128)
2024-06-13T07:44:28.622955794Z   lcdp-monolith-service 1104     at   play.api.mvc.BodyParser$.parseBody(Action.scala:239)
2024-06-13T07:44:28.622914533Z   lcdp-monolith-service 1104     at scala.Option.getOrElse(Option.scala:201)
2024-06-13T07:44:28.622842916Z   lcdp-monolith-service 1104     at   play.api.mvc.BodyParser$.$anonfun$parseBody$4(Action.scala:241)
2024-06-13T07:44:28.622750239Z   lcdp-monolith-service 1104     at   play.core.j.JavaAction$$anon$1.$anonfun$call$1(JavaAction.scala:127)
2024-06-13T07:44:28.622666514Z   lcdp-monolith-service 1104     at   play.core.routing.HandlerInvokerFactory$JavaActionInvokerFactory$$anon$3$$anon$4$$anon$5.invocation(HandlerInvoker.scala:125)
2024-06-13T07:44:28.622624534Z   lcdp-monolith-service 1104     at   play.core.routing.HandlerInvokerFactory$$anon$8.resultCall(HandlerInvoker.scala:159)
2024-06-13T07:44:28.622541829Z   lcdp-monolith-service 1104     at   play.core.routing.HandlerInvokerFactory$$anon$8.resultCall(HandlerInvoker.scala:160)
2024-06-13T07:44:28.622461148Z   lcdp-monolith-service 1104     at   notification.Routes$$anonfun$routes$1.$anonfun$applyOrElse$24(Routes.scala:422)
2024-06-13T07:44:28.622379626Z   lcdp-monolith-service 1104     at   provide.notification.controllers.ManageNotificationApiController.updateNotification(ManageNotificationApiController.java:108)
2024-06-13T07:44:28.622334429Z   lcdp-monolith-service 1104     at   provide.notification.controllers.ManageNotificationApiControllerImp.updateNotification(ManageNotificationApiControllerImp.java:51)
2024-06-13T07:44:28.622259609Z   lcdp-monolith-service 1104     at   service.NotificationService.updateNotification(NotificationService.java:120)
2024-06-13T07:44:28.62217705Z   lcdp-monolith-service 1104  at   service.NotificationService.getNotification(NotificationService.java:70)
2024-06-13T07:44:28.622073216Z   lcdp-monolith-service 1104     at   domain.notificationSendings.NotificationSendingDatabaseBook.getNotificationSendingsFromDatabase(NotificationSendingDatabaseBook.java:139)
2024-06-13T07:44:28.621977643Z   lcdp-monolith-service 1104     at   fr.lcdp.ebean.utils.searchResult.executors.QueryExecutor.execute(QueryExecutor.java:10)
2024-06-13T07:44:28.621843872Z   lcdp-monolith-service 1104     at   fr.lcdp.ebean.utils.searchResult.executors.BaseQueryExecutor.doExecute(BaseQueryExecutor.java:92)
2024-06-13T07:44:28.621802773Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.query.LimitOffsetPagedList.loadCount(LimitOffsetPagedList.java:40)
2024-06-13T07:44:28.621727319Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.query.LimitOffsetPagedList.getFutureCount(LimitOffsetPagedList.java:48)
2024-06-13T07:44:28.621645093Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.core.DefaultServer.findFutureCount(DefaultServer.java:1199)
2024-06-13T07:44:28.621573761Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.executor.DefaultBackgroundExecutor.execute(DefaultBackgroundExecutor.java:104)
2024-06-13T07:44:28.621532833Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.executor.DefaultBackgroundExecutor.submit(DefaultBackgroundExecutor.java:99)
2024-06-13T07:44:28.621458364Z   lcdp-monolith-service 1104     at   io.ebeaninternal.server.executor.DaemonExecutorService.submit(DaemonExecutorService.java:35)
2024-06-13T07:44:28.621373597Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
2024-06-13T07:44:28.621297372Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
2024-06-13T07:44:28.621263785Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
2024-06-13T07:44:28.621184074Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown   Source)
2024-06-13T07:44:28.621095654Z   lcdp-monolith-service 1104 Caused by:   java.util.concurrent.RejectedExecutionException: Task   java.util.concurrent.FutureTask@529314e5[Not completed, task =   java.util.concurrent.Executors$RunnableAdapter@68f7cf05[Wrapped task =   io.ebeaninternal.server.executor.DefaultBackgroundExecutor$$Lambda/0x00007f01f501c360@1fa833d1]]   rejected from java.util.concurrent.ThreadPoolExecutor@27353d1c[Shutting down,   pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 118450]
2024-06-13T07:44:28.621010272Z   lcdp-monolith-service 1104     at java.base/java.lang.Thread.run(Unknown Source)
2024-06-13T07:44:28.620967504Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2024-06-13T07:44:28.620900677Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
2024-06-13T07:44:28.620803198Z   lcdp-monolith-service 1104     at   org.apache.pekko.dispatch.TaskInvocation.run(AbstractDispatcher.scala:59)
2024-06-13T07:44:28.620733443Z   lcdp-monolith-service 1104     at   org.apache.pekko.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:110)
2024-06-13T07:44:28.62065749Z   lcdp-monolith-service 1104  at   scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
2024-06-13T07:44:28.620615667Z   lcdp-monolith-service 1104     at   scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
2024-06-13T07:44:28.620535333Z   lcdp-monolith-service 1104     at   org.apache.pekko.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:110)
2024-06-13T07:44:28.620462662Z   lcdp-monolith-service 1104     at   org.apache.pekko.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:73)
2024-06-13T07:44:28.620422057Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$Transformation.run(Promise.scala:470)
2024-06-13T07:44:28.62034425Z   lcdp-monolith-service 1104  at   play.api.mvc.BodyParser$.$anonfun$runParserThenInvokeAction$1(Action.scala:260)
2024-06-13T07:44:28.620265435Z   lcdp-monolith-service 1104     at   play.api.mvc.Action.$anonfun$apply$6(Action.scala:83)
2024-06-13T07:44:28.620195227Z   lcdp-monolith-service 1104     at   play.core.j.JavaAction.apply(JavaAction.scala:185)
2024-06-13T07:44:28.620154717Z   lcdp-monolith-service 1104     at   scala.concurrent.Future$.apply(Future.scala:687)
2024-06-13T07:44:28.620079591Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$DefaultPromise.map(Promise.scala:182)
2024-06-13T07:44:28.620000239Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$DefaultPromise.dispatchOrAddCallbacks(Promise.scala:312)
2024-06-13T07:44:28.619930027Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$DefaultPromise.submitWithValue(Promise.scala:338)
2024-06-13T07:44:28.619846804Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$Transformation.submitWithValue(Promise.scala:429)
2024-06-13T07:44:28.61980294Z   lcdp-monolith-service 1104  at   play.core.j.ClassLoaderExecutionContext.execute(ClassLoaderExecutionContext.scala:59)
2024-06-13T07:44:28.61972737Z   lcdp-monolith-service 1104  at   play.api.libs.streams.Execution$trampoline$.execute(Execution.scala:65)
2024-06-13T07:44:28.619639958Z   lcdp-monolith-service 1104     at   datadog.trace.bootstrap.instrumentation.java.concurrent.RunnableWrapper.run(RunnableWrapper.java:24)
2024-06-13T07:44:28.619566666Z   lcdp-monolith-service 1104     at   play.core.j.ClassLoaderExecutionContext.$anonfun$execute$1(ClassLoaderExecutionContext.scala:64)
2024-06-13T07:44:28.619522766Z   lcdp-monolith-service 1104     at   scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
2024-06-13T07:44:28.619446586Z   lcdp-monolith-service 1104     at   scala.concurrent.Future$.$anonfun$apply$1(Future.scala:687)
2024-06-13T07:44:28.619367239Z   lcdp-monolith-service 1104     at   play.core.j.JavaAction.$anonfun$apply$8(JavaAction.scala:184)
2024-06-13T07:44:28.619280987Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.AbstractDeadboltAction.call(AbstractDeadboltAction.java:121)
2024-06-13T07:44:28.619154779Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.AbstractRestrictiveAction.execute(AbstractRestrictiveAction.java:58)
2024-06-13T07:44:28.619111184Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown Source)
2024-06-13T07:44:28.619035022Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown Source)
2024-06-13T07:44:28.618949111Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown   Source)
2024-06-13T07:44:28.618876854Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.AbstractRestrictiveAction.lambda$execute$2(AbstractRestrictiveAction.java:59)
2024-06-13T07:44:28.618831936Z   lcdp-monolith-service 1104     at java.base/java.util.Optional.orElseGet(Unknown   Source)
2024-06-13T07:44:28.618735621Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.AbstractRestrictiveAction.lambda$null$1(AbstractRestrictiveAction.java:59)
2024-06-13T07:44:28.618649053Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.actions.RestrictAction.applyRestriction(RestrictAction.java:73)
2024-06-13T07:44:28.618574181Z   lcdp-monolith-service 1104     at   be.objectify.deadbolt.java.ConstraintLogic.restrict(ConstraintLogic.java:129)
2024-06-13T07:44:28.618491829Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown Source)
2024-06-13T07:44:28.618394522Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown Source)
2024-06-13T07:44:28.618305128Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown   Source)
2024-06-13T07:44:28.618224563Z   lcdp-monolith-service 1104     at   java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown   Source)
2024-06-13T07:44:28.618125784Z   lcdp-monolith-service 1104 java.util.concurrent.CompletionException:   java.util.concurrent.RejectedExecutionException: Task   java.util.concurrent.FutureTask@529314e5[Not completed, task =   java.util.concurrent.Executors$RunnableAdapter@68f7cf05[Wrapped task =   io.ebeaninternal.server.executor.DefaultBackgroundExecutor$$Lambda/0x00007f01f501c360@1fa833d1]]   rejected from java.util.concurrent.ThreadPoolExecutor@27353d1c[Shutting down,   pool size = 1, active threads = 1, queued tasks = 0, completed tasks =   118450]

Yours faithfully, LCDP

rbygrave commented 2 months ago

So this is all about Graceful shutdown of an application.

In my terms Graceful shutdown needs to run in order of:

What this means wrt Ebean is:

When application receive SIGTERM, ThreadPoolExecutor used by Ebean to get futureCount should wait that all current HTTP request have been processed before shutting down.

Yes. IMO there is more details to it but yes the application must control the order in which resources are shutdown/closed. Good Dependency Injection libraries give you control over shutdown order / prioritise the order of execution of PreDestroy methods.

Are there any followup questions?

Edit: fwiw, in terms of "Quiesce waiting for active requests to complete" different http servers have different costs associated with doing this like maintaining an AtomicLong counter for active requests (so there is some concurrency cost associated with that). Helidon SE 4.x has a zero runtime cost way to do this so which is great imo - I contributed to get that because I really desire fast graceful shutdown, it's a high priority for me. Some other http servers (like Jetty) have extra add-on features that you need to enable to get this. Some http servers tend to just "wait some amount of time" which imo is not great if you want fast graceful shutdown (and I tend to want that for K8s - speed of shutdown translates directly to speed of blue/green deployment etc).

AntoineDuComptoirDesPharmacies commented 2 months ago

Hi @rbygrave,

Oh that is perfect ! I did not know that Ebean already provide this feature.

So we are going to create a custom Play Component that will call deregisterShutdownHook on startup and use ApplicationLifecycle method addStopHook calling ShutdownManager shutdown method.

No additional question on our side. Yours faithfully, LCDP