akka / akka

Build highly concurrent, distributed, and resilient message-driven applications on the JVM
https://akka.io

akka.Dispatcher - Promise already completed #31378

Open rrwright opened 2 years ago

rrwright commented 2 years ago

While stress-testing a multi-JVM Akka (Cluster) application using Akka v2.6.19, we saw the following ERROR-level stack trace logged with no apparent cause and no effect that we noticed:

2022-05-02 10:25:54,831 ERROR [akka.dispatch.Dispatcher] [test-cluster-akka.actor.default-dispatcher-17] akka.dispatch.Dispatcher - Promise already completed.
java.lang.IllegalStateException: Promise already completed.
    at scala.concurrent.Promise.complete(Promise.scala:53)
    at scala.concurrent.Promise.complete$(Promise.scala:52)
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
    at scala.concurrent.Promise.failure(Promise.scala:104)
    at scala.concurrent.Promise.failure$(Promise.scala:104)
    at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:45)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)

The stack trace does not reference any of our application code, so we suspect it is a bug in Akka's dispatcher.

Apologies: this bug report contains no reproduction steps. This is the scenario being tested when the problem occurred:

A 5-JVM Akka cluster was run entirely on one machine (an Apple M1 Mac running Java 18). Every 20-50 seconds, a script killed one of the JVM processes (kill -9) and restarted it. This ran as expected for several thousand cycles before producing the stack trace above. The test continued to kill processes and no other error conditions were observed (though none were being directly tested either).

This was meant to test our application logic that helps cluster members rejoin the Akka cluster. The stack trace appeared one second after a MemberRemoved event was handled by the cluster singleton running on this machine (the immediately preceding log statement), and about 8 seconds before the next event, MemberJoined (the immediately following log statement). I do not know whether this is related to Akka Cluster, but the stack trace seems to point at the Akka Dispatcher.

As a minor suggestion: this looks like the common pattern of calling promise.failure, which throws the exception seen above if the Promise is already completed, instead of promise.tryFailure, which returns false rather than throwing. The underlying cause would then be something in a complex protocol elsewhere in the system that unexpectedly completes the Promise twice.
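The difference between the two methods can be seen with the Scala standard library alone. This minimal sketch (the object and method names are illustrative, not taken from Akka) completes a promise once and then tries to fail it a second time with each method:

```scala
import scala.concurrent.Promise
import scala.util.Try

object PromiseCompletion {
  // A second completion via failure(...) throws IllegalStateException.
  def secondFailure(): Try[Unit] = {
    val p = Promise[Int]()
    p.success(1) // the first completion wins
    Try { p.failure(new RuntimeException("late failure")); () }
  }

  // tryFailure signals the lost race by returning false instead of throwing.
  def secondTryFailure(): Boolean = {
    val p = Promise[Int]()
    p.success(1)
    p.tryFailure(new RuntimeException("late failure"))
  }

  def main(args: Array[String]): Unit = {
    println(secondFailure())    // Failure(java.lang.IllegalStateException: Promise already completed.)
    println(secondTryFailure()) // false
  }
}
```

Switching to tryFailure would silence the error, but it would also hide the double completion, so it is a mitigation rather than a fix for whatever completes the Promise twice.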

patriknw commented 2 years ago

Thanks for reporting. The error is indeed an indication of some bug. The question is which Promise it tries to complete.

It's on a default-dispatcher thread, which indicates that it's not in the very core internals of Akka, since those run on the internal dispatcher.
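This reasoning rests on the thread name in the log line. As a small hypothetical helper (not part of Akka's API), the sketch below extracts the dispatcher id from a thread name, assuming the `<system>-<dispatcher-id>-<n>` naming visible in `test-cluster-akka.actor.default-dispatcher-17`. It only recognises dispatcher ids under `akka.actor.`; custom dispatcher ids are not handled.

```scala
object DispatcherThreads {
  // Assumption: dispatcher threads are named "<system>-<dispatcher-id>-<n>",
  // e.g. "test-cluster-akka.actor.default-dispatcher-17" from the log above.
  // Only ids under "akka.actor." are recognised; custom dispatchers are not.
  private val ThreadName = """(.+?)-(akka\.actor\.[a-z-]+)-(\d+)""".r

  /** Returns the dispatcher id embedded in a thread name, if it matches. */
  def dispatcherId(threadName: String): Option[String] = threadName match {
    case ThreadName(_, id, _) => Some(id)
    case _                    => None
  }

  /** True if the thread appears to belong to the internal dispatcher. */
  def isInternalDispatcher(threadName: String): Boolean =
    dispatcherId(threadName).contains("akka.actor.internal-dispatcher")

  def main(args: Array[String]): Unit = {
    val name = "test-cluster-akka.actor.default-dispatcher-17"
    println(dispatcherId(name))         // Some(akka.actor.default-dispatcher)
    println(isInternalDispatcher(name)) // false
  }
}
```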

johanandren commented 2 years ago

I wonder if this could actually be an issue with the Scala 2.12 Promise implementation. Can you try to see if you can reproduce the issue with Scala 2.13? (In 2.12, promise.failure(t) is called in Promise.transformWith, while the Promise implementation has changed quite a bit in 2.13.)
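To make the suspected mechanism concrete, the sketch below models, in simplified form, the shape implied by the stack trace: a callback that, on a NonFatal exception, calls `p.failure(t)` from a `catch` block. If `p` was already completed before the exception, `failure` throws `IllegalStateException: Promise already completed.`, and that exception escapes the callback to the ExecutionContext's `reportFailure`, which is where a dispatcher would log it. `transformWithSketch`, `RecordingEc`, and `demo` are hypothetical names; this is not the actual 2.12 `transformWith` source and does not claim to reproduce what happened in the cluster.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Try
import scala.util.control.NonFatal

object TransformWithSketch {
  // Synchronous ExecutionContext that records reported failures, standing in
  // for a dispatcher that logs exceptions escaping from callbacks.
  final class RecordingEc extends ExecutionContext {
    val reported: ListBuffer[Throwable] = ListBuffer.empty[Throwable]
    def execute(r: Runnable): Unit =
      try r.run() catch { case NonFatal(t) => reportFailure(t) }
    def reportFailure(t: Throwable): Unit = reported += t
  }

  // Hypothetical combinator: runs `f` when `fut` completes and, if `f` throws,
  // fails `p` from the catch block. With strict = true it uses failure(...),
  // which throws if `p` was already completed; otherwise tryFailure(...).
  def transformWithSketch[T, S](fut: Future[T], strict: Boolean)(f: (Try[T], Promise[S]) => Unit)(
      implicit ec: ExecutionContext): Future[S] = {
    val p = Promise[S]()
    fut.onComplete { v =>
      try f(v, p)
      catch {
        case NonFatal(t) =>
          if (strict) p.failure(t) // may throw "Promise already completed."
          else p.tryFailure(t)     // returns false if p is already completed
      }
    }
    p.future
  }

  // Completes the promise, then throws: the double-completion scenario.
  def demo(strict: Boolean): (Option[Try[Int]], List[Throwable]) = {
    val ec = new RecordingEc
    val out = transformWithSketch[Int, Int](Future.successful(21), strict) { (v, p) =>
      p.success(v.get * 2)
      throw new RuntimeException("error after completion")
    }(ec)
    (out.value, ec.reported.toList)
  }

  def main(args: Array[String]): Unit = {
    println(demo(strict = true))
    println(demo(strict = false))
  }
}
```

With `strict = true`, `demo` returns the successful value together with one recorded `IllegalStateException` carrying the same message as the log line in the report; with `strict = false`, nothing is reported.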