Closed Roiocam closed 4 days ago
From the perspective of stack frame, the problem lies in the handling:
[ERROR] [07/03/2024 09:46:12.113] [SupervisorSpec-pekko.actor.default-dispatcher-5] [pekko://SupervisorSpec/user/$a/$a] changing Recreate into Create after org.apache.pekko.actor.ActorInitializationException: pekko://SupervisorSpec/user/$a/$a: exception during creation, root cause message: [deliberate test failure]
[ERROR] [07/03/2024 09:46:12.114] [SupervisorSpec-pekko.actor.default-dispatcher-6] [pekko://SupervisorSpec/user/$a/$a] assertion failed
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:264)
at org.apache.pekko.actor.dungeon.FaultHandling.faultCreate(FaultHandling.scala:160)
at org.apache.pekko.actor.dungeon.FaultHandling.faultCreate$(FaultHandling.scala:158)
at org.apache.pekko.actor.ActorCell.faultCreate(ActorCell.scala:420)
at org.apache.pekko.actor.dungeon.FaultHandling.faultRecreate(FaultHandling.scala:96)
at org.apache.pekko.actor.dungeon.FaultHandling.faultRecreate$(FaultHandling.scala:92)
at org.apache.pekko.actor.ActorCell.faultRecreate(ActorCell.scala:420)
at org.apache.pekko.actor.ActorCell.invokeAll$1(ActorCell.scala:526)
at org.apache.pekko.actor.ActorCell.systemInvoke(ActorCell.scala:545)
at org.apache.pekko.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:297)
at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:232)
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:245)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
I think we can attach a test case for this, will look into this over the weekend.
@sadekmunawar Hi, my unit test partly refers to your code in #1383 links.
Would you like to contribute this to the Pekko community?
Would you like to contribute this to the Pekko community?
@pjfanning Am I doing this correctly, may I ask?
Would you like to contribute this to the Pekko community?
@pjfanning Am I doing this correctly, may I ask?
The test code was written by https://discuss.lightbend.com/u/nodeninja/summary
@sadekmunawar can you confirm if that is you? We received your iCLA.
The test code was written by https://discuss.lightbend.com/u/nodeninja/summary
@sadekmunawar can you confirm if that is you? We received your iCLA.
@pjfanning yes, that's me. After finding the bug, I wrote that test for a solution I implemented. My solution converts the FatallyFailed
into a case class and attaches an Optional ActorRef to it.
I also wrote a scala version of this test in ActorLifCyleSpec
. Feel free to use either version of the test
"must not break supervisor strategy due to unhandled exception in preStart" in {
val id = newUuid.toString
val gen = new AtomicInteger(0)
val maxRetryNum = 3
val childProps = Props(new LifeCycleTestActor(testActor, id, gen) {
override def preStart(): Unit = {
report("preStart")
throw new Exception("test exception")
}
}).withDeploy(Deploy.local)
val supervisorStrategy: SupervisorStrategy =
OneForOneStrategy(maxNrOfRetries = maxRetryNum, timeout.duration) {
case _: ActorInitializationException => SupervisorStrategy.Restart
case _ => SupervisorStrategy.Escalate
}
val supervisor = system.actorOf(Props(classOf[Supervisor], supervisorStrategy))
Await.result((supervisor ? childProps).mapTo[ActorRef], timeout.duration)
(0 to maxRetryNum).foreach(i => {
expectMsg(("preStart", id, i))
})
expectNoMessage()
system.stop(supervisor)
}
lgtm. Although I have read the relevant code, I did not delve into testing and debugging
I also wrote a scala version of this test in
ActorLifCyleSpec
. Feel free to use either version of the test
Thanks, I have updated this PR and verified it works in locally
@Roiocam should we backport this to this 1.0.x branch?
@Roiocam should we backport this to this 1.0.x branch?
Of course, this is a bug. backport on #1396
It looks like the checks did not run the tests in SupervisorHierarchySpec
.
When I ran the tests locally, two tests in SupervisorHierarchySpec
failed.
handle failure in creation when supervision strategy returns Resume and Restart
survive being stressed
Can someone check if these two unit tests pass?
It looks like the checks did not run the tests in
SupervisorHierarchySpec
.When I ran the tests locally, two tests in
SupervisorHierarchySpec
failed.
handle failure in creation when supervision strategy returns Resume and Restart
survive being stressed
Can someone check if these two unit tests pass?
Those tests are tagged as long-running tests. Maybe such tests are not run by default.
Did those tests fail before this PR was added?
Did those tests fail before this PR was added?
No, they passed in my local workspace before this PR was added. Unless there's something wrong with my local setup, this PR is the likely cause of the failures. I ran into issues with these two unit tests when experimenting with my own fix.
I will investigate it. :)
both survive being stressed
and handle failure in creation when supervision strategy returns Resume and Restart
has passes in my local by using #1399
Related with #1384 and resolves #1384, some of code contributed by @sadekmunawar
if PR can reproducer this issue, the ci checks will be unable to successd: https://github.com/apache/pekko/actions/runs/9770536752/job/26971887726?pr=1385