alex023 opened 6 years ago
Nice catch. A very tricky edge case. Maybe stopping should even have its own recovery handling to prevent this.
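As a rough illustration of what stopping-specific recovery could mean, the sketch below wraps only the Stopping handler in its own recover(), so a panic there is turned into an error the caller can log while the actor keeps stopping. The helper name handleStopping is hypothetical and is not part of the Proto.Actor API.

```go
package main

import "fmt"

// handleStopping runs a user handler for the Stopping lifecycle message.
// Hypothetical helper: a panic inside the handler is recovered locally and
// returned as an error, instead of being escalated to the supervisor.
func handleStopping(handler func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// The named return value lets the deferred func report the panic.
			err = fmt.Errorf("panic while stopping: %v", r)
		}
	}()
	handler()
	return nil
}

func main() {
	err := handleStopping(func() { panic("boom") })
	fmt.Println(err) // panic while stopping: boom
	fmt.Println("actor continues to stop")
}
```

The point of the design is that the decision "keep stopping" is made locally, before any supervisor gets a chance to restart the actor.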
Let's brainstorm around it.
Both *Stopping and *Stopped have this problem. These lifecycle messages are essential to the design of Proto.Actor, so I would try to avoid deleting them. It seems clear that a stopping actor that fails should not be restarted, since that could prevent an entire tree of actors from stopping. I can think of two reasonable approaches:
First, the stopping actor should simply not escalate the failure: EscalateFailure can return early if ctx.state >= stateStopping. The defer call in mailbox.go will still cause an error message to be logged, so the failure is not completely invisible to the user.
Second, the supervisor should not try to resume or restart an actor that has failed while stopping. This approach puts the responsibility for doing the right thing in the SupervisionStrategy, which is where it belongs. To let the SupervisionStrategy make the right decision, EscalateFailure could pass along the child's state in the *Failure message.
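For the second approach, a strategy that sees the child's state might look like the sketch below. The failure type with a ChildState field models the proposed extension of *Failure; the directive and decider types are hypothetical stand-ins rather than the library's own types.

```go
package main

import "fmt"

type childState int

const (
	childAlive childState = iota
	childStopping
	childStopped
)

type directive int

const (
	resumeDirective directive = iota
	restartDirective
	stopDirective
)

func (d directive) String() string {
	return [...]string{"Resume", "Restart", "Stop"}[d]
}

// failure models a *Failure message extended with the child's lifecycle
// state. The ChildState field is the proposal, not existing API.
type failure struct {
	Reason     interface{}
	ChildState childState
}

// decider is the user-supplied policy for ordinary failures.
type decider func(reason interface{}) directive

// stoppingAwareStrategy wraps a decider so that a child that failed while
// stopping (or after stopping) is never resumed or restarted.
func stoppingAwareStrategy(d decider) func(f *failure) directive {
	return func(f *failure) directive {
		if f.ChildState >= childStopping {
			return stopDirective
		}
		return d(f.Reason)
	}
}

func main() {
	alwaysRestart := func(interface{}) directive { return restartDirective }
	handle := stoppingAwareStrategy(alwaysRestart)
	fmt.Println(handle(&failure{Reason: "boom", ChildState: childAlive}))    // Restart
	fmt.Println(handle(&failure{Reason: "boom", ChildState: childStopping})) // Stop
}
```

Wrapping the user's decider keeps the rule inside the strategy, so existing deciders would not need to know about lifecycle states at all.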
There are some major changes in the dev branch for the C# API right now. We have even talked about redesigning how supervision works, but nothing is final there yet.
So if we want to change the Go API, we should try to align the two now.
Description:
If an actor panics while handling the *actor.Stopping message, it is restarted and keeps running instead of stopping. This can lead to data anomalies, memory leaks, or other hidden dangers. Maybe we should keep only *actor.Stopped and remove *actor.Stopping from the application layer.
Test code: