Open james-cobb opened 6 years ago
The default spawner does things in this order:
val mailbox = props.mailboxProducer()
val dispatcher = props.dispatcher
val process = LocalProcess(mailbox)
val self = ProcessRegistry.put(name, process)
val ctx = ActorContext(props.producer!!, self, props.supervisorStrategy, props.receiveMiddleware,
props.senderMiddleware, parent)
mailbox.registerHandlers(ctx, dispatcher)
mailbox.postSystemMessage(Started)
mailbox.start()
return self
So if an actor is spawnNamed - the actor is stopped, then a new actor is quickly started with the same name, the new actor would be in the Process registry and could have a message sent to it, before the handlers are registered, or the Started message is sent?
Would it work to add the process to the Registry last, after the handlers are registered, and the Started message is sent (and start() is called to start the mailbox stats)? Then if a message is sent to old_actor_name after its termination, and during startup of a new actor with the same name, a dead letter event will occur unless the new actor has completed starting properly.
While looking at this - what happens if an exception is thrown during processing of the Started method? That will cause a restart, but could other messages be received also in between the Started, Restarting, Started cycle?
We have seen two causes of this. using spawnNamed as described above, and also actors that throw an exception when processing Started message. This can also cause a rapid succession of actors stopped and respawned with the same name, with the same result.
In the first case, if a message is sent after intentionally stopping a named actor and before restarting it, a Dead Letter Event would be logical.
In the second case, a logical answer would be not to process any other messages between Restarting and Started, which is currently not the case - see issue #31
In the first case, if a message is sent after intentionally stopping a named actor and before restarting it, a Dead Letter Event would be logical.
yes, dead actors should be purging to deadletter their own mailbox on shutdown, then then any send to its PID should also deadletter everything after
In the second case, a logical answer would be not to process any other messages between Restarting and Started, which is currently not the case - see issue #31
I found the issue for this, there was a missing SuspendMailbox for root actors. Exception was thrown and picked up, escalated without suspending, thus letting messages in.
The case where a message might go through before dispatcher is set sounds odd. Because in order for the actor to schedule and start processing, it needs the dispatcher.
So yes, a message can be posted on the mailbox. But that should throw due to null ref if we end up in that edgecase.
Still the wrong behavior though
Still need to investigate the precise circumstances that cause this, but we have an actor that sends a message to a newly instantiated actor using send(pid, msg). Sending throws an exception because the recipient mailbox lateinit dispatcher has not yet been initialized.
For a local actor it's easy to catch the exception and retry the message later. But if a message is sent to a remote actor with an uninitialized dispatcher, the exception occurs at the recipient end within EndpointReader's StreamObserver receive method, which then causes a gRPC UNKNOWN exception on the sending end, which causes the connection to die.
From the DefaultMailbox it looks as though a spawn call can only return a pid once handlers are registered, so it's hard to see how this is possible.
For the moment I've checked initialization here: https://github.com/AsynkronIT/protoactor-kotlin/blob/ca94fb9a2f18e5738c0d01d8f9749caee29293b1/proto-mailbox/src/main/kotlin/actor/proto/mailbox/DefaultMailbox.kt#L81
using
(and actually also at this line, checking twice just to introduce some more logging) https://github.com/AsynkronIT/protoactor-kotlin/blob/ca94fb9a2f18e5738c0d01d8f9749caee29293b1/proto-mailbox/src/main/kotlin/actor/proto/mailbox/DefaultMailbox.kt#L18
And that has at least stopped the exceptions. But there will be a better solution if I can find exactly what case allows a message to be sent to an actor that hasn't yet had handlers registered.