asynkron / protoactor-kotlin

Ultra-fast distributed cross-platform actor framework
http://proto.actor
Apache License 2.0
221 stars 25 forks source link

Actor can receive message before mailbox dispatcher is initialized #29

Open james-cobb opened 6 years ago

james-cobb commented 6 years ago

Still need to investigate the precise circumstances that cause this, but we have an actor that sends a message to a newly instantiated actor using send(pid, msg). Sending throws an exception because the recipient mailbox lateinit dispatcher has not yet been initialized.

For a local actor it's easy to catch the exception and retry the message later. But if a message is sent to a remote actor with an uninitialized dispatcher, the exception occurs at the recipient end within EndpointReader's StreamObserver receive method, which then causes a gRPC UNKNOWN exception on the sending end, which causes the connection to die.

From the DefaultMailbox it looks as though a spawn call can only return a pid once handlers are registered, so it's hard to see how this is possible.

For the moment I've checked initialization here: https://github.com/AsynkronIT/protoactor-kotlin/blob/ca94fb9a2f18e5738c0d01d8f9749caee29293b1/proto-mailbox/src/main/kotlin/actor/proto/mailbox/DefaultMailbox.kt#L81

using

 if (wasIdle && ::dispatcher.isInitialized) {

(and actually also at this line, checking twice just to introduce some more logging) https://github.com/AsynkronIT/protoactor-kotlin/blob/ca94fb9a2f18e5738c0d01d8f9749caee29293b1/proto-mailbox/src/main/kotlin/actor/proto/mailbox/DefaultMailbox.kt#L18

And that has at least stopped the exceptions. But there will be a better solution if I can find exactly what case allows a message to be sent to an actor that hasn't yet had handlers registered.

james-cobb commented 6 years ago

The default spawner does things in this order:

    val mailbox = props.mailboxProducer()
    val dispatcher = props.dispatcher
    val process = LocalProcess(mailbox)
    val self = ProcessRegistry.put(name, process)
    val ctx = ActorContext(props.producer!!, self, props.supervisorStrategy, props.receiveMiddleware, 
    props.senderMiddleware, parent)
    mailbox.registerHandlers(ctx, dispatcher)
    mailbox.postSystemMessage(Started)
    mailbox.start()
    return self

So if an actor is spawnNamed - the actor is stopped, then a new actor is quickly started with the same name, the new actor would be in the Process registry and could have a message sent to it, before the handlers are registered, or the Started message is sent?

Would it work to add the process to the Registry last, after the handlers are registered, and the Started message is sent (and start() is called to start the mailbox stats)? Then if a message is sent to old_actor_name after its termination, and during startup of a new actor with the same name, a dead letter event will occur unless the new actor has completed starting properly.

While looking at this - what happens if an exception is thrown during processing of the Started method? That will cause a restart, but could other messages be received also in between the Started, Restarting, Started cycle?

james-cobb commented 6 years ago

We have seen two causes of this. using spawnNamed as described above, and also actors that throw an exception when processing Started message. This can also cause a rapid succession of actors stopped and respawned with the same name, with the same result.

In the first case, if a message is sent after intentionally stopping a named actor and before restarting it, a Dead Letter Event would be logical.

In the second case, a logical answer would be not to process any other messages between Restarting and Started, which is currently not the case - see issue #31

rogeralsing commented 6 years ago

In the first case, if a message is sent after intentionally stopping a named actor and before restarting it, a Dead Letter Event would be logical.

yes, dead actors should be purging to deadletter their own mailbox on shutdown, then then any send to its PID should also deadletter everything after

rogeralsing commented 6 years ago

In the second case, a logical answer would be not to process any other messages between Restarting and Started, which is currently not the case - see issue #31

I found the issue for this, there was a missing SuspendMailbox for root actors. Exception was thrown and picked up, escalated without suspending, thus letting messages in.

rogeralsing commented 6 years ago

The case where a message might go through before dispatcher is set sounds odd. Because in order for the actor to schedule and start processing, it needs the dispatcher.

So yes, a message can be posted on the mailbox. But that should throw due to null ref if we end up in that edgecase.

Still the wrong behavior though