asynkron / protoactor-go

Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin
http://proto.actor
Apache License 2.0

Terminated messages being processed while actor is restarting #299

Open melaurent opened 5 years ago

melaurent commented 5 years ago

In my application, the parent actor needs to be notified when one of its children fails. Because the parent is not notified when a child is restarted, I use a supervisor strategy that stops failing children, and I let them die while the parent listens for their termination events. The problem occurs when the parent itself fails and restarts: its children are killed, and the parent receives and processes a Terminated event for each child, even though its own state is Restarting. I don't know whether this is intentional, since I understand that Terminated is a kind of system message, but it does seem off.

potterdai commented 5 years ago

@melaurent Hi, really sorry for the late reply; I was very busy with other things. I think it is currently by design that when a parent fails and restarts, all of its children are stopped. It's worth discussing though, as mentioned in the code:

https://github.com/AsynkronIT/protoactor-go/blob/dev/actor/actor_context.go#L566-L568

melaurent commented 5 years ago

No worries! Sorry, my message must not have been clear enough. I am not discussing the design decision of restarting the children of the restarting actor. The problem is that the parent actor receives a *Terminated event for each of its killed children while its state is still corrupted (because the restart is in progress). Here is an example showing the behavior. I hope it is clear enough!

package main

import (
    "fmt"

    "github.com/AsynkronIT/goconsole"
    "github.com/AsynkronIT/protoactor-go/actor"
)

// Bug description:
// When an actor's state is corrupted and it has to be restarted, its children are terminated and
// the restarting actor receives a *Terminated message for each child. However, even though the state
// of the actor is corrupted and the restart has not finished yet, the *Terminated message is processed.

// ChildActor only logs when it starts.
type ChildActor struct{}

func (state *ChildActor) Receive(context actor.Context) {
    switch context.Message().(type) {
    case *actor.Started:
        fmt.Println("Child actor started")
    }
}

type ParentActor struct {
    restarting bool
}

// KillParent makes the parent panic, which triggers a restart.
type KillParent struct{}

func (state *ParentActor) Receive(context actor.Context) {
    switch context.Message().(type) {
    case *actor.Started:
        fmt.Println("Parent actor started")
        fmt.Println("Spawning one child")
        childProps := actor.PropsFromProducer(func() actor.Actor { return &ChildActor{} })
        context.Spawn(childProps)

    case *actor.Stopping:
        fmt.Println("Parent actor stopping")

    case *actor.Stopped:
        fmt.Println("Parent actor stopped")

    case *actor.Restarting:
        fmt.Println("Parent actor restarting..")
        state.restarting = true

    case *actor.Terminated:
        if state.restarting {
            fmt.Println("Received a child terminated event while restarting is not finished")
        }

    case *KillParent:
        panic("Argh..")

    }
}

func main() {
    context := actor.EmptyRootContext
    parentProps := actor.PropsFromProducer(func() actor.Actor { return &ParentActor{false} })
    pid := context.Spawn(parentProps)

    // Kill the parent
    context.Send(pid, &KillParent{})
    console.ReadLine()
}

Output:

Parent actor started
2019/04/03 09:10:18 [MAILBOX] [ACTOR] Recovering actor="nonhost/$1" reason="Argh.." stacktrace="main.(*ParentActor).Receive:54" 
Spawning one child
Child actor started
2019/04/03 09:10:18 [ACTOR] [SUPERVISION] actor="nonhost/$1" directive="RestartDirective" reason="Argh.." 
Parent actor restarting..
Received a child terminated event while restarting is not finished
Parent actor started
Spawning one child
Child actor started
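
For readers hitting the same behavior, one possible workaround is for the parent to drop termination events while a restart is in progress. The sketch below is a library-independent model of that guard, not proto.actor code: the types `started`, `restarting`, `terminated`, and `parent`, and the method `receive`, are hypothetical stand-ins for the lifecycle messages and the actor in the example above. It also assumes the restart flag is cleared once the actor has started again.

```go
package main

import "fmt"

// Hypothetical stand-ins for the lifecycle messages; these are NOT
// the proto.actor types, only a model of them for illustration.
type (
	started    struct{}
	restarting struct{}
	terminated struct{ who string }
)

// parent models an actor that remembers whether a restart is in progress.
type parent struct {
	restarting bool
	handled    []string // children whose termination was acted upon
}

// receive reports whether the message was acted upon.
func (p *parent) receive(msg interface{}) bool {
	switch m := msg.(type) {
	case started:
		// Assumption: a fresh start means the restart has completed.
		p.restarting = false
		return true
	case restarting:
		p.restarting = true
		return true
	case terminated:
		if p.restarting {
			// Guard: state may be corrupted, so ignore the event.
			return false
		}
		p.handled = append(p.handled, m.who)
		return true
	}
	return false
}

func main() {
	p := &parent{}
	msgs := []interface{}{
		started{},
		restarting{},
		terminated{"child-1"}, // arrives mid-restart, dropped
		started{},
		terminated{"child-2"}, // arrives after restart, handled
	}
	for _, msg := range msgs {
		fmt.Printf("%T acted on: %v\n", msg, p.receive(msg))
	}
}
```

The guard only hides the symptom inside the actor; whether the runtime should deliver these events during a restart at all is the question this issue raises.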