[BUG] Actor still processes incoming messages after another one has thrown

sezaru commented 2 years ago

If an actor throws when processing a message, it means that after that point, the actor's state is considered invalid.

Because of that, what normally happens is that the exception will propagate down to a supervisor which will decide how to handle the exception (restart, etc).

Also, that actor should not be able to process new incoming messages until the restart process is done, otherwise you are processing messages with an invalid state.

In my tests, a Theater actor still processes incoming messages after throwing. Here is example code:

import 'package:theater/theater.dart';

// Always tells the supervisor to restart the failed child.
class SupervisorDecider extends Decider {
  @override
  Directive decide(Object object) {
    return Directive.restart;
  }
}

class RootSupervisor extends UntypedActor {
  @override
  Future<void> onStart(UntypedActorContext context) async {
    await context.actorOf('test_actor', TestActor());
  }

  @override
  SupervisorStrategy createSupervisorStrategy() => OneForOneStrategy(
        decider: SupervisorDecider(),
        restartDelay: Duration(milliseconds: 500),
      );
}

class TestActor extends UntypedActor {
  int state = 0;

  @override
  void onStart(UntypedActorContext context) {
    print('starting ${context.path}');

    context.receive<int>((message) {
      print('got int message: ${message}, current_state: ${state}');

      state = message;

      if (message == 2) {
        print('throwing!');
        throw Exception('Got 2');
      }

      return;
    });
  }
}

Future<void> main() async {
  var system = ActorSystem('test_system');

  await system.initialize();

  await system.actorOf('root_supervisor', RootSupervisor());

  final ref = system.getLocalActorRef('../root_supervisor/test_actor');

  ref?.send(1);
  ref?.send(2);
  ref?.send(3);
}

As the code shows, the actor throws when it receives a message with the integer 2.

What I expect from this code is that it will start, print message 1 with state 0, print message 2 with state 1, throw, restart, print message 3 with state 0, like this:

I/flutter (17554): starting test_system/root/user/root_supervisor/test_actor
I/flutter (17554): got int message: 1, current_state: 0
I/flutter (17554): got int message: 2, current_state: 1
I/flutter (17554): throwing!
I/flutter (17554): starting test_system/root/user/root_supervisor/test_actor
I/flutter (17554): got int message: 3, current_state: 0

Instead, what I got was:

I/flutter (17554): starting test_system/root/user/root_supervisor/test_actor
I/flutter (17554): got int message: 1, current_state: 0
I/flutter (17554): got int message: 2, current_state: 1
I/flutter (17554): throwing!
I/flutter (17554): got int message: 3, current_state: 2
I/flutter (17554): starting test_system/root/user/root_supervisor/test_actor

This is wrong, since message 3 was processed with the invalid state 2.

GlebBatykov commented 2 years ago

Hi!

This is precisely because of the asynchronous nature of message processing.

An error occurred in the actor.

It sent an error message to its supervisor asynchronously.

But before the supervisor has processed that error and suspended the child actor (while it decides how to handle the error), the child actor receives another asynchronous message and processes it.

So the actor still manages to process a message after an error has occurred in it.
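
The ordering described here can be reproduced without Theater at all. The following is a minimal sketch in plain Dart, assuming that message handlers run as microtasks in the actor's isolate while the error report reaches the supervisor one event-loop turn later. `simulateRace` and everything inside it are hypothetical names for illustration, not part of the library.

```dart
import 'dart:async';

/// Hypothetical model of the race: handlers are queued as microtasks,
/// while the supervisor's restart arrives one event-loop turn later.
Future<List<String>> simulateRace() async {
  final log = <String>[];
  var state = 0;
  final restarted = Completer<void>();

  void handle(int message) {
    log.add('got $message, state $state');
    state = message;
    if (message == 2) {
      log.add('throwing');
      // Assumption: the error report crosses an isolate boundary, so the
      // supervisor reacts on a later event-loop turn (modeled by Timer.run).
      Timer.run(() {
        log.add('restart');
        state = 0;
        restarted.complete();
      });
    }
  }

  // All three messages are already queued before the error is handled.
  for (final message in [1, 2, 3]) {
    scheduleMicrotask(() => handle(message));
  }

  await restarted.future;
  return log;
}
```

Calling `print(await simulateRace())` from an async `main` shows message 3 handled with state 2 before the restart entry, because Dart drains the microtask queue before running the next timer event.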

I will think about a solution to this problem.

GlebBatykov commented 2 years ago

The problem could also arise for another reason. The actor may already have started several asynchronous tasks at the moment the error occurred. They have already been placed in the event loop of this actor's isolate.

An error occurs. But until the supervisor has processed the error, the tasks in the actor's event loop continue to execute.

sezaru commented 2 years ago

I guess having the option to process the messages sequentially would solve this issue too, right?
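
To make the suggestion concrete, here is a rough sketch of sequential processing in plain Dart. It assumes a mailbox that hands out one message at a time, waits for the handler to finish, and holds the remaining messages until a simulated restart completes. `SequentialMailbox`, its members, and the 10 ms restart delay are hypothetical and are not Theater's API.

```dart
import 'dart:async';
import 'dart:collection';

/// Hypothetical mailbox that processes one message at a time and
/// holds queued messages until a restart has finished.
class SequentialMailbox {
  final Queue<int> _queue = Queue<int>();
  final List<String> log = [];
  int _state = 0;
  bool _draining = false;
  Future<void> _idle = Future<void>.value();

  /// Completes when every message queued so far has been handled.
  Future<void> get idle => _idle;

  void send(int message) {
    _queue.add(message);
    if (!_draining) {
      _draining = true;
      _idle = _drain();
    }
  }

  Future<void> _drain() async {
    while (_queue.isNotEmpty) {
      final message = _queue.removeFirst();
      try {
        await _handle(message); // the next message waits for this one
      } on Exception {
        await _restart(); // nothing else runs until the restart is done
      }
    }
    _draining = false;
  }

  Future<void> _handle(int message) async {
    log.add('got $message, state $_state');
    _state = message;
    if (message == 2) {
      log.add('throwing');
      throw Exception('Got 2');
    }
  }

  Future<void> _restart() async {
    // Stand-in for the supervisor's restart delay.
    await Future<void>.delayed(const Duration(milliseconds: 10));
    _state = 0;
    log.add('restart');
  }
}
```

Sending 1, 2 and 3 and then awaiting `idle` leaves a log in which message 3 is handled only after the restart entry, with the state back at 0. The trade-off is that a slow handler delays every message queued behind it.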

GlebBatykov commented 2 years ago

There is another interesting point, and I am interested in other people's opinions on it (there were almost no interested people who would give me feedback about my library).

At the moment, even a confirmation mailbox only guarantees that the message is delivered to the actor.

That is, you assign a handler for some type of message.

The message arrives at the actor, and from the moment your handler is launched, the mailbox considers the message delivered and sends the next one.

It seems to me that this in some way devalues the work of a reliable mailbox.

After all, as you might have noticed, in the event of an error you risk losing messages. Suppose several messages were being processed at once, an error occurred in one of them, and the actor's supervisor decided to restart it. In this case, the other messages whose handlers have already started but have not had time to finish will be lost. They are no longer in the mailbox, and since the actor's isolate is restarted, their handlers will not run to completion.

I've been thinking about it a lot. I was considering options for returning messages to the mailbox, in this case.

But as a result, I decided to add the ability to process messages sequentially.

It just seems to me that, given the features of Dart, I would not want to always process them sequentially and lose the ability to process them asynchronously.

But there are a lot of questions about asynchronous processing.

GlebBatykov commented 2 years ago

Yes, I think so too

sezaru commented 2 years ago

> there were almost no interested people who would give me feedback about my library

My guess as to why this is the case is simply because most programmers don't even know that the actor model exists or what it solves. You can see that in other languages too when people are discussing how to solve concurrency problems when Erlang already solved that almost 30 years ago with the actor model hahaha.

I will give you my feedback from the viewpoint of an Elixir/Erlang developer, not Akka, but since Akka is based on Erlang's OTP anyway, they should be pretty similar.

For me, the biggest value an Actor model brings to the table is being able to have a sane concurrency and fault-tolerant code.

This means that I can have an actor that holds a state and I can manipulate that state without having to worry about locks, race conditions, etc, since they are not shared with other actors, they are only accessed sequentially and they are fault-tolerant (via supervisors).

So, in my view, an actor that doesn't process messages sequentially loses most of these properties and is not that different than running a bunch of async tasks.

Having that in mind, in Erlang, when you call an actor, you define a timeout if the call expects a reply (see: https://hexdocs.pm/elixir/1.12/GenServer.html#call/3).

This means that the caller blocks until a result is received (in your library, sendAndSubscribe would block, or at least return a promise).

Also, that blocked call can fail if the actor crashes, or if the timeout is reached.

Now, regarding the mailbox: in Erlang's OTP, if the actor crashes, its mailbox is lost, and the callers waiting for a response (via sendAndSubscribe) will receive an error.

It seems like you want to somehow keep the mailbox so no message is lost, but IMO that would add a lot of complexity to your code and will be hard to guarantee anyway.

An easier approach is simply to handle that on the caller's side, so they call the actor, and if the call fails because the actor crashed or timed out, the caller can handle that and retry later when the supervisor restarts the crashed actor with a fresh state.
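
A caller-side retry along these lines could look like the sketch below. It is plain Dart and makes no claim about Theater's API: `callWithRetry` and its parameters are hypothetical, and `call` stands for whatever request to the actor returns a `Future` with the reply. The only library call used is `Future.timeout` from `dart:async`.

```dart
import 'dart:async';

/// Hypothetical helper: runs [call], failing an attempt if it throws or
/// exceeds [timeout], and retries up to [maxAttempts] times in total.
Future<T> callWithRetry<T>(
  Future<T> Function() call, {
  Duration timeout = const Duration(seconds: 5),
  int maxAttempts = 3,
  Duration retryDelay = const Duration(milliseconds: 500),
}) async {
  var attempt = 0;
  while (true) {
    attempt++;
    try {
      // A TimeoutException is thrown if no reply arrives in time.
      return await call().timeout(timeout);
    } on Exception {
      if (attempt >= maxAttempts) rethrow;
      // Give the supervisor time to restart the actor before retrying.
      await Future<void>.delayed(retryDelay);
    }
  }
}
```

With this shape the actor and its mailbox stay simple, and each caller decides how many retries and how much delay are acceptable for its own request.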

Not sure if that answers your questions or if it is good feedback. But feel free to ask for more clarifications if needed.

GlebBatykov commented 2 years ago

I recently studied Elixir/Erlang, looked at working with processes there, remote interaction. And I was impressed by this technology.

GlebBatykov commented 2 years ago

I have thought about writing a console utility that connects remotely to an actor system built with my library, so that you can inspect the state of the actor system and manage its actors.

sezaru commented 2 years ago

That would be amazing. I guess you mean something similar to the observer in Erlang. In case you are not aware, that is exactly what the observer allows you to do: you can see your process tree, kill a process, see a process's information, state, etc.

GlebBatykov commented 2 years ago

I am not familiar with observer, although I heard about it when I tried to study Elixir/Erlang. But yes, I was thinking about creating a utility with similar functionality.

GlebBatykov / theater

[BUG] Actor still processes incoming messages after another one has thrown #5