celery / ceps

Celery Enhancement Proposals
27 stars 15 forks source link

Jumpstarter #31

Open thedrow opened 3 years ago

thedrow commented 3 years ago

We need to write a design CEP for Jumpstarter.

MicahLyle commented 3 years ago

Disclaimer: These are some WIP thoughts as I'm growing in understanding of Jumpstarter, current Celery's internals, and plans for next-gen Celery and how all of those relate together, along with the actor framework. I may edit this as I go and/or add more responses in future comments, maybe answering my own questions or posing more. These are some things I'm trying to wrap my head around with respect to writing the CEP.

An actor repeatedly runs tasks to fulfill their purpose. Using tasks, the user implements the business logic of the Actor.
A task can be synchronous or asynchronous.
If the task is synchronous, the task is run in a thread pool. If it is asynchronous, the task runs using the event loop.

https://github.com/celery/jumpstarter/issues/1

^

Question: What is the relationship between the current idea of "Celery workers", new-style actors, and new-style tasks? Does a worker (an actor maybe?) essentially spawn an actor (part of the 3 actor axioms) and then pass messages to it (another part of the 3 actor axioms).

Question: Does an actor always have to acquire, for example, a connection to the broker? What if, for example, the settings are configured in a way that ignoring the results is True, and there's no need to talk to the broker (or the result store)? Also, with this, say the worker does need to acquire a connection to the broker. Is that something that should be considered a resource, or is that something that could be acquired after the result is retrieved (I.E. in a different substate transition). It seems like running the tasks and getting the result is a different sub-state than notifying result store and/or broker of task completion result, etc. and popping the task(s) from the queue(s), etc. Maybe transitions into and out of this sub-state can facilitate too things like fanout.

Idea: ResultDispatcher being a type of actor that simply dispatches results. The RabbitMQ, Redis, SQLAlchemy, and other result stores could simply inherit from this to acquire their necessary resources, etc.?

Question: So now, retry, with at least the await implementation with at least anyio.sleep, etc., seems like it will execute in the same actor now instead of being retried, for example somewhere else. Is that how current Celery behaves? What if the current actor is crashed? I'm presuming the supervisor will try and restart it, etc. and worst case at least the state persistence will be there so it can be resumed later?

Question: Something I really want to do is relate the three major actor axioms to how these new-style Celery actors, task(s), and worker(s) will work. Here's my current big-picture thought, etc.

Question: How does the next-generation Celery and this actor framework/model tie in with AMPQ and its original inspiration for Celery and how Celery works within the AMPQ framework? I'll need to do some more reading here I think, but I'm trying to understand what's going to be different about next-gen Celery and how that will fit in with this actor model? I'm also thinking about what Hewitt and the idea that the actor model "bypasses" the idea of channels, which come with CSPs and some other ideas. In the actor model, there are only addresses, and messages are sent directly to addresses, and not through some middleware/router. Does that conflict at all with AMPQ, and the idea of a "broker" that is there fanning out messages for example? Or is the broker considered an actor/actors in its own right?

thedrow commented 3 years ago

Question: What is the relationship between the current idea of "Celery workers", new-style actors, and new-style tasks? Does a worker (an actor maybe?) essentially spawn an actor (part of the 3 actor axioms) and then pass messages to it (another part of the 3 actor axioms).

A Celery worker is an Actor System. An actor system consists of a configuration store, a collection of children actors, and a root actor.

See more in everything is an actor.

Question: Does an actor always have to acquire, for example, a connection to the broker? What if, for example, the settings are configured in a way that ignoring the results is True, and there's no need to talk to the broker (or the result store)? Also, with this, say the worker does need to acquire a connection to the broker. Is that something that should be considered a resource, or is that something that could be acquired after the result is retrieved (I.E. in a different substate transition). It seems like running the tasks and getting the result is a different sub-state than notifying result store and/or broker of task completion result, etc. and popping the task(s) from the queue(s), etc. Maybe transitions into and out of this sub-state can facilitate too things like fanout.

Resources are currently bound to the lifecycle of the actor.

An actor does not have to hold a connection to a message broker. In fact, an actor only has to acquire & hold a cancel scope so that we can shut it down.

Idea: ResultDispatcher being a type of actor that simply dispatches results. The RabbitMQ, Redis, SQLAlchemy, and other result stores could simply inherit from this to acquire their necessary resources, etc.?

See Data Sources and Sinks. Essentially each source or sink is an actor.

Question: So now, retry, with at least the await implementation with at least anyio.sleep, etc., seems like it will execute in the same actor now instead of being retried, for example somewhere else. Is that how current Celery behaves? What if the current actor is crashed? I'm presuming the supervisor will try and restart it, etc. and worst case at least the state persistence will be there so it can be resumed later?

I understand your concern but at this moment this is an implementation detail that will be implemented only much further ahead.

Essentially, each task has a state machine. After a retry occurs, you can put the same message in the actor that is currently executing it or in another actor's inbox. As far as Jumpstarter goes, this will be implemented using callbacks which you'll register on the appropriate states.

Question: Something I really want to do is relate the three major actor axioms to how these new-style Celery actors, task(s), and worker(s) will work. Here's my current big-picture thought, etc.

* `celery worker ... (command line args)` will spin up a worker, which, at the top level, is really an actor (inheritng from `Actor`)? It will acquire resources, potentially have dependencies (?), and do all the other necessary spinup/jumpstart type things that need to be done. It then can (part of the axioms) create child actors, namely, `task`s, which it will do whenever it receives messages (from the broker). In this case, the actor axioms are fulfilled/satisfied. When the worker receives messages, it can send any finite number of messages to any of its finite number of task actors (I.E., for example, worker `M` (actor) sends message to task `M.ab35de` (actor) saying that you need to cancel whatever you're doing. That would be how you'd revoke a task.

Exactly.

* When a task finishes (whether success, retry, failure, etc.), does it and can it communicate its result back to the worker actor? Or does it create a child actor to then handle the dispersing and/or propagation of the result in the first place? Or both? This would all fit in the actor axioms still. Any child actor should know the address(es) of which to send messages to its parent(s?).

Good questions. I think we can answer them later when we design Celery.

Question: How does the next-generation Celery and this actor framework/model tie in with AMPQ and its original inspiration for Celery and how Celery works within the AMPQ framework? I'll need to do some more reading here I think, but I'm trying to understand what's going to be different about next-gen Celery and how that will fit in with this actor model? I'm also thinking about what Hewitt and the idea that the actor model "bypasses" the idea of channels, which come with CSPs and some other ideas. In the actor model, there are only addresses, and messages are sent directly to addresses, and not through some middleware/router. Does that conflict at all with AMPQ, and the idea of a "broker" that is there fanning out messages for example? Or is the broker considered an actor/actors in its own right?

Yes, you can treat a message broker as an actor of its own if the queue topology is one where each actor has its own inbox and it is just used to put messages in the actor's inbox and to persist those messages whether temporarily or permanently. Celery is going to have a ConsumerActor which will consume from all queues the user configured to consume from and it will pass those messages forward to other actors. In addition, nothing restricts you from using channels with actors. Hewitt just mentioned they are no longer a requirement.

MicahLyle commented 3 years ago

Actor Systems: This was really helpful and tied together a number of things I wasn't fully grasping. I think one of the bigger blockers so far for me in really feeling like I had a firm mental grasp of this (and how it can work for jumpstarter and next-gen celery) is trying to figure out what an individual actor is. Is it a thread? An object? A process? A task (or task class, task instance, etc.)? Is it a green thread? A spawned task or task group? That article and the linked video from React Finland I think help point out that what the actor is (and how a message is defined, and what messages look like), are very flexible and it's ultimately up to the system designers. It's just key that ultimately, the axioms are satisfied, which they're not super restrictive the more I think about it.

Really then, what Akka

^ From that Actor Systems link you posted:

If one actor depends on another actor for carrying out its duty, it should watch that other actor’s liveness and act upon receiving a termination notice.

So in this regard, for child actors (depending on something else), could communicate "messages" back to their parent if/when certain state transitions happen by hooking into those transitions? Or, how does the idea of a parent "watching" another actor's liveness and acting upon it fit into the idea of the actor system and actors? Because in certain scenarios, wouldn't this require, say, a polling type mechanism (I.E. check if the process hasn't crashed, maybe in the idea of spawning a subprocess with a command type scenario, being theoretical here)? Maybe I'm being overly literal here, but can an actor, in response to a message it receives, do an event loop/polling type thing indefinitely? The actor axioms do say they have to send a finite number of messages, so is this polling type behavior allowed (as an example)? Let me know if I'm not making sense here.

^ Related to the above, kind of thinking out loud here, with trio, the scheduler/kernel (which, if we're trying to pretend this is an actor system, that would be the guardian/root actor) is aware of all the nested cancel scopes, etc. and has the ability to cancel children (which in our actor system, we could call sending a "cancel" message), that's ultimately sending a message (I.E. raising an exception in the generator frame), which seems to satisfy all the axioms of the actor model (with the exception here being a message, albeit a little tricker to grasp but I think that's a message nonetheless). In the same way, raising an exception is the idea of communicating a message back to the parent, and the parent (whether implicitly or not) can choose to re-raise the exception which is sending a message to its parent, etc. (the message in this case being I'm done). So in the case of trio, and exceptions as messages, the idea of raising an exception is a reliable message which we know won't be dropped, etc., so children can effectively communicate back to parents without the parents having to "poll" so to speak.