akkadotnet / akka.net

Akka.Cluster.Sharding: support "push" mode for communicating with `ShardedDaemonProcess`s #7195

Closed: Aaronontheweb closed this issue 5 months ago

Aaronontheweb commented 5 months ago

Is your feature request related to a problem? Please describe.

Related to https://github.com/akkadotnet/Akka.Hosting/issues/453, specifically this line I wrote:

One thing that occurs to me now is that it might be that the ShardedDaemonProcess was always designed for work-pulling only, which is another problem we'd have to deal with in the main Akka.NET repo itself.

When I went looking to fix that issue, lo and behold, I found this original implementation comment from @ismaelhamed:

https://github.com/akkadotnet/akka.net/blob/3f0be58a661150c3d14572cd4615b526ba5e037a/src/contrib/cluster/Akka.Cluster.Sharding/ShardedDaemonProcess.cs#L79-L92

So the ShardedDaemonProcess was originally designed for work-pulling, which makes sense: using it for projections et al.

However, I think without too much headache we can retool the ShardedDaemonProcess to also support work-pushing. In a universe where remote deployment of actors no longer exists (something that has been discussed, from time to time) and you want to have a fixed number of processors distributed evenly throughout the cluster for running things like ETL jobs, ShardedDaemonProcesses are perfect for that.

Describe the solution you'd like

I'd like to change the signature of ShardedDaemonProcess.Init to return an IActorRef that points to a lightweight router sitting on top of the ShardRegion IActorRef that is used internally to power the ShardedDaemonProcess. This router can use any of the existing Akka.NET routing strategies, but by default we'll use RoundRobin and distribute work evenly across all of the sharded daemon process instances.
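To make the proposed default concrete, here is a minimal sketch of the round-robin selection step only, written as plain C# with no Akka.NET dependency. It assumes each daemon instance is addressed by its index rendered as a string entity id (`"0"` through `numberOfInstances - 1`), which is an assumption based on the KeepAlivePinger code linked in this issue. `RoundRobinDaemonRouter` and `Envelope` are hypothetical names used for illustration, with `Envelope` standing in for a sharding envelope; this is not the implementation that shipped.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical stand-in for a sharding envelope: the entity id picks the
// daemon instance, and the message is the payload pushed to it.
public sealed record Envelope(string EntityId, object Message);

// Hypothetical router sketch: picks daemon instances in round-robin order.
public sealed class RoundRobinDaemonRouter
{
    private readonly int _numberOfInstances;
    private long _next = -1; // the first increment yields 0

    public RoundRobinDaemonRouter(int numberOfInstances)
    {
        if (numberOfInstances <= 0)
            throw new ArgumentOutOfRangeException(nameof(numberOfInstances));
        _numberOfInstances = numberOfInstances;
    }

    // Wraps the message in an envelope addressed to the next instance.
    // Interlocked.Increment keeps the counter safe under concurrent callers.
    public Envelope Route(object message)
    {
        long n = Interlocked.Increment(ref _next);
        int index = (int)(n % _numberOfInstances);
        // Assumption: entity ids are the instance indices as strings.
        return new Envelope(index.ToString(CultureInfo.InvariantCulture), message);
    }
}
```

With three instances, successive `Route` calls address entity ids `"0"`, `"1"`, `"2"`, and then wrap back to `"0"`. In the proposal, the ShardRegion's message extractor would then deliver each envelope to the matching daemon instance wherever it currently lives in the cluster.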

Describe alternatives you've considered

Alternatively we could just use a remote Pool router for this work, but that has two big disadvantages:

  1. Requires remote deployment and all of the serialization baggage that comes with it
  2. You can't control the number of instances "globally" for the pool without limiting yourself to a single router - each node that creates the router adds more capacity to the pool.

Additional context

I think the KeepAlivePinger, part of the ShardedDaemonProcess, provides some basis for how we could go about adding a public-facing routing capability that feeds into the HashCodeMessageExtractor for communicating with the worker processes directly:

https://github.com/akkadotnet/akka.net/blob/3f0be58a661150c3d14572cd4615b526ba5e037a/src/contrib/cluster/Akka.Cluster.Sharding/ShardedDaemonProcess.cs#L17-L63

to11mtm commented 5 months ago

My main question:

Not sure how we are thinking of the 'push' here, but a big question would be how we make a reasonable-ish effort to handle cases like rebalances, rolling deploys, etc.

Will admit I'm not familiar with the Sharded Daemon bits so could be worrying about nothing.

Aaronontheweb commented 5 months ago

This is a pretty thin layer on top of a ShardRegion, so all of those same delivery and availability mechanisms are at play. The only difference is that we don't have an easy way of actively communicating with a daemon process out of the box today.