fscavo closed 9 months ago
You might need to run https://github.com/petabridge/Akka.Cluster.Sharding.RepairTool here. This is likely caused by an interrupted shutdown of the Akka.Cluster.Sharding coordinator.
Is this a bug or is there something we might not manage properly? @Aaronontheweb
It's the latter - need to allow clean shutdowns of your cluster nodes to avoid this problem. Or you can change your state storage mode to DData, which doesn't have this issue.
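For anyone landing here later, a minimal sketch of the DData switch mentioned above. This assumes the `akka.cluster.sharding.state-store-mode` HOCON setting as documented for Akka.Cluster.Sharding; check it against the reference config for your version before rolling it out:

```hocon
akka.cluster.sharding {
  # Default in the 1.4 line is "persistence" (PersistentShardCoordinator).
  # "ddata" keeps coordinator state in Akka.DistributedData instead.
  state-store-mode = ddata
}
```

All nodes hosting the same shard regions should run with the same mode, so plan this as a cluster-wide change rather than a per-node tweak.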
That being said, we ought to make this a better experience for users
@Aaronontheweb What might help is guidelines for coordinated-shutdown timeouts as well as sharding timeouts in docs.
I've found that with Sharding (especially with remember-entities=on), the closer you are to running 'max load' for a cluster, the longer it takes to do a migration on shutdown. I.e., if you are moving hundreds or thousands of actors and dozens of shards across multiple types of sharded actors, it may not hurt to shoot for a total coordinated shutdown timeout of a minute or more, as well as longer coordinator timeouts on sharding. Don't forget to consider load shifts -during- a deploy. E.g., if you're rolling across 4 nodes, how likely is it that Node 1 has had anything moved to it before you start shutting Node 2 down? (Not very!)
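A rough sketch of what that tuning could look like in HOCON. The setting names are the ones I'd expect from the Akka.NET coordinated-shutdown and sharding reference configs; the values are purely illustrative assumptions, not recommendations, and should be sized against your own handoff times:

```hocon
akka {
  coordinated-shutdown {
    # Give the sharding region phase enough time to hand off its shards.
    # Illustrative value only; the shipped default is much shorter.
    phases.cluster-sharding-shutdown-region.timeout = 60s
  }

  cluster.sharding {
    # Upper bound for a shard handoff during rebalance/shutdown (illustrative).
    handoff-timeout = 90s
    # Coordinator state timeouts, loosened for a heavily loaded cluster (illustrative).
    updating-state-timeout = 10s
    waiting-for-state-timeout = 10s
  }
}
```

Keep in mind that the total shutdown time is bounded by the sum of the phase timeouts, so whatever hosts the process (service manager, orchestrator, deploy script) needs a termination grace period at least that long, or the node gets killed mid-handoff anyway.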
I do know that with the Persistence.Linq2Db plugin, shutdown performance is overall improved (especially if multiple shards are shutting down.) Definitely better in most 'overload' scenarios (i.e. when you ignore the advice that 10 shards per node is a good max)
I think the changes we introduced in Akka.NET v1.5 probably resolve this.
Version of Akka.NET? 1.4.26
Which Akka.NET Modules? Akka.Cluster, Akka.Cluster.Sharding
Akka.Cluster.Sharding.PersistentShardCoordinator reports this error:
and throws the following exception: