Closed Aaronontheweb closed 2 months ago
Marking this bug as critical - one of the major side effects from this issue is that we can create split brains with all cluster singletons during deployments when the AppVersion is getting bumped. That can result in problems such as #6973.
So this bug likely affected fewer people than I initially thought, as
Has been set to false
this whole time and that's also the default value from the HOCON extractors when this configuration isn't available. That's good news, but it still needed to be fixed.
Looks like the original issue reported by the end user wasn't even caused by the AppVersion, but this feature is definitely a footgun and probably needs to be removed.
Version Information
Version of Akka.NET? v1.5.0
Which Akka.NET Modules? Akka.Cluster.Tools
Describe the bug
Chasing down an issue for a production support customer - they have a custom pbm command for tracking the location of cluster singletons. They confirmed the singleton was on a specific node and decided to replace that one last during a version upgrade. What they observed was: the singleton moved onto the newest node with the highest AppVersion even before that oldest node was downed!
Expected behavior
As I wrote back to the customer originally, the singleton should only move onto a new node AFTER the node it's currently on begins to leave the cluster. This leads me to believe that the following code might have a bug in how we compute the sort order that decides which node is the most suitable location for a singleton:
https://github.com/akkadotnet/akka.net/blob/3f0be58a661150c3d14572cd4615b526ba5e037a/src/contrib/cluster/Akka.Cluster.Tools/Singleton/OldestChangedBuffer.cs#L98-L112
In fact, I'm almost certain that this is the case.
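To make the suspected failure mode concrete, here is a minimal toy model in Python. It does not use Akka.NET's types or reproduce the linked code; `Node`, `pick_by_age`, and `pick_by_version_then_age` are hypothetical names invented for this sketch, and the "version first, then age" ordering is only an assumption about what the sort might be doing. It shows that, under that assumption, a freshly joined node with a higher AppVersion sorts ahead of the oldest node while the oldest node is still up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cluster member (not an Akka.NET type)."""
    name: str
    up_number: int      # lower value = joined earlier = older member
    app_version: tuple  # e.g. (1, 0, 0)


def pick_by_age(nodes):
    """Expected behavior: the singleton lives on the oldest member."""
    return min(nodes, key=lambda n: n.up_number)


def pick_by_version_then_age(nodes):
    """Assumed buggy ordering: highest app version wins, age only breaks ties."""
    return max(nodes, key=lambda n: (n.app_version, -n.up_number))


# Cluster before the rolling upgrade: everyone runs the same version.
before = [
    Node("node-a", up_number=1, app_version=(1, 0, 0)),
    Node("node-b", up_number=2, app_version=(1, 0, 0)),
]

# During the upgrade: a new node joins with a bumped version,
# while node-a (the current singleton host) is still up.
during = before + [Node("node-c", up_number=3, app_version=(1, 1, 0))]

if __name__ == "__main__":
    print(pick_by_age(during).name)               # node-a
    print(pick_by_version_then_age(during).name)  # node-c
```

With identical versions both orderings agree on `node-a`. Once `node-c` joins with a higher version, the version-first ordering selects it immediately, which matches what the customer observed: the singleton moving before the oldest node had started leaving.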