Open patriknw opened 10 years ago
Hi Patrik. I am aware of this issue, and for this reason in my use case I always start a new cluster one node at a time. However I think the suggestion of ordering the seed nodes is a good one, so I'll incorporate that and update the blog post when I get a chance (I'm on holiday, no laptop at the moment). I'll credit you and link to your Twitter account if that's okay?
With regard to delayed propagation of instance data, I think it's unlikely to be an issue. More likely is that, since EC2 considers an instance "Running" as soon as the OS starts to boot, a node which has not yet fully started will be used as a seed node, in which case the joining node should retry until the seed becomes available.
Many thanks for the input!
Actually I've just noticed the Readme says starting multiple nodes runs the risk of a split cluster. I'll update that too.
sounds good, thanks
We start our AutoScalingGroup from a CloudFormation Stack. So for our purposes, we are doing something similar to this but also first querying the stack status ("describeStacks") until we get the right status back, like "CREATE_COMPLETE" or a similar status. Then we can be relatively assured that the ASG is stable and that all instances will be getting the same list of IPs.
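A rough sketch of that polling step, in Python for illustration only. The helper name `wait_for_stack`, its parameters, and the failure check are assumptions, not something from our actual setup. The status lookup is passed in as a function so the loop does not depend on any particular SDK; with boto3 it could be something like `lambda: client.describe_stacks(StackName=name)["Stacks"][0]["StackStatus"]`.

```python
import time


def wait_for_stack(get_status, ready=("CREATE_COMPLETE",), attempts=60,
                   delay_seconds=5.0, sleep=time.sleep):
    """Poll get_status() until the stack reports a ready status.

    get_status is a hypothetical zero-argument callable returning the
    current stack status string (for example from describeStacks).
    """
    for _ in range(attempts):
        status = get_status()
        if status in ready:
            return status
        # Assumption: treat failed or rolled-back stacks as fatal instead
        # of waiting for a status that will never arrive.
        if status.endswith("FAILED") or "ROLLBACK" in status:
            raise RuntimeError("stack ended in status " + status)
        sleep(delay_seconds)
    raise TimeoutError("stack not ready after %d attempts" % attempts)
```

Only once this returns would a node go on to query the ASG for the instance IPs.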
Glad to see such a great blog post, and thanks for sharing this useful information.
This will work fine in most cases, but there is one potential glitch. When starting up several nodes at the same time from scratch there is a chance that they will not join each other.
For example, you don't have a running cluster, and you start 3 nodes at the same time. I guess that the order of the sibling host names is undefined, so each node might place its own address first in the seed-nodes list. If they do this at the same time there is a chance that they will only join themselves.
It can also be the other way around: no node places itself first in the seed-nodes list, and then the cluster will not start at all.
Sorting the host names would improve the situation, but it is still not completely safe, since the different nodes might not see the other nodes as running if they do the EC2 call at the same time (I guess it takes some time for EC2 to propagate that information).
Sorting the host names might be good enough for practical purposes. To make it completely safe you have to treat one node specially and only include that node as the first seed-node; other nodes should not include themselves in the seed-nodes list.
This is only a problem when starting up a fresh cluster. It is documented here: http://doc.akka.io/docs/akka/2.3.5/scala/cluster-usage.html#Joining_to_Seed_Nodes
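The ordering suggested above could look roughly like this. It is a minimal sketch in Python for illustration, not the blog post's code; `build_seed_nodes` and its parameters are hypothetical names. It assumes every node sees the same set of running addresses, which, as noted, is not guaranteed if the EC2 calls happen at the same time.

```python
def build_seed_nodes(own_address, running_addresses):
    """Return the seed-nodes list for one node.

    All nodes sort the same addresses, so they agree on which node is
    first. Only that node lists itself (in first position); every other
    node leaves its own address out.
    """
    # Plain string sort: the order only has to be identical on all nodes,
    # it does not have to be numerically meaningful.
    ordered = sorted(set(running_addresses) | {own_address})
    first = ordered[0]
    if own_address == first:
        # The designated node may join itself to bootstrap the cluster.
        return ordered
    # Other nodes never include themselves in the list.
    return [address for address in ordered if address != own_address]
```

For three nodes `10.0.0.1`, `10.0.0.2` and `10.0.0.3`, only `10.0.0.1` gets a list that starts with its own address; the other two get lists that start with `10.0.0.1` and omit themselves.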