Open sfc-gh-mpilman opened 4 years ago
Are you saying that we should change `three_data_hall` to have this property, or that we should have a different configuration that does this?
I think it would make sense to introduce a `three_availability_zone` mode, as this will make it clear that this is what people probably want to run if they run in the cloud. Whether this is just a rename of `three_data_hall` plus the described feature or just a new policy, I don't know. I would assume data hall failures are more common than AZ failures, so it probably would make sense to support both behaviors (though I am not sure; I also don't know whether anyone uses `three_data_hall` in production in actual data halls...)
It is not quite clear to me how we can achieve this, but I think it would involve something like this:
- During recruitment, try to recruit tlogs in all availability zones for `X` seconds (`X` would be a configuration parameter).
- If recruiting in three AZs is impossible, recruit only in two, but set failure tolerance to a different value.
- Don't recover at all if only one AZ is available.
This will also affect how storage servers (SSs) are grouped into teams (replicas of the same data).
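A rough sketch of the recruitment steps above, just to make the decision order concrete. This is not FDB's actual recruitment code: `TLogPlan`, `plan_tlog_recruitment`, `available_zones`, and `poll_interval` are hypothetical names, `wait_seconds` stands in for `X`, and how the failure tolerance changes in the two-AZ case is left as a simple `degraded` flag.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass(frozen=True)
class TLogPlan:
    zones: Tuple[str, ...]  # AZs that would host tlogs
    degraded: bool          # True when fewer than three AZs are used


def plan_tlog_recruitment(
    available_zones: Callable[[], Set[str]],  # hypothetical: AZs with usable tlog candidates
    wait_seconds: float,                      # the "X" from the list above
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[TLogPlan]:
    deadline = clock() + wait_seconds
    while True:
        zones = sorted(available_zones())
        # Step 1: prefer tlogs in three AZs, waiting up to wait_seconds for them.
        if len(zones) >= 3:
            return TLogPlan(tuple(zones[:3]), degraded=False)
        if clock() >= deadline:
            break
        sleep(poll_interval)
    # Step 2: only two AZs showed up; recruit there with a different failure tolerance.
    if len(zones) == 2:
        return TLogPlan(tuple(zones), degraded=True)
    # Step 3: one AZ (or none) is available; do not recover.
    return None


# With wait_seconds=0 the function decides immediately from what is available now.
print(plan_tlog_recruitment(lambda: {"az1", "az2"}, wait_seconds=0))
```

The clock and sleep functions are passed in only so the waiting behavior can be exercised without real delays.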
If we ask database operators to run FDB on more than three availability zones, instead of exactly three, for the `three_data_hall`/availability mode, we can allow recovery and FDB will run in a more stable configuration/state. For a large deployment that has many FDB clusters, what is the extra cost of having more availability zones for the same size cluster, compared to three AZs?
> If we ask database operators to run FDB on more than three availability zones, instead of exactly three, for the `three_data_hall`/availability mode, we can allow recovery and FDB will run in a more stable configuration/state.
Sadly this is not always possible. Only some regions have more than three availability zones, but if you want to run in `eu-west-1` you don't have a choice (and having a replica in another region will come with serious performance implications).
If `three_data_hall` is used across three availability zones, we currently guarantee that FDB can survive the failure of one AZ and one machine without any availability loss. However, I would like to also be able to survive two AZ failures without any data loss (we would lose availability until one AZ comes back).
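To illustrate what "no data loss after two AZ failures" asks of replica placement, here is a small sketch that checks whether at least one copy survives every possible combination of `k` failed AZs. The placements and the names `survives` and `tolerates_any_k_az_failures` are made up for illustration and are not taken from FDB's configuration or code.

```python
from itertools import combinations
from typing import Iterable, Sequence, Set


def survives(replica_zones: Iterable[str], failed_zones: Set[str]) -> bool:
    """True if at least one replica lives outside the failed AZs."""
    return any(zone not in failed_zones for zone in replica_zones)


def tolerates_any_k_az_failures(
    replica_zones: Sequence[str], all_zones: Sequence[str], k: int
) -> bool:
    """True if the data survives every possible choice of k failed AZs."""
    return all(
        survives(replica_zones, set(failed))
        for failed in combinations(all_zones, k)
    )


zones = ["az1", "az2", "az3"]

# Hypothetical placements: copies confined to two AZs vs. one copy in each AZ.
two_az_placement = ["az1", "az1", "az2", "az2"]
three_az_placement = ["az1", "az2", "az3"]

print(tolerates_any_k_az_failures(two_az_placement, zones, 1))    # True
print(tolerates_any_k_az_failures(two_az_placement, zones, 2))    # False
print(tolerates_any_k_az_failures(three_az_placement, zones, 2))  # True
```

Under these assumptions, any data that is only held in two AZs can be lost when exactly those two AZs fail, which is why the proposal tries to place tlogs in all three AZs whenever possible.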