apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Don't lose data if two availability zones go down #3638

Open sfc-gh-mpilman opened 4 years ago

sfc-gh-mpilman commented 4 years ago

If three_data_hall is used across three availability zones, we currently guarantee that FDB can survive the failure of one AZ and one machine without any availability loss.

However, I would also like to be able to survive the failure of two AZs without any data loss (we would lose availability until one AZ comes back).

It is not quite clear to me how we can achieve this, but I think it would involve something like this:

  1. During recruitment try to recruit tlogs in all availability zones for X seconds (X would be a configuration parameter)
  2. If recruiting in three AZs is impossible, recruit only in two, but set failure tolerance to a different value.
  3. Don't recover at all if only one AZ is available.
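
The three steps above can be read as a small decision function. The following is only a rough sketch of that reading, not FoundationDB code: `plan_tlog_recruitment`, `Decision`, `Action`, and `max_wait_seconds` (standing in for the "X seconds" parameter) are hypothetical names. The tolerance value is modelled as "recruited AZs minus one", which assumes every commit is made durable in each recruited AZ; the thread does not say what the "different value" should be.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RECRUIT = "recruit"  # go ahead and recruit tlogs
    WAIT = "wait"        # keep waiting for a third AZ
    REFUSE = "refuse"    # do not recover at all


@dataclass(frozen=True)
class Decision:
    action: Action
    az_count: int = 0               # AZs in which tlogs are recruited
    az_failures_tolerated: int = 0  # assumed: az_count - 1 (see lead-in)


def plan_tlog_recruitment(available_azs, waited_seconds, max_wait_seconds):
    """Hypothetical recruitment policy following steps 1-3 above."""
    azs = set(available_azs)
    if len(azs) >= 3:
        # Step 1: all three AZs are reachable, recruit in each of them.
        return Decision(Action.RECRUIT, az_count=3, az_failures_tolerated=2)
    if len(azs) == 2:
        if waited_seconds < max_wait_seconds:
            # Step 1 (continued): still inside the configured wait window.
            return Decision(Action.WAIT)
        # Step 2: fall back to two AZs with a reduced failure tolerance.
        return Decision(Action.RECRUIT, az_count=2, az_failures_tolerated=1)
    # Step 3: one AZ (or none) is available, so refuse to recover.
    return Decision(Action.REFUSE)
```

For example, `plan_tlog_recruitment(["az1", "az2"], waited_seconds=5, max_wait_seconds=30)` asks the caller to keep waiting, while the same call with `waited_seconds=30` falls back to recruiting in two AZs.
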

ajbeamon commented 4 years ago

Are you saying that we should change three_data_hall to have this property, or that we should have a different configuration that does this?

sfc-gh-mpilman commented 4 years ago

I think it would make sense to introduce a three_availability_zone mode, since that would make it clear that this is what people probably want when they run in the cloud. Whether this should be a rename of three_data_hall plus the described feature or an entirely new policy, I don't know. I would assume data hall failures are more common than AZ failures, so it would probably make sense to support both behaviors (though I am not sure, and I also don't know whether anyone uses three_data_hall in production in actual data halls...)

xumengpanda commented 4 years ago

> It is not quite clear to me how we can achieve this, but I think it would involve something like this:
>
>   1. During recruitment try to recruit tlogs in all availability zones for X seconds (X would be a configuration parameter)
>   2. If recruiting in three AZs is impossible, recruit only in two, but set failure tolerance to a different value.
>   3. Don't recover at all if only one AZ is available.

This will also affect how SSs are grouped to teams (replicas of the same data).
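
To make the team point concrete, here is a minimal sketch, assuming a team should hold exactly one storage server per available AZ. That placement rule is an assumption for illustration, and `teams_across_azs` and the server names are hypothetical, not part of FDB.

```python
from itertools import product


def teams_across_azs(servers_by_az):
    """Enumerate candidate teams with exactly one storage server per AZ.

    servers_by_az maps an AZ name to the storage servers running in it.
    """
    if not servers_by_az:
        return []
    azs = sorted(servers_by_az)  # fixed AZ order keeps the output stable
    return list(product(*(servers_by_az[az] for az in azs)))


# Three AZs available: every team has three replicas.
full = teams_across_azs({"az1": ["ss1", "ss2"], "az2": ["ss3"], "az3": ["ss4"]})

# One AZ gone: the same rule now yields two-replica teams only.
degraded = teams_across_azs({"az1": ["ss1", "ss2"], "az2": ["ss3"]})
```

Under this rule, recruiting in only two AZs shrinks every team to two replicas, which is one way the fallback in step 2 would change team building.
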

If we ask database operators to run FDB on more than three availability zones, instead of exactly three, for the three_data_hall/availability zone mode, we can allow recovery and FDB will run in a more stable configuration/state. For a large deployment that has many FDB clusters, what is the extra cost of having more availability zones for the same size cluster, compared to three AZs?

sfc-gh-mpilman commented 4 years ago

> If we ask database operators to run FDB on more than three availability zones, instead of exactly three, for the three_data_hall/availability zone mode, we can allow recovery and FDB will run in a more stable configuration/state.

Sadly this is not always possible. Only some regions have more than three availability zones, and if you want to run in eu-west-1 you don't have a choice (and having a replica in another region will come with serious performance implications).