apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

If all replicas are lost, a StorageServer could source a shard replication from a backup. #3699

Open satherton opened 4 years ago

satherton commented 4 years ago

If all replicas of a shard or set of shards are lost, it is actually possible, but slow, to restore them from an active backup.

Note that this plan assumes that although the shard is not readable it is still possible to commit blind writes to it. If we remove this requirement then the complexity is greatly reduced.

The sequence is roughly

  1. Use the backup metadata to find the Key-Value Range Snapshot files relevant to the target set of shards and load the relevant ranges from those files.
  2. Use the backup log stream to update each of the loaded ranges to a version which still exists in the FDB log system for the target shards.
  3. Switch to using the FDB log system as the source of mutations (1-2 minutes behind, see below)
  4. Keep applying until caught up.

There are of course a lot of details being glossed over here. Here are the ones I can think of:

If instead blind writes to the lost shards are not allowed, then there is no need to switch to the log system as a mutation source during the restore process. Once the backup mutation log has been used to update each shard to a data version at or above the point where that shard was lost, the shard can be brought back online.
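A minimal sketch of this simpler variant, under the same illustrative in-memory model (all names hypothetical): the backup mutation log is replayed only up to the version at which the shard was lost, and the live log system is never consulted.

```python
def restore_readonly_shard(shard_range, snapshot, snapshot_version,
                           backup_log, lost_version):
    """No-blind-writes variant: replay the backup mutation log only up to
    the version at which the shard was lost, then bring it online.
    Illustrative model only; not an actual FDB API."""
    begin, end = shard_range
    # Start from the snapshot, restricted to this shard's key range.
    data = {k: v for k, v in snapshot.items() if begin <= k < end}
    # Apply only mutations newer than the snapshot and no later than the
    # version at which the shard was lost.
    for version, key, value in backup_log:
        if version <= snapshot_version or version > lost_version:
            continue
        if begin <= key < end:
            data[key] = value
    return data
```

Dropping the catch-up phase is what makes this variant so much simpler: there is no moving target, just a fixed version to reach.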

Also, without the writability requirement, it could be argued that a separate selective restore using the existing process is the route to take. That's up for debate, but I rather like the elegance of having DataDistribution start this process automatically after shards have been missing for some time, using the active backup on the default tag, and then cancel the process if any of the shard replicas come back online.

dongxinEric commented 4 years ago

This is sort of related to #1002; basically this is bulk loading a shard into one or more storage servers.

satherton commented 4 years ago

After thinking about this some more, the right process is probably just for DD to kick off a FastRestore of the lost shards into the cluster.

@dongxinEric Certainly related, though even without that improvement FastRestore (with some changes) could be used to restore missing shards in a live cluster; it would just be slower going through the log system.

There's still the complexity (or not) of supporting blind writes on the missing shards during the restore. If the shards will remain writable then FastRestore must continue pulling mutations from the backup until it catches up to the log system as described above.

xumengpanda commented 4 years ago

IMO, the key to restoring when several SSes are lost is to identify the shards whose replicas were all on the lost SSes.

This requires backing up the shard-to-SS mapping in the normal backup process.

When multiple SSes are lost, the fast restore can first restore the shard-to-SS mapping metadata, figure out which shards to restore, and restore them to another cluster or to the original cluster as usual.
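The shard-selection step could look roughly like this, assuming the shard-to-SS mapping was captured in the backup as described (names are illustrative, not actual FDB metadata):

```python
def shards_needing_restore(shard_to_servers, lost_servers):
    """Return the shards whose replicas were all on lost storage servers.
    Shards with at least one surviving replica can heal through normal
    data distribution and do not need a restore from backup.
    Illustrative model only; not an actual FDB API."""
    lost = set(lost_servers)
    return [shard for shard, replicas in shard_to_servers.items()
            if set(replicas) <= lost]
```

Only the shards this returns need the full restore path; everything else recovers from its surviving replicas.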