satherton opened this issue 4 years ago
This is somewhat related to #1002: essentially, it is bulk loading a shard into one or more storage servers.
After thinking about this some more, the right process is probably just for DD to kick off a FastRestore of the lost shards into the cluster.
@dongxinEric Certainly related, though even without that improvement, FastRestore (with some changes) could be used to restore missing shards in a live cluster; it would just be slower going through the log system.
There's still the question of whether to support blind writes on the missing shards during the restore. If the shards remain writable, then FastRestore must continue pulling mutations from the backup until it catches up to the log system, as described above.
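To make the catch-up condition concrete, here is a rough C++ sketch of the writable-shard case. Everything in it (`BackupMutationSource`, `LogSystem`, the version fields) is a hypothetical stand-in, not an actual FDB API; the real path would go through the backup/DR machinery and the TLogs.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

using Version = int64_t;

// Hypothetical toy stand-ins; the real code would use the FDB backup/DR
// machinery and the TLog interfaces, not these types.
struct BackupMutationSource {
    Version latest = 0;                 // newest version present in the backup
    void applyBatchUpTo(Version v) {    // pretend to apply the shard's mutations
        std::cout << "applied backup mutations up to " << v << "\n";
    }
};

struct LogSystem {
    Version commitVersion = 0;          // cluster's current commit version
};

// Writable-shard case: drain the backup mutation log until the restore has
// caught up to the live log system; only then can the shard switch to
// pulling directly from the log system without losing blind writes
// committed while it was being restored.
void catchUpWritableShard(BackupMutationSource& backup, const LogSystem& logs) {
    Version applied = 0;
    while (applied < logs.commitVersion) {
        Version target = std::min(backup.latest, logs.commitVersion);
        if (target <= applied)
            break;                      // toy stand-in for "wait for the backup to advance"
        backup.applyBatchUpTo(target);
        applied = target;
        // In reality, new blind writes keep advancing commitVersion, so the
        // backup must stay close to the tail for this loop to ever exit.
    }
}

int main() {
    BackupMutationSource backup{150};   // backup has mutations through version 150
    LogSystem logs{120};                // cluster is at commit version 120
    catchUpWritableShard(backup, logs); // exits once applied reaches 120
}
```

The point is just the exit condition: the restore can only finish once the backup's mutation tail has reached the cluster's commit version, which is why allowing blind writes keeps the restore running longer.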
IMO, the key to restoring when several SSes are lost is to identify the shards whose replicas are all on the lost SSes.
This requires backing up the shard-to-SS mapping in the normal backup process.
When multiple SSes are lost, FastRestore can first restore the shard-to-SS mapping metadata, figure out which shards to restore, and then restore them either to another cluster or to the original cluster as usual.
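As a minimal illustration of that identification step, here is a self-contained C++ sketch that computes the shards needing restore from a shard-to-SS mapping. The types are toy stand-ins for whatever form the backed-up mapping metadata actually takes.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

using ShardRange = std::string;   // stand-in for a key range like [a,b)
using ServerId   = std::string;

// A shard's replicas are all gone exactly when every SS in its team is in
// the lost set; those are the shards FastRestore has to bring back.
std::vector<ShardRange> shardsToRestore(
        const std::map<ShardRange, std::set<ServerId>>& shardToServers,
        const std::set<ServerId>& lostServers) {
    std::vector<ShardRange> lostShards;
    for (const auto& [shard, team] : shardToServers) {
        bool allLost = true;
        for (const auto& ss : team)
            if (!lostServers.count(ss)) { allLost = false; break; }
        if (allLost) lostShards.push_back(shard);
    }
    return lostShards;
}

int main() {
    std::map<ShardRange, std::set<ServerId>> mapping = {
        {"[a,b)", {"ss1", "ss2", "ss3"}},
        {"[b,c)", {"ss2", "ss4", "ss5"}},
        {"[c,d)", {"ss1", "ss2", "ss4"}},
    };
    std::set<ServerId> lost = {"ss1", "ss2", "ss4"};
    for (const auto& s : shardsToRestore(mapping, lost))
        std::cout << s << " must be restored\n";   // prints only "[c,d)"
}
```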
If all replicas of a shard or set of shards are lost, it is actually possible, but slow, to restore them from an active backup.
Note that this plan assumes that although the shard is not readable, it is still possible to commit blind writes to it. If we remove this requirement, the complexity is greatly reduced.
The sequence is roughly
There are of course a lot of details being glossed over here. Here are the ones I can think of:
If instead blind writes to the lost shards are not allowed, then there is no need to switch to the log system as a mutation source during the restore. Once the backup mutation log has been used to bring each shard to a version at or above the one at which it was lost, that shard can be brought back online.
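A sketch of this simpler, read-only-until-restored path, again with hypothetical names. It assumes the per-shard lost-at versions can be recovered, e.g. from the backed-up mapping metadata, which is itself an assumption here.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

using Version = int64_t;
using ShardRange = std::string;  // stand-in for a key range

// Hypothetical stand-in for replaying the backup mutation log onto one shard.
void applyBackupLogUpTo(const ShardRange& shard, Version v) {
    std::cout << shard << ": applied backup log up to version " << v << "\n";
}

// Read-only-until-restored case: each shard only needs to reach the version
// at which it was lost. No hand-off to the live log system is needed,
// because nothing can have been committed to the shard since that version.
void restoreWithoutBlindWrites(const std::map<ShardRange, Version>& lostAtVersion) {
    for (const auto& [shard, lostVersion] : lostAtVersion) {
        applyBackupLogUpTo(shard, lostVersion);
        std::cout << shard << ": back online\n";  // shard becomes readable again
    }
}

int main() {
    restoreWithoutBlindWrites({{"[c,d)", 1042}, {"[f,g)", 998}});
}
```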
Also, without the writability requirement, it could be argued that a separate selective restore using the existing process is the route to take. That's up for debate, but I rather like the elegance of having DataDistribution start this process automatically after shards have been missing for some time, using the active backup on the default tag, and then cancel it if any of the shard replicas come back online.
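For what that automatic path might look like, here is a toy sketch of the policy being described (grace period, trigger, cancel on recovery). None of these names exist in the codebase; it is only the decision logic, with the actual restore and health-monitoring plumbing stubbed out.

```cpp
#include <chrono>
#include <iostream>
#include <string>

using Clock = std::chrono::steady_clock;
using ShardRange = std::string;

// Hypothetical per-shard state DataDistribution would track.
struct ShardHealth {
    bool anyReplicaOnline = false;
    Clock::time_point missingSince{};
    bool restoreRunning = false;
};

// Wait out a grace period before kicking off a selective restore from the
// active backup on the default tag; abandon it if any replica reappears.
void checkShard(const ShardRange& shard, ShardHealth& h,
                std::chrono::seconds gracePeriod) {
    if (h.anyReplicaOnline) {
        if (h.restoreRunning) {
            std::cout << shard << ": replica came back, cancel restore\n";
            h.restoreRunning = false;
        }
        return;
    }
    if (!h.restoreRunning && Clock::now() - h.missingSince > gracePeriod) {
        std::cout << shard << ": start restore from default-tag backup\n";
        h.restoreRunning = true;
    }
}

int main() {
    ShardHealth h;
    h.missingSince = Clock::now() - std::chrono::minutes(10);
    checkShard("[c,d)", h, std::chrono::minutes(5));  // triggers the restore
    h.anyReplicaOnline = true;
    checkShard("[c,d)", h, std::chrono::minutes(5));  // cancels it
}
```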