liquid-helium closed this issue 3 years ago
> Also DD will end up in crash-looping.

Well, why would DD crash when a key range's data is nuked due to losing all servers in a team? That sounds like a bug. It should scream about that, but not crash.
> Also DD will end up in crash-looping.
> Well, why would DD crash when a key range's data is nuked due to losing all servers in a team? That sounds like a bug.

When the initial teams are loaded, there is no check for empty source servers, so a team can end up without any source servers. The TeamTracker then considers that team unhealthy and tries to get the SS's info; that is when it crashes.
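To make the failure mode concrete, here is a minimal Python sketch. It is not the actual DD code; `Team`, `first_source_unchecked`, and `first_source_checked` are hypothetical names made up for illustration. It only models the idea that code assuming a non-empty source list fails on an empty team, while a guarded version can report the problem and keep running.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

log = logging.getLogger("dd_sketch")  # hypothetical logger name


@dataclass
class Team:
    """Hypothetical stand-in for a team that owns one key range."""
    key_range: Tuple[str, str]
    source_servers: List[str] = field(default_factory=list)


def first_source_unchecked(team: Team) -> str:
    # Assumes at least one source server exists, so an empty team
    # raises IndexError, analogous to the crash described above.
    return team.source_servers[0]


def first_source_checked(team: Team) -> Optional[str]:
    # "Scream, but do not crash": log the broken team loudly and let
    # the caller decide what to do with the orphaned key range.
    if not team.source_servers:
        log.error("team for range %s has no source servers", team.key_range)
        return None
    return team.source_servers[0]
```

With the guard in place, the caller gets `None` for an empty team and can raise an alert or schedule the range for reassignment instead of terminating.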
> It should scream about that, but not crash.

That's a good point. Let me create another bug to fix that.
One follow-up question: do we expect it to be an invariant that teams should never be empty? If we allow empty teams to exist, then DD can handle this situation gracefully, i.e., by moving the empty range to another team. However, it is not easy to tell whether a team is empty due to a bug or due to human operations. Hence, it might make sense to treat an empty team as an error. What do you think?
Here is the new issue: https://github.com/apple/foundationdb/issues/5617
When a team is lost, e.g., the machines/disks are gone, we want to bring the cluster into a consistent state.
1. The failed servers need to be removed, which can be done with the `fdbcli exclude failed` command.
2. In addition, the keyrange should be emptied and assigned to a new team.

After the above two steps, a restore can be performed.
Currently, step 1 is in place. However, after excluding all the servers of a team, the keyrange in that team becomes unavailable and cannot be moved, since all source servers are gone. Also DD will end up in crash-looping.
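The two steps above can be modeled roughly as follows. This is a toy Python sketch under stated assumptions, not how DD or fdbcli implement it: the `assignments` map, `recover_lost_teams`, and the `replacement_team` argument are all hypothetical. It shows the intended end state: failed servers are dropped from every team, and any keyrange left with no live source is handed, empty, to a healthy team.

```python
from typing import Dict, List, Set, Tuple

KeyRange = Tuple[str, str]


def recover_lost_teams(
    assignments: Dict[KeyRange, List[str]],
    failed: Set[str],
    replacement_team: List[str],
) -> List[KeyRange]:
    """Apply both recovery steps in place.

    Returns the key ranges that lost every source server; these start
    empty on the replacement team and would need a restore.
    """
    if not replacement_team or any(s in failed for s in replacement_team):
        raise ValueError("replacement team must be non-empty and healthy")

    emptied: List[KeyRange] = []
    for key_range, servers in assignments.items():
        # Step 1: remove the failed servers from the team.
        live = [s for s in servers if s not in failed]
        if live:
            assignments[key_range] = live
        else:
            # Step 2: no source is left to copy from, so the range is
            # treated as empty and assigned to a new team.
            assignments[key_range] = list(replacement_team)
            emptied.append(key_range)
    return emptied
```

For example, with ranges `("a", "m")` on `ss1, ss2` and `("m", "z")` on `ss3, ss4`, failing `ss1`, `ss2`, and `ss3` leaves `("m", "z")` on `ss4` and reports `("a", "m")` as the only range that was emptied and reassigned.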