Closed serathius closed 3 days ago
@serathius: GitHub didn't allow me to request PR reviews from the following users: fykaa.
Note that only etcd-io members and repo collaborators can review this PR, and authors cannot review their own PRs.
I have idea for big rewrite of history patching, however it would require more tests. As this PR would also benefit from them I wait with it until we have tests.
I am confused by the PR. Can you give some context why the code before this change would result in duplicating states?
I am confused by the PR. Can you give some context why the code before this change would result in duplicating states?
This code is meant to remove the duplicated states caused by apply
function. When linearizing operation history, this is one of the functions used to decide, is this state transition possible (bool)? What is the new state(nonDeterministicState)? Based on the current state and the provided request it tries to answer if the provided response is possible. Depending on how much we know about the response we might be able to detect early that this linearization branch we are exploring is not possible, and this is what we want to be able to reject those branches as early as possible.
This is codes in the following function, the switch is from the most lenient to the most strict option of validating response:
func (states nonDeterministicState) apply(request EtcdRequest, response MaybeEtcdResponse) (bool, nonDeterministicState) {
var newStates nonDeterministicState
switch {
case response.Error != "":
newStates = states.applyFailedRequest(request)
case response.Persisted && response.PersistedRevision == 0:
newStates = states.applyPersistedRequest(request)
case response.Persisted && response.PersistedRevision != 0:
newStates = states.applyPersistedRequestWithRevision(request, response.PersistedRevision)
default:
newStates = states.applyRequestWithResponse(request, response.EtcdResponse)
}
return len(newStates) > 0, newStates
}
If you check the applyFailedRequest
code, it duplicates the states because it doesn't know if etcd handled the request or not:
// applyFailedRequest returns both the original states and states with applied request. It considers both cases, request was persisted and request was lost.
func (states nonDeterministicState) applyFailedRequest(request EtcdRequest) nonDeterministicState {
newStates := make(nonDeterministicState, 0, len(states)*2)
for _, s := range states {
newStates = append(newStates, s)
newState, _ := s.Step(request)
if !reflect.DeepEqual(newState, s) {
newStates = append(newStates, newState)
}
}
return newStates
}
If we know that request was persisted by etcd WAL, then we can just assume that it executed and use different apply code:
// applyPersistedRequest applies request to all possible states.
func (states nonDeterministicState) applyPersistedRequest(request EtcdRequest) nonDeterministicState {
newStates := make(nonDeterministicState, 0, len(states))
for _, s := range states {
newState, _ := s.Step(request)
newStates = append(newStates, newState)
}
return newStates
}
/retest
Any reviewers willing to review?
Just making sure we merge it soon as the linearization timeouts have become frequent enough that they happen on presubmit https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/18227/pull-etcd-robustness-amd64/1805320012408295424
Member
Thanks for explaining the details! LGTM
/cc @siyuanfoundation @MadhavJivrajani @fuweid @fykaa @henrybear327