etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.89k stars 9.78k forks source link

[3.5] Backport fix tx buffer inconsistency if there are unordered key writes in one tx #18861

Open chaochn47 opened 1 week ago

chaochn47 commented 1 week ago

Fix tx buffer inconsistency if there are unordered key writes in one tx.

This bug fix seems like an important one we need to get into release branch in https://github.com/etcd-io/etcd/issues/18846.

WDTY? @ahrtr @serathius @fuweid @ivanvc

Notes:

  1. batch_tx_test.go is not incorporated into this backport because main and release-3.5 diverges a lot on buckets/schema/etc.. It is also not the core of the bug fix.
  2. verify in tx_buffer.go is removed as mainly due to the dependency is not there in relese-3.5.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

chaochn47 commented 1 week ago

/retest

chaochn47 commented 1 week ago

Hi @wenjiaswe, as the bot suggested, could you please take a look at the backport PR?

k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, chaochn47

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/etcd-io/etcd/blob/release-3.5/OWNERS)~~ [ahrtr] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
jmhbnz commented 1 week ago

/retitle [3.5] Backport fix tx buffer inconsistency if there are unordered key writes in one tx

chaochn47 commented 1 week ago

Testing just dedup function is not enough to ensure there is no regression with the backport. Code differences should make us even more careful and ensure the change is tested instead of removing tests if it's not convenient.

Thanks for catching this. Just realized replacing schema with buckets should be able to backport the test coverage of this change in batch_tx_test.go.

as the issue is internal and requires other changes that were done on main branch to be visible.

Hi @serathius could you please elaborate what the other change is in main branch that was disclosing this bug?

Is it because another NEW key value was written into the non-key buckets without proper lexicographical order?

This means we should not rush this backport.

Agreed. Safety of correctness is the key for release branches. Could you please suggest what additional safeguard / testing coverage we should have before committing this back port?

chaochn47 commented 1 week ago

/retest