etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Expected v2 deprecation behavior in HA clusters #17009

Open serathius opened 10 months ago

serathius commented 10 months ago

What would you like to be added?

I would like to discuss how etcd v3.6 should behave with regard to the v2 store API.

Background

The v2 API was deprecated in etcd v3.4, but could still be used as long as you provided the --enable-v2 flag. This didn't change in v3.5; however, for v3.6 we are planning total removal. The expected behavior is that when upgrading to v3.6, etcd will panic if there is any v2 data still left. More in https://github.com/etcd-io/etcd/issues/12913

User can do two things:

Problem

What happens in HA clusters if, during an upgrade/downgrade (v3.6 supports downgrade to v3.5), the user forgets that an etcd v3.5 member still uses --enable-v2 and introduces a v2 change to the cluster? This is worrying, as a single member could take down the whole cluster. Fixing this would require reconfiguring the whole cluster to run with --v2-deprecation=write-only-drop-data.

Options:

Options rejected:

Why is this needed?

I want to make sure this is properly discussed, understood and documented.

serathius commented 10 months ago

cc @ahrtr @jmhbnz @wenjiaswe

wenjiaswe commented 10 months ago

cc @lavacat @chaochn47 @siyuanfoundation

chaochn47 commented 10 months ago

Have v3.6 members reject v2 proposals. Not sure this is possible, as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.

Another option, proposals could be accepted but apply is rejected just like no space applier or corruption applier.
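
A minimal sketch of the "accept the proposal, reject at apply" idea, in the spirit of a wrapping applier. Every name here (`Request`, `Applier`, `v2RejectApplier`, `ErrV2Rejected`) is hypothetical and made up for illustration; this is not etcd's actual applier interface or code.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical stand-in for the payload of a committed entry.
type Request struct {
	V2  bool // true when the entry carries a v2 store change
	Key string
	Val string
}

// Applier is a hypothetical interface, not etcd's real applier interface.
type Applier interface {
	Apply(r Request) error
}

// ErrV2Rejected is a made-up error returned for v2 entries.
var ErrV2Rejected = errors.New("v2 request rejected at apply time")

// storeApplier applies every request to an in-memory map.
type storeApplier struct{ data map[string]string }

func (s *storeApplier) Apply(r Request) error {
	s.data[r.Key] = r.Val
	return nil
}

// v2RejectApplier wraps another applier and refuses v2 entries,
// passing everything else through unchanged.
type v2RejectApplier struct{ next Applier }

func (a v2RejectApplier) Apply(r Request) error {
	if r.V2 {
		return ErrV2Rejected
	}
	return a.next.Apply(r)
}

func main() {
	base := &storeApplier{data: map[string]string{}}
	app := v2RejectApplier{next: base}
	fmt.Println(app.Apply(Request{Key: "a", Val: "1"}))
	fmt.Println(app.Apply(Request{V2: true, Key: "b", Val: "2"}))
	fmt.Println("keys stored:", len(base.data))
}
```

The wrapper only decides what happens locally at apply time; it has no say in whether the entry was committed in the first place.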

siyuanfoundation commented 10 months ago

Have v3.6 members reject v2 proposals. Not sure this is possible, as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.

Another option, proposals could be accepted but apply is rejected just like no space applier or corruption applier.

Rejecting at apply time would not stop the server from acknowledging commit index progress; this would give the client a false sense of HA if a v3.5 member is the leader.
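
To illustrate the point, here is a toy model, assuming commit is decided by quorum replication before any member runs its apply step. `Entry`, `Member`, and `Advance` are hypothetical names for this sketch only and do not correspond to etcd or raft library types.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified log entry.
type Entry struct {
	Index uint64
	V2    bool // true when the entry carries a v2 store change
}

// Member is a toy model of one cluster member, not real etcd code.
type Member struct {
	RejectV2    bool     // models a v3.6 member refusing v2 at apply time
	CommitIndex uint64   // highest index known to be committed
	Applied     []uint64 // indexes whose changes reached local state
	Rejected    []uint64 // indexes committed but refused at apply time
}

// Advance records a committed entry, then tries to apply it.
// The commit index moves first: rejecting at apply cannot undo the commit.
func (m *Member) Advance(e Entry) {
	m.CommitIndex = e.Index
	if e.V2 && m.RejectV2 {
		m.Rejected = append(m.Rejected, e.Index)
		return
	}
	m.Applied = append(m.Applied, e.Index)
}

func main() {
	entries := []Entry{{Index: 1}, {Index: 2, V2: true}, {Index: 3}}
	leader := &Member{}                 // v3.5 leader, applies everything
	follower := &Member{RejectV2: true} // v3.6 follower
	for _, e := range entries {
		leader.Advance(e)
		follower.Advance(e)
	}
	fmt.Println("leader commit:", leader.CommitIndex, "applied:", leader.Applied)
	fmt.Println("follower commit:", follower.CommitIndex, "rejected:", follower.Rejected)
}
```

Both members end at the same commit index even though the follower never applied the v2 entry, so the client that wrote it still sees a success.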

serathius commented 10 months ago

Have v3.6 members reject v2 proposals. Not sure this is possible, as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.

Another option, proposals could be accepted but apply is rejected just like no space applier or corruption applier.

This is the "Check the snapshot and WAL for v2 data only on bootstrap, skip it later. It will lead to inconsistency on v2 state." case. I used the word "skip" instead of "reject" but meant the same thing. They are just treated as no-ops.

serathius commented 10 months ago

Rejecting at apply time would not stop the server from acknowledging commit index progress; this would give the client a false sense of HA if a v3.5 member is the leader.

It's less about HA, more about inconsistency if the user ever aborts the upgrade. etcd v3.6 already doesn't expose the v2 API, so there is no HA for it. The inconsistency happens if the user reverts the upgrade: the member that was temporarily on v3.6 would then not have the same data as one that stayed on v3.5 the whole time.
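
A rough illustration of that divergence, assuming v2 writes are simply skipped while a member runs v3.6. `V2Write`, `replay`, and `sameState` are hypothetical helpers for this sketch and not part of etcd.

```go
package main

import "fmt"

// V2Write is a hypothetical v2 store change carried by a committed entry.
type V2Write struct{ Key, Val string }

// replay builds a member's v2 state from the committed writes.
// Positions in [skipFrom, skipTo) model the window in which the member
// ran v3.6 and treated v2 writes as no-ops.
func replay(writes []V2Write, skipFrom, skipTo int) map[string]string {
	state := map[string]string{}
	for i, w := range writes {
		if i >= skipFrom && i < skipTo {
			continue // skipped while on v3.6
		}
		state[w.Key] = w.Val
	}
	return state
}

// sameState reports whether two v2 states hold identical keys and values.
func sameState(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

func main() {
	writes := []V2Write{{"/a", "1"}, {"/b", "2"}, {"/a", "3"}}
	stayed := replay(writes, 0, 0)   // stayed on v3.5 throughout
	reverted := replay(writes, 1, 3) // on v3.6 for writes 1 and 2, then reverted
	fmt.Println("stayed:", stayed)
	fmt.Println("reverted:", reverted)
	fmt.Println("consistent:", sameState(stayed, reverted))
}
```

The same committed log yields two different v2 states, which is the inconsistency a reverted upgrade would leave behind.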

cc @ahrtr

ahrtr commented 10 months ago

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.