etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Downgrade support from 3.5 to 3.4 #15878

Open logicalhan opened 1 year ago

logicalhan commented 1 year ago

What would you like to be added?

I would like to be able to safely downgrade from 3.5 to 3.4, and then safely upgrade back to 3.5.

Why is this needed?

Given the vast number of data correctness issues we've unearthed in etcd 3.5 (many of them fixed by @ahrtr and @serathius), I have personal reservations about upgrading my k8s clusters to use 3.5. If there was a working rollback strategy (tested of course, as well), then I would be much more inclined to update my etcds to a more recent version.

serathius commented 1 year ago

I think it could easily be added to the etcdutl migrate command, allowing for safe offline downgrade and upgrade operations. Code: https://github.com/etcd-io/etcd/blob/main/etcdutl/etcdutl/migrate_command.go

This would also help with https://github.com/kubernetes/kubernetes/issues/117906 and cleanup of the kubernetes migrate script for etcd.

lavacat commented 1 year ago

Please assign this to me; we already have a minimal internal patch to address this. In its current form it's a 3.4 patch that allows 3.4 to be deployed within a 3.5 cluster, avoiding downtime and enabling a rolling downgrade. It's done by hacking the version and removing the confState and term keys. But it would be great to make it part of migrate and add more testing around it.

serathius commented 1 year ago

Just a note: support for rolling updates is out of scope for now. Let's start with the migrate script.

lavacat commented 1 year ago

Quick update - trying to get a POC to work. The idea is to run etcdutl migrate --data-dir data-3.5 --target-version 3.4 and get a data dir that etcd 3.4 can be started with. My understanding is that currently migrate only updates the MetaStorageVersionName key that was added in 3.6, but it won't update ClusterClusterVersionKeyName or the version in v2store.

At the moment, I'm running into

etcdserver/membership: cluster cannot be downgraded (current version: 3.4.26 is lower than determined cluster version: 3.5).

because of the v2store version.
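For context, the check that produces this error is mustDetectDowngrade in etcdserver/api/membership/cluster.go. Roughly, as a simplified sketch of the 3.4 logic (not a verbatim copy), it compares the binary's version against the cluster version recovered from v2store:

import (
	"github.com/coreos/go-semver/semver"
	"go.etcd.io/etcd/version"
	"go.uber.org/zap"
)

func mustDetectDowngrade(lg *zap.Logger, cv *semver.Version) {
	lv := semver.Must(semver.NewVersion(version.Version))
	// Only major.minor is compared against the determined cluster version.
	lv = &semver.Version{Major: lv.Major, Minor: lv.Minor}
	if cv != nil && lv.LessThan(*cv) {
		// A 3.4 binary pointed at a 3.5 data dir ends up here, because the
		// 3.5 cluster version is replayed from v2store.
		lg.Fatal(
			"invalid downgrade; server version is lower than determined cluster version",
			zap.String("current-server-version", version.Version),
			zap.String("determined-cluster-version", cv.String()),
		)
	}
}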

lavacat commented 1 year ago

For reference, I tried running etcdctl downgrade from an etcd 3.6 build targeting a 3.5 cluster, but it didn't work.

Related design docs: etcd Downgrades Design, etcd storage versioning

$ ./bin/etcdctl downgrade validate 3.4
Downgrade validate success, cluster version 3.5.0

$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-05-30T01:12:41.770844-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

etcd 3.5 single-node cluster log:

{"level":"info","ts":"2023-05-30T01:12:36.794018-0700","caller":"membership/cluster.go:890","msg":"The server is ready to downgrade","target-version":"3.4.0","server-version":"3.5.9"}
{"level":"warn","ts":"2023-05-30T01:12:36.88595-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}
{"level":"warn","ts":"2023-05-30T01:12:41.77082-0700","caller":"etcdserver/v3_server.go:1047","msg":"reject downgrade request","error":"etcdserver: request timed out"}
{"level":"warn","ts":"2023-05-30T01:12:41.770895-0700","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-05-30T01:12:36.773319-0700","time spent":"4.997554901s","remote":"127.0.0.1:62022","response type":"/etcdserverpb.Maintenance/Downgrade","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2023-05-30T01:12:41.886266-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}

Going to debug this more.

jpbetz commented 1 year ago

Did we ever decouple the etcd version from the data storage version? I vaguely recall multiple people pointing out that it is sort of silly that you can't automatically downgrade from 3.5 to 3.4 given that the file formats of the persisted data are identical, and that if we just gave data files a format version, and only incremented it when we actually change how data is written to the file, then downgrades could be simpler.

lavacat commented 1 year ago

Version logic is a bit different between 3.4, 3.5 and 3.6. In 3.4, the version is first decided in decideClusterVersion based on version.Version and then saved to v2store. In Recover we rely only on the version recorded in v2store; see clusterVersionFromStore. The version is also saved to the backend under cluster/clusterVersion, but it's never read.

3.5 added clusterVersionFromBackend, but I think the v2store path is still used by default. 3.5 also added downgradeInfoFromBackend. I don't fully understand downgrade, but I think the workflow is described here

3.6 uses ClusterVersionFromBackend by default. It also added the meta/storageVersion key that's used in migrate.
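As a concrete illustration, here is a minimal sketch (using go.etcd.io/bbolt; the bucket and key names match the bbolt commands shown later in this thread, and the db path is illustrative) that reads the backend copy of the cluster version from an offline db file:

package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open the backend read-only; only safe while the member is stopped.
	db, err := bolt.Open("infra1.etcd/member/snap/db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("cluster"))
		if b == nil {
			return fmt.Errorf("no cluster bucket")
		}
		fmt.Printf("clusterVersion: %s\n", b.Get([]byte("clusterVersion")))
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}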

lavacat commented 1 year ago

~What do you think about adding a special flag to 3.4 to control version checks? See https://github.com/etcd-io/etcd/pull/15990 This will also allow rolling downgrade.~

Another option is to snapshot using etcdctl 3.5, then stop the cluster and restore using etcdctl 3.4. Here are the steps I've used to test this. Start a 3.5 cluster:

bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

snapshot:

./bin/etcdctl snapshot save snap-3.5

stop all nodes, remove infra dirs and restore:

./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra1 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra2 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra3 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'

then start the cluster using the 3.4 binary:

bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

lavacat commented 1 year ago

@serathius saw your comment on PR. Duplicating my question here. migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found was using snapshot, and it requires stopping all nodes.

I've also tried the downgrade enable workflow; it still requires using a snapshot, but I was hoping there'd be no need to stop the cluster. It didn't work for me.

serathius commented 1 year ago

@serathius saw your comment on PR. Duplicating my https://github.com/etcd-io/etcd/pull/15990#issuecomment-1571629550 here. migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found was using https://github.com/etcd-io/etcd/issues/15878#issuecomment-1571620392 and it requires stopping all nodes.

This is exactly what we need to support downgrades. Remove the confState and term fields. This is also exactly what downgrade enable does in v3.6, but it also coordinates the change between members in a live cluster. We don't want to backport the coordination logic.

To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness. You are right that etcd v3.4 will just start from v3.5 data. However, have you thought about what will happen with the confState and term fields? Etcd v3.4 is unaware of those fields, so they will remain unchanged and ignored; then you decide to upgrade back to v3.5 and it goes BOOOM. Etcd v3.5 starts, finds those fields, assumes they come from a previous v3.5 run, and tries to use the outdated confState and term. See https://github.com/etcd-io/etcd/issues/13514

One thing we can add in v3.4 is a safeguard for those fields. Have etcd v3.4.27 reject the db file if it finds fields from v3.5. It should make it clear to the user that just loading v3.5 data in v3.4 is unsupported and will break their cluster, maybe not immediately, but later.
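To make the offline cleanup concrete, here is a minimal sketch of what removing those fields amounts to, assuming the v3.5-only keys live in the meta bucket under the names confState and term used in this thread. This is illustrative, not the actual migrate implementation; run it only on a stopped member and back up the db file first:

package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("member/snap/db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Delete the keys a 3.4 binary doesn't know about, so a later re-upgrade
	// to 3.5 cannot pick up stale confState/term values.
	if err := db.Update(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return nil
		}
		for _, k := range [][]byte{[]byte("confState"), []byte("term")} {
			if err := meta.Delete(k); err != nil {
				return err
			}
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}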

lavacat commented 1 year ago

You are right that etcd v3.4 will just start from v3.5 data.

That actually was my main problem: without restoring from a snapshot, v3.4 will fail to start if you just point it at a 3.5 data dir.

I've added fields to migrate in this PR https://github.com/etcd-io/etcd/pull/15994

lavacat commented 1 year ago

@serathius, I've updated PR https://github.com/etcd-io/etcd/pull/15994; I think it's ready for review. But I'd like to clarify a couple of things.

1. To make it clear, removing confState and term field is crucial for downgrades and etcd correctness.

v3.4 PR https://github.com/etcd-io/etcd/pull/15990 does this; see downgradeMetaBucket. Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulations would be most desirable. The problem is that this PR adds "code smell".

2. Assuming we are going with migrate, I'd like to document the steps for downgrade. Just pointing 3.4 at a 3.5 data-dir didn't work. I was able to perform the downgrade using a snapshot, and I had to stop the cluster. Am I missing something here? I can retest the procedure again.

serathius commented 1 year ago

cc @ahrtr @ptabor to get feedback about adding downgrade support.

serathius commented 1 year ago

Maybe I'm overthinking this, but operationally, having a 3.4 version that an SRE team can downgrade to without any other manipulations would be most desirable. The problem is that this PR adds "code smell".

Don't understand the statement. What is the code smell you see?

Assuming we are going with migrate, I'd like to document the steps for downgrade. Just pointing 3.4 at a 3.5 data-dir didn't work. I was able to perform the downgrade using https://github.com/etcd-io/etcd/issues/15878#issuecomment-1571620392 and I had to stop the cluster. Am I missing something here? I can retest the procedure again.

We should make it work though. Can you provide logs so I can understand the problem you are facing?

ahrtr commented 1 year ago

I am not sure whether we should support downgrading 3.5 to 3.4.

Public Cloud

Private Cloud

Non-K8s use cases?

Any feedback please?

Online and offline migration

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration. The offline approach is to backport & enhance the etcdutl migrate command in 3.5, as @serathius mentioned in https://github.com/etcd-io/etcd/issues/15878#issuecomment-1544227222. But it seems that the etcdutl migrate implementation in the main branch doesn't update ClusterClusterVersionKeyName and ClusterDowngradeKeyName when migrating from 3.6 to 3.5?

The high-level workflow of online downgrading is shown in the attached downgrade_process diagram.

lavacat commented 1 year ago

@serathius

Don't understand the statement. What is the code smell you see?

Adding the 3.5.0 capability and downgradeMetaBucket in mvcc seem like a hack. But maybe that's just my personal perception :)

Here is an example of the error when starting 3.4 with a 3.5 data-dir:

$ bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr

{"level":"fatal","ts":"2023-06-05T01:01:52.222568-0700","caller":"membership/cluster.go:795","msg":"invalid downgrade; server version is lower than determined cluster version","current-server-version":"3.4.26","determined-cluster-version":"3.5","stacktrace":"go.etcd.io/etcd/etcdserver/api/membership.mustDetectDowngrade\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:795\ngo.etcd.io/etcd/etcdserver/api/membership.(*RaftCluster).SetVersion\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:570\ngo.etcd.io/etcd/etcdserver.(*applierV2store).Put\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:97\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyV2Request\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:128\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2237\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).apply\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2178\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1412\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1136\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1072\ngo.etcd.io/etcd/pkg/schedule.(*fifo).run\n\t/Users/bk/github/etcd-release-3-5/pkg/schedule/schedule.go:157"}

To remove this error, we need to remove mustDetectDowngrade. etcd v3.4 will then start, but requests will fail with:

$ ./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:07:07.103655-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ca000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable

That's because we are missing the 3.5.0 capability.
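The capability gating can be pictured like this: a hypothetical, simplified model (not etcd's actual code) of the lookup done in etcdserver/api/capability.go, where a 3.4 binary has no entry for cluster version 3.5.0, so every v3 request is rejected as "not capable":

package main

import "fmt"

// A 3.4 binary only ships capability sets up to "3.4.0" (capability names
// here are illustrative).
var capabilityMaps = map[string]map[string]bool{
	"3.3.0": {"v3rpc": true, "auth": true},
	"3.4.0": {"v3rpc": true, "auth": true, "lease": true},
	// No "3.5.0" entry exists in a 3.4 binary.
}

func isCapabilityEnabled(clusterVersion, capability string) bool {
	caps, ok := capabilityMaps[clusterVersion]
	return ok && caps[capability]
}

func main() {
	fmt.Println(isCapabilityEnabled("3.4.0", "v3rpc")) // true
	fmt.Println(isCapabilityEnabled("3.5.0", "v3rpc")) // false -> "etcdserver: not capable"
}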

lavacat commented 1 year ago

@ahrtr

I am not sure whether we should support downgrading 3.5 to 3.4.

We have a 3.4 build with the patch https://github.com/etcd-io/etcd/pull/15990 in case there is a need to roll back during an incident, but we have never had to do it. I think this is useful operationally and makes SREs happy, but if 3.4 is declared EOL, everyone will upgrade without the patch.

In terms of the downgrade workflow, I've tested using a 3-node cluster and there are a couple of issues:

  1. The first downgrade enable call fails, but the downgrade job is actually started. I'm using etcdctl downgrade built from main.
    $ ./bin/etcdctl downgrade enable 3.4
    {"level":"warn","ts":"2023-06-05T01:20:28.807973-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000196780/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Error: context deadline exceeded
    $ ./bin/etcdctl downgrade enable 3.4
    {"level":"warn","ts":"2023-06-05T01:20:31.260858-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: cluster has a downgrade job in progress"}
    Error: etcdserver: cluster has a downgrade job in progress
  2. After replacing the 1st member's binary, the 2 other members fail with
    {"level":"info","ts":"2023-06-05T01:21:14.489291-0700","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"fd422379fda50e48","from":"3.5","to":"3.4"}
    {"level":"fatal","ts":"2023-06-05T01:21:14.489323-0700","caller":"membership/downgrade.go:59","msg":"invalid downgrade; server version is not allowed to join when downgrade is enabled","current-server-version":"3.5.9","target-cluster-version":"3.4.0","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/downgrade.go:59\ngo.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/cluster.go:593\ngo.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:101\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:135\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2228\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2151\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1384\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1199\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1122\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.5.9/schedule/schedule.go:157"}
  3. After starting the 2 failed members with the 3.4 binary, I still get
    ./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
    {"level":"warn","ts":"2023-06-05T01:28:09.783384-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000240000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
    Error: etcdserver: not capable
serathius commented 1 year ago

Is GKE still using 3.4? It seemed yes a couple of months back. cc @serathius to double confirm.

Yes, GKE is on v3.4. That's why Han is asking for downgrade support so they can feel safe to upgrade.

If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration.

Don't agree. Online downgrade is totally broken in v3.4 and v3.5. The whole design was broken, and fixing it would be too disruptive to backport. Making sure that v3.6 -> v3.5 downgrades work will already require a lot of qualification; we should not put more resources here.

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to rollback. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5.

We don't need anything more than for etcdutl migrate to officially support v3.4.

serathius commented 1 year ago

@lavacat Please follow the thread in https://github.com/etcd-io/etcd/issues/11716#issuecomment-858668690 on how broken etcdctl downgrade enable is in v3.5.

lavacat commented 1 year ago

What I'm proposing is to just add offline support, so we avoid totally abandoning users and give them a subpar but working and tested path to rollback. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5.

I'm on board with this: migrate with PR https://github.com/etcd-io/etcd/pull/15994 + using a snapshot. No changes to 3.4.

@ahrtr ClusterClusterVersionKeyName in 3.4 is updated in SetVersion based on the decided cluster version; see comment. During testing, after the snapshot is restored but before the member starts:

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.5.0

after the member starts:

{"level":"info","ts":"2023-06-05T02:06:06.317983-0700","caller":"membership/cluster.go:547","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"91bc3c398fb3c146","from":"3.0","from":"3.4"}
{"level":"info","ts":"2023-06-05T02:06:06.318064-0700","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}

$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.4.0

ClusterDowngradeKeyName isn't present in 3.4. I can add it to migrate so it's removed when going 3.5->3.4, as sketched below.
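A sketch of that removal, assuming ClusterDowngradeKeyName maps to the key "downgrade" in the cluster bucket (per the 3.5 schema naming); removeDowngradeKey is a hypothetical helper, not the PR's code:

import bolt "go.etcd.io/bbolt"

// removeDowngradeKey deletes the 3.5 downgrade marker from an offline db file.
// Assumption: ClusterDowngradeKeyName is the key "downgrade" in the "cluster"
// bucket; the call is a no-op if the bucket or key is absent.
func removeDowngradeKey(tx *bolt.Tx) error {
	if b := tx.Bucket([]byte("cluster")); b != nil {
		return b.Delete([]byte("downgrade"))
	}
	return nil
}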

tjungblu commented 1 year ago

Is OpenShift still using 3.4? cc @tjungblu to double confirm

Not with any currently supported version. Just to give you some more data points here: to stay supported, customers had to upgrade. So many thousands of clusters have successfully upgraded from 3.4 to 3.5 already, plus all our e2e test pipelines that tested this over many tens of thousands of runs previously.

I'm not aware of a single issue a customer had. The recommended downgrade procedure, IIRC, has been to restore the entire control plane with a snapshot from before the upgrade was kicked off, but I don't think this was ever necessary.

chaochn47 commented 1 year ago

EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm

Yes. All the supported k8s versions' etcd clusters have upgraded to 3.5.

From my understanding, to solve the "failed upgrade triggers a downgrade" issue from the k8s perspective:

  1. decouple etcd upgrade and k8s upgrade, so even if k8s upgrade fails, it won't trigger etcd to downgrade. etcd stays at 3.5.
  2. etcd supports downgrade from v3.5 to v3.4 with no downtime.
fuweid commented 1 year ago

Hi, @ahrtr. Sorry for the late reply.

Is AKS still using 3.4?

Yes. And we are also using other versions depending on the cluster.

For this issue, it seems reasonable to me if we can have a rollback solution with no downtime.

ahrtr commented 1 year ago

Thanks all for the feedback.

It seems that 3.4 is only used by a minority of users. A simple summary...

Backporting online downgrading from 3.5 to 3.4 would also require huge effort, and it might introduce additional risk of regression in 3.5. We should try to avoid adding any new feature to 3.5.

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

logicalhan commented 1 year ago

In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.

I disagree. GKE does not and has not used 3.5, and they are a major cloud provider. Google's position is that the number of regressions in 3.5 has made upgrading to 3.5 unviable without a safe downgrade path. Therefore, my position is that it should indeed be prioritized.

serathius commented 1 year ago

I'm on the side that this is just too much work and too risky. See the amount of work: all the tasks listed in https://github.com/etcd-io/etcd/issues/13168. Online is just much more complicated than offline support, as offline can be done by an external binary like etcdutl, but online needs to be built into the etcd binary.

Compare the amount of work. For offline downgrading etcd from v3.5 to v3.4, you can just pick the etcdutl from v3.6 without a problem. It's just one PR https://github.com/etcd-io/etcd/pull/15994, and still we have been working on it for almost a month now. Compare that to online support, which requires backporting multiple months of work.

jmhbnz commented 1 year ago

My view is that, thanks to the uptake of etcd 3.5.6+ in platforms like EKS, OCP, and TKG and elsewhere, we can draw some confidence from the hundreds of thousands of clusters that have been running these versions successfully for long periods of time now without issues.

So my preference, fwiw, is to avoid any pathway involving extensive backports to 3.4 and focus on a solid offline downgrade procedure.

serathius commented 1 year ago

Talked with @logicalhan; I understand his argument that offline downgrade is not viable on a large fleet of etcds. It would be a disaster-recovery-level measure. The fact is that downgrades were implemented broken in v3.5, and it took a big redesign to fix them for v3.6. This, however, means that we have left a broken API in v3.5. Online downgrades in v3.6 were implemented as a bare-bones feature; there are still a lot of places the downgrade mechanism needs to be plugged into. Having v3.5->v3.4 online downgrade could help us finish the work.

I would be supportive of fixing online v3.5 -> v3.4 downgrades as:

ahrtr commented 1 year ago

large fleet of etcds

I was thinking etcd 3.4 was only used by a minority of K8s clusters for each cloud vendor, private and public, based on the feedback and my investigation. But that isn't the case for GKE: based on the feedback from @logicalhan a couple of days back, the fact is ALL existing K8s versions in GKE are using etcd 3.4.x. I was shocked. It's already been 2+ years since the release of 3.5.0, and 1+ years since the community fixed all known data inconsistency issues.

it will be fully funded by @logicalhan.

I am curious how?

logicalhan commented 1 year ago

it will be fully funded by @logicalhan.

I am curious how?

We're hiring a person who will work on etcd (at least partially).

lavacat commented 1 year ago

The current version of the PR works fine, with the limitation that one has to use a snapshot to downgrade or remove the WAL files. See https://github.com/etcd-io/etcd/pull/15994#discussion_r1270488475. This means that the downgrade will require cluster downtime and potential data loss of entries in the WAL that aren't in the snapshot yet.

The problem is that the version is recorded in the WAL and has to be removed from it. We don't have a mechanism to do that. Adding one is possible, but it increases the complexity of this change.

@serathius @ahrtr Do you both support adding WAL manipulation as part of the migrate command? Is the PR still relevant without online downgrade?

For GKE, @logicalhan @serathius I'm going to call out https://github.com/etcd-io/etcd/pull/15990/ again. You can have a 3.4 internal build that you can roll back to as long as the WAL doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, or AuthStatusRequest entries (see the sketch below). I don't think this should be merged, but it can be a tradeoff if you want to do your 3.4 -> 3.5 upgrade sooner.
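For completeness, here is a sketch of how one could check that precondition on a stopped member: scan the WAL with the v3.5 proto definitions and flag entries a 3.4 binary would not understand. The import paths follow the v3.5 module layout, the WAL path is illustrative, and this is not an official tool:

package main

import (
	"log"

	"go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/raft/v3/raftpb"
	"go.etcd.io/etcd/server/v3/wal"
	"go.etcd.io/etcd/server/v3/wal/walpb"
	"go.uber.org/zap"
)

func main() {
	lg := zap.NewExample()
	// Open the WAL read-only from the first snapshot boundary.
	w, err := wal.OpenForRead(lg, "member/wal", walpb.Snapshot{})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	_, _, ents, err := w.ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range ents {
		if e.Type != raftpb.EntryNormal || len(e.Data) == 0 {
			continue
		}
		var req etcdserverpb.InternalRaftRequest
		if req.Unmarshal(e.Data) != nil {
			continue // skip entries that don't parse as an InternalRaftRequest
		}
		if req.ClusterMemberAttrSet != nil || req.DowngradeInfoSet != nil || req.AuthStatus != nil {
			lg.Warn("WAL entry uses a 3.5-only request type", zap.Uint64("index", e.Index))
		}
	}
}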

ahrtr commented 1 year ago

For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have 3.4 internal build that you can rollback to as long as wal doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, AuthStatusRequest.

This seems to be the cheapest direction.

Downgrading 3.5 to 3.4 is a special case; we don't have to backport the complete downgrading feature to 3.5. It's risky to do that, and it would also complicate the 3.5 code base.

Proposed change for 3.4 (on top of @lavacat 's https://github.com/etcd-io/etcd/pull/15990)

EDIT: We don't need to worry about ClusterVersionSetRequest, ClusterMemberAttrSetRequest, and DowngradeInfoSetRequest at all.

So we only need to take care of AuthStatusRequest in 3.4.

More references:

Impact on users (e.g. GKE)

If they want to benefit from this solution, they can't upgrade from an old 3.4 to 3.5 directly. Instead, they must first upgrade their clusters to a new 3.4.x version (which includes the change proposed above), then upgrade to 3.5.x in the second step.

Do we still need https://github.com/etcd-io/etcd/pull/15994?

No, as long as the clusters were previously on a 3.4.x version with the change proposed above.

lavacat commented 1 year ago

@ahrtr in principle I agree with your approach. Making changes to 3.4 to support online downgrade seems more practical.

I don't mind throwing away https://github.com/etcd-io/etcd/pull/15994, but it might be cleaner to perform a backend migrate instead of dealing with term and confState in 3.4. This way we also use the new migrate framework.

Then in 3.4 we can have a flag --experimental-downgrade-3-5 that allows 3.4 to start within a 3.5 cluster:

Let's discuss during the next community meeting, so everyone is in agreement on next steps. If more information or a POC is needed, let me know; I'll try to compose everything before the meeting.

ahrtr commented 1 year ago

As discussed in the previous community meeting, the offline downgrade tool isn't the point. The point is whether, and how, to support online downgrade from 3.5 to 3.4.

Usually it's common to make the new version (e.g. 3.6) backward compatible with the old version (e.g. 3.5), and that's exactly the principle the existing downgrade feature follows. For example, when downgrading from 3.6 to 3.5, the etcd 3.6 instance should migrate the data to be 3.5 compatible.

But online downgrade is a big & complicated feature; it isn't feasible & safe to backport the complete feature from 3.6 to 3.5.

Instead, we can treat the online downgrade from 3.5 to 3.4 as a special case. I think we can spend minor-to-moderate effort to make the old version (3.4) forward compatible with the new version (3.5). Specifically, we just need to ensure the 3.4 binary can run on data generated by the 3.5 binary, roughly as I mentioned above in https://github.com/etcd-io/etcd/issues/15878#issuecomment-1653790303.

siyuanfoundation commented 8 months ago

I have written a design doc regarding the path forward. Please take a look and provide feedback, thanks!

cc @ahrtr @lavacat @serathius @logicalhan @fuweid

siyuanfoundation commented 8 months ago

Tracking work