logicalhan opened this issue 1 year ago (status: Open).
I think it could be easily added to the etcdutl migrate command, allowing for safe offline downgrade and upgrade operations.
Code: https://github.com/etcd-io/etcd/blob/main/etcdutl/etcdutl/migrate_command.go
This would also help with https://github.com/kubernetes/kubernetes/issues/117906 and with cleaning up the Kubernetes migrate script for etcd.
Please assign this to me; we already have a minimal internal patch to address this. In its current form, it's a 3.4 patch that allows 3.4 to be deployed within a 3.5 cluster to avoid downtime and perform a rolling downgrade.
It's done by hacking the version and removing the confState and term keys.
But it would be great to make it part of migrate and add more testing around it.
Just a note: support for rolling updates is out of scope for now. Let's start with the migrate script.
Quick update: I'm trying to get a POC to work. The idea is to run etcdutl migrate --data-dir data-3.5 --target-version 3.4 and get a data dir that etcd 3.4 can be started with.
My understanding is that currently migrate only updates the MetaStorageVersionName key, which was added in 3.6. But it won't update ClusterClusterVersionKeyName or the version in v2store.
At the moment, I'm running into the following error because of the v2store version:
etcdserver/membership: cluster cannot be downgraded (current version: 3.4.26 is lower than determined cluster version: 3.5).
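The check behind this error can be approximated as follows. This is a simplified, stdlib-only sketch, assuming (as in 3.4's mustDetectDowngrade) that the local server version is truncated to major.minor before being compared with the recovered cluster version; majorMinor and isDowngrade are hypothetical helpers, and etcd itself relies on a semver library rather than this hand-rolled parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses "X.Y" or "X.Y.Z" and returns X and Y, dropping any patch
// component. Hypothetical helper; etcd uses a semver library instead.
func majorMinor(v string) (int, int, error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// isDowngrade reports whether the local server version (e.g. "3.4.26") is
// lower than the recovered cluster version (e.g. "3.5"), comparing only
// major.minor. A true result is the case where 3.4 refuses to start.
func isDowngrade(serverVersion, clusterVersion string) (bool, error) {
	sMaj, sMin, err := majorMinor(serverVersion)
	if err != nil {
		return false, err
	}
	cMaj, cMin, err := majorMinor(clusterVersion)
	if err != nil {
		return false, err
	}
	if sMaj != cMaj {
		return sMaj < cMaj, nil
	}
	return sMin < cMin, nil
}

func main() {
	down, _ := isDowngrade("3.4.26", "3.5")
	fmt.Println(down) // true
	down, _ = isDowngrade("3.4.26", "3.4")
	fmt.Println(down) // false
}
```

Because the patch component is dropped, 3.4.26 compares as 3.4, so it can only start once the recovered cluster version is 3.4 or lower.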
For reference, I tried running etcdctl downgrade from an etcd 3.6 build targeting a 3.5 cluster, but it didn't work.
Related design docs: etcd Downgrades Design, etcd storage versioning
$ ./bin/etcdctl downgrade validate 3.4
Downgrade validate success, cluster version 3.5.0
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-05-30T01:12:41.770844-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
etcd 3.5 one-node cluster log:
{"level":"info","ts":"2023-05-30T01:12:36.794018-0700","caller":"membership/cluster.go:890","msg":"The server is ready to downgrade","target-version":"3.4.0","server-version":"3.5.9"}
{"level":"warn","ts":"2023-05-30T01:12:36.88595-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}
{"level":"warn","ts":"2023-05-30T01:12:41.77082-0700","caller":"etcdserver/v3_server.go:1047","msg":"reject downgrade request","error":"etcdserver: request timed out"}
{"level":"warn","ts":"2023-05-30T01:12:41.770895-0700","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2023-05-30T01:12:36.773319-0700","time spent":"4.997554901s","remote":"127.0.0.1:62022","response type":"/etcdserverpb.Maintenance/Downgrade","request count":-1,"request size":-1,"response count":-1,"response size":-1,"request content":""}
{"level":"warn","ts":"2023-05-30T01:12:41.886266-0700","caller":"etcdserver/cluster_util.go:459","msg":"remotes server has mismatching etcd version","remote-member-id":"8e9e05c52164694d","current-server-version":"3.5.0","target-version":"3.4.0"}
Going to debug this more.
Did we ever decouple the etcd version from the data storage version? I vaguely recall multiple people pointing out that it is sort of silly that you can't automatically downgrade from 3.5 to 3.4 given that the file formats of the persisted data are identical, and that if we just gave data files a format version and only incremented it when we actually change how data is written to file, downgrades could be simpler.
Version logic is a bit different between 3.4, 3.5 and 3.6.
In 3.4, the version is first decided in decideClusterVersion based on version.Version and then saved to v2store. In Recover, we rely only on the version recorded in v2store (see clusterVersionFromStore). The version is also saved to the backend key cluster/clusterVersion, but it's never read.
3.5 added clusterVersionFromBackend, but I think the v2store path is still used by default. 3.5 also added downgradeInfoFromBackend. I don't fully understand downgrade, but I think the workflow is described here.
3.6 uses ClusterVersionFromBackend by default. It also added the meta/storageVersion key that's used in migrate.
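A toy model of these differences, assuming a data dir records the cluster version in two places (v2store and the backend cluster/clusterVersion key) and that each release reads them as described above. The dataDir type and recoveredClusterVersion function are hypothetical; the point is only that rewriting the backend key does not change what 3.4 recovers.

```go
package main

import "fmt"

// dataDir models the two places discussed above where a data dir records the
// cluster version. Hypothetical type for illustration only.
type dataDir struct {
	v2storeClusterVersion string // version held in v2store
	backendClusterVersion string // bbolt bucket "cluster", key "clusterVersion"
}

// recoveredClusterVersion returns which recorded version a release uses on
// recovery, following the description in this thread.
func recoveredClusterVersion(release string, d dataDir) (string, error) {
	switch release {
	case "3.4", "3.5":
		// 3.4 reads only v2store; 3.5 can read the backend but is described
		// as still using the v2store path by default.
		return d.v2storeClusterVersion, nil
	case "3.6":
		// 3.6 reads the backend by default.
		return d.backendClusterVersion, nil
	default:
		return "", fmt.Errorf("release %q not modeled", release)
	}
}

func main() {
	// A 3.5 data dir in which only the backend key was rewritten to 3.4.
	d := dataDir{v2storeClusterVersion: "3.5.0", backendClusterVersion: "3.4.0"}
	for _, r := range []string{"3.4", "3.6"} {
		v, err := recoveredClusterVersion(r, d)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s recovers cluster version %s\n", r, v)
	}
}
```

With the fields set as in main, 3.4 still recovers 3.5.0 from v2store, which matches the startup failure reported earlier in the thread.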
~What do you think about adding a special flag to 3.4 to control version checks? See https://github.com/etcd-io/etcd/pull/15990 This will also allow rolling downgrade.~
Another option is to take a snapshot using etcdctl 3.5, then stop the cluster and restore using etcdctl 3.4. Here are the steps I've used to test this.
3.5 cluster:
bin/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
Snapshot:
./bin/etcdctl snapshot save snap-3.5
Stop all nodes, remove the infra dirs and restore:
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra1 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra2 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
./bin-3.4/etcdctl snapshot restore snap-3.5 --name infra3 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380'
Then start the cluster using the 3.4 binary:
bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra2 --listen-client-urls http://127.0.0.1:22379 --advertise-client-urls http://127.0.0.1:22379 --listen-peer-urls http://127.0.0.1:22380 --initial-advertise-peer-urls http://127.0.0.1:22380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
bin-3.4/etcd --name infra3 --listen-client-urls http://127.0.0.1:32379 --advertise-client-urls http://127.0.0.1:32379 --listen-peer-urls http://127.0.0.1:32380 --initial-advertise-peer-urls http://127.0.0.1:32380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
@serathius I saw your comment on the PR. Duplicating my question here: migrate will only help with removing confState and term, correct? v2store will still have the 3.5 version. What is the process to complete the downgrade? The only way I've found was using a snapshot, and it requires stopping all nodes.
I've also tried the downgrade enable workflow. That still requires using a snapshot, but I was hoping there would be no need to stop the cluster. It didn't work for me.
> @serathius saw your comment on PR. Duplicating my https://github.com/etcd-io/etcd/pull/15990#issuecomment-1571629550 here. migrate will only help with removing confState and term, correct? v2store will still have 3.5 version. What is the process to complete the downgrade? The only way I've found was using https://github.com/etcd-io/etcd/issues/15878#issuecomment-1571620392 and it requires stopping all nodes.
This is exactly what we need to support downgrades: remove the confState and term fields. This is also exactly what downgrade enable does in v3.6, but there it also coordinates the change between members in a live cluster. We don't want to backport the coordination logic.
To make it clear, removing the confState and term fields is crucial for downgrades and etcd correctness. You are right that etcd v3.4 will just start from v3.5 data. However, have you thought about what will happen with the confState and term fields? Etcd v3.4 is unaware of those fields, so they will remain unchanged and ignored. Then you decide to upgrade back to v3.5 and it goes BOOOM: etcd v3.5 starts, finds those fields, assumes they come from the previous v3.5 run and tries to use the outdated confState and term. See https://github.com/etcd-io/etcd/issues/13514
One thing we can add in v3.4 is a safeguard for those fields: have etcd v3.4.27 reject the db file if it finds fields from v3.5. It should make it clear to the user that just loading v3.5 data in v3.4 is unsupported and will break their cluster, maybe not immediately, but later.
> You are right that etcd v3.4 will just start from v3.5 data.
That was actually my main problem: without restoring from a snapshot, v3.4 fails to start if you just point it to the 3.5 data dir.
I've added the fields to migrate in this PR: https://github.com/etcd-io/etcd/pull/15994
@serathius, I updated PR https://github.com/etcd-io/etcd/pull/15994, and I think it's ready for review. But I'd like to clarify a couple of things.
1.
> To make it clear, removing confState and term field is crucial for downgrades and etcd correctness.
The v3.4 PR https://github.com/etcd-io/etcd/pull/15990 does this. See downgradeMetaBucket.
Maybe I'm overthinking this, but operationally, having a 3.4 version that the SRE team can downgrade to without any other manipulations would be most desirable.
The problem is that this PR adds "code smell".
Assuming we are going with migrate, I'd like to document the steps for downgrade. Just pointing 3.4 to the 3.5 data-dir didn't work. I was able to perform the downgrade using a snapshot, but I had to stop the cluster. Am I missing something here? I can retest the procedure.
cc @ahrtr @ptabor to get feedback about adding downgrade support.
> Maybe I'm overthinking this but operationally having a 3.4 version that SRE team can downgrade to without any other manipulations will be most desirable for SRE. The problem is that this PR adds "code smell".
I don't understand the statement. What is the code smell you see?
> Assuming we are going with migrate, I'd like to document steps for downgrade. Just pointing 3.4 to 3.5 data-dir didn't work. I was able to perform downgrade using https://github.com/etcd-io/etcd/issues/15878#issuecomment-1571620392 and I had to stop cluster. Am I missing something here? I can retest the procedure again.
We should make it work, though. Can you provide logs so I can understand the problem you are facing?
I am not sure whether we should support downgrading 3.5 to 3.4.
Any feedback please?
If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration. The offline approach is to backport the etcdutl migrate command to 3.5 and enhance it there, as @serathius mentioned in https://github.com/etcd-io/etcd/issues/15878#issuecomment-1544227222. But it seems that the etcdutl migrate implementation in the main branch doesn't update ClusterClusterVersionKeyName and ClusterDowngradeKeyName when migrating from 3.6 to 3.5?
The high level workflow of online downgrading is,
@serathius
> Don't understand the statement. What is the code smell you see
Adding the 3.5.0 capability and downgradeMetaBucket in mvcc seems like a hack. But maybe that's just my personal perception :)
Here is an example of the error when starting 3.4 with a 3.5 data-dir:
$ bin-3.4/etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:12380 --initial-advertise-peer-urls http://127.0.0.1:12380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380' --initial-cluster-state new --enable-pprof --logger=zap --log-outputs=stderr
{"level":"fatal","ts":"2023-06-05T01:01:52.222568-0700","caller":"membership/cluster.go:795","msg":"invalid downgrade; server version is lower than determined cluster version","current-server-version":"3.4.26","determined-cluster-version":"3.5","stacktrace":"go.etcd.io/etcd/etcdserver/api/membership.mustDetectDowngrade\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:795\ngo.etcd.io/etcd/etcdserver/api/membership.(*RaftCluster).SetVersion\n\t/Users/bk/github/etcd-release-3-5/etcdserver/api/membership/cluster.go:570\ngo.etcd.io/etcd/etcdserver.(*applierV2store).Put\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:97\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyV2Request\n\t/Users/bk/github/etcd-release-3-5/etcdserver/apply_v2.go:128\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2237\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).apply\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:2178\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1412\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1136\ngo.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8\n\t/Users/bk/github/etcd-release-3-5/etcdserver/server.go:1072\ngo.etcd.io/etcd/pkg/schedule.(*fifo).run\n\t/Users/bk/github/etcd-release-3-5/pkg/schedule/schedule.go:157"}
To remove this error, we need to remove mustDetectDowngrade. Then etcd v3.4 will start, but requests will fail with:
$ ./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:07:07.103655-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001ca000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable
That's because we are missing the 3.5.0 capability.
@ahrtr
> I am not sure whether we should support downgrading 3.5 to 3.4.
We have a 3.4 build with the patch https://github.com/etcd-io/etcd/pull/15990 in case there is a need to roll back during an incident, but we never had to do it. I think this is useful operationally and makes SREs happy, but if 3.4 is declared EOL, everyone will upgrade without the patch.
In terms of the downgrade workflow, I've tested using a 3-node cluster and there are a couple of issues:
downgrade enable fails, but the downgrade job is actually started. I'm using etcdctl downgrade built from main.
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:28.807973-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000196780/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
$ ./bin/etcdctl downgrade enable 3.4
{"level":"warn","ts":"2023-06-05T01:20:31.260858-0700","logger":"etcd-client","caller":"v3@v3.6.0-alpha.0/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001fc000/127.0.0.1:2379","method":"/etcdserverpb.Maintenance/Downgrade","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: cluster has a downgrade job in progress"}
Error: etcdserver: cluster has a downgrade job in progress
{"level":"info","ts":"2023-06-05T01:21:14.489291-0700","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"fd422379fda50e48","from":"3.5","to":"3.4"}
{"level":"fatal","ts":"2023-06-05T01:21:14.489323-0700","caller":"membership/downgrade.go:59","msg":"invalid downgrade; server version is not allowed to join when downgrade is enabled","current-server-version":"3.5.9","target-cluster-version":"3.4.0","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustDetectDowngrade\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/downgrade.go:59\ngo.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion\n\tgo.etcd.io/etcd/server/v3/etcdserver/api/membership/cluster.go:593\ngo.etcd.io/etcd/server/v3/etcdserver.(*applierV2store).Put\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:101\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyV2Request\n\tgo.etcd.io/etcd/server/v3/etcdserver/apply_v2.go:135\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2228\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:2151\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1384\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1199\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:1122\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\tgo.etcd.io/etcd/pkg/v3@v3.5.9/schedule/schedule.go:157"}
./bin/etcdctl put foo bar --endpoints=http://127.0.0.1:2379
{"level":"warn","ts":"2023-06-05T01:28:09.783384-0700","logger":"etcd-client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000240000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: not capable"}
Error: etcdserver: not capable
Is GKE still using 3.4? It seemed so a couple of months back. cc @serathius to double confirm.
Yes, GKE is on v3.4. That's why Han is asking for downgrade support so they can feel safe to upgrade.
> If we really need to support downgrading 3.5 to 3.4, then we need to support both online and offline migration.
Don't agree. Online downgrade is totally broken in v3.4 and v3.5. The whole design was broken, and fixing it would be too disruptive to backport. Making sure that v3.6 -> v3.5 downgrades work will already require a lot of qualification; we should not put more resources here.
What I'm proposing is just adding offline support, so we avoid totally abandoning users and give them a subpar, but working and tested, path to roll back. We don't need the experience to be great. It just needs to work in case of disaster recovery, to ensure the most reluctant users of v3.4 feel safe upgrading to v3.5.
We need nothing more than for etcdutl migrate to officially support v3.4.
@lavacat Please follow the thread in https://github.com/etcd-io/etcd/issues/11716#issuecomment-858668690 on how broken etcdctl downgrade enable is on v3.5.
> What I'm proposing is just add support for offline so users avoid totally abandoning users and give team subpar but working and tested path to downgrade. We don't need the experience to be great. It just needs to work in case of disaster recovery to ensure the most reluctant users of v3.4 feel safe to upgrade to v3.5.
I'm onboard with this: migrate with PR https://github.com/etcd-io/etcd/pull/15994, plus using a snapshot. No changes to 3.4.
@ahrtr ClusterClusterVersionKeyName in 3.4 is updated in SetVersion based on the decided cluster version; see comment.
During testing, after the snapshot is restored but before the member starts:
$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.5.0
After the member starts:
{"level":"info","ts":"2023-06-05T02:06:06.317983-0700","caller":"membership/cluster.go:547","msg":"updated cluster version","cluster-id":"ef37ad9dc622a7c4","local-member-id":"91bc3c398fb3c146","from":"3.0","from":"3.4"}
{"level":"info","ts":"2023-06-05T02:06:06.318064-0700","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.4"}
$ bbolt get infra1.etcd/member/snap/db cluster clusterVersion
3.4.0
ClusterDowngradeKeyName isn't present in 3.4. I can add it to migrate so that it is removed when migrating 3.5 -> 3.4.
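Combining the findings so far, an offline 3.5 -> 3.4 migrate would delete meta/confState, meta/term and the cluster downgrade key, and could leave cluster/clusterVersion alone, since the bbolt output above shows 3.4 rewriting it from 3.5.0 to 3.4.0 on startup. The sketch below models the db as nested maps and assumes ClusterDowngradeKeyName is stored as the key downgrade in the cluster bucket; the key list and applyDeletes are hypothetical and do not describe what PR 15994 implements. It does not touch v2store or the WAL.

```go
package main

import "fmt"

// key names a bbolt bucket/key pair.
type key struct{ bucket, name string }

// downgrade35To34Deletes is a hypothetical deletion list for an offline
// 3.5 -> 3.4 migrate. cluster/clusterVersion is intentionally absent because
// 3.4 rewrites it on startup. The "downgrade" key name is an assumption.
var downgrade35To34Deletes = []key{
	{"meta", "confState"},
	{"meta", "term"},
	{"cluster", "downgrade"},
}

// applyDeletes removes the listed keys from an in-memory db model
// (bucket -> key -> value) and returns how many were actually present.
func applyDeletes(db map[string]map[string][]byte, keys []key) int {
	removed := 0
	for _, k := range keys {
		bucket, ok := db[k.bucket]
		if !ok {
			continue
		}
		if _, ok := bucket[k.name]; ok {
			delete(bucket, k.name)
			removed++
		}
	}
	return removed
}

func main() {
	db := map[string]map[string][]byte{
		"meta":    {"consistent_index": {1}, "confState": {2}, "term": {3}},
		"cluster": {"clusterVersion": []byte("3.5.0")},
	}
	n := applyDeletes(db, downgrade35To34Deletes)
	fmt.Println(n, len(db["meta"]), string(db["cluster"]["clusterVersion"])) // 2 1 3.5.0
}
```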
Is OpenShift still using 3.4? cc @tjungblu to double confirm
Not with any currently supported version. Just to give you some more data points here: to stay supported, customers had to upgrade. So many thousands of clusters have already upgraded successfully from 3.4 to 3.5, plus all our e2e test pipelines that previously tested this in many tens of thousands of runs.
I'm not aware of a single issue a customer had. The recommended downgrade procedure, IIRC, has been to restore the entire control plane with a snapshot from before the upgrade was kicked off, but I don't think this was ever necessary.
EKS seems to have already upgraded to 3.5. cc @chaochn47 to double confirm.
Yes. All the supported k8s version etcd clusters have upgraded to use 3.5.
From my understanding, the goal is to solve, from the k8s perspective, the issue of a failed upgrade triggering a downgrade.
Hi, @ahrtr. Sorry for late reply.
Is AKS still using 3.4?
Yes. And we are also using other versions depending on the cluster.
For this issue, it seems reasonable to me to have a rollback solution with no downtime.
Thanks all for the feedback.
It seems that 3.4 is only used by a minority. A simple summary...
Also, backporting online downgrade support for 3.5 -> 3.4 would require huge effort, and it might introduce additional risk of regressions in 3.5. We should try to avoid adding any new features to 3.5.
In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at a minimum, it's acceptable to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.
> In short, I don't think we should spend too much effort on supporting online downgrade from 3.5 to 3.4. But at the minimum, it's accepted to enhance the etcdutl tool to support offline downgrade in case of disaster recovery.
I disagree, GKE does not and has not used 3.5 and they are a major cloud provider. Google's position is that the number of regressions in 3.5 has made upgrade to 3.5 unviable without a safe downgrade path. Therefore, my position is that it should indeed be prioritized.
I'm on the side that this is just too much work and too risky. See the amount of work in all the tasks listed in https://github.com/etcd-io/etcd/issues/13168. Online support is just much more complicated than offline support, as offline can be done by any external binary like etcdutl, but online needs to be built into the etcd binary.
Compare the amount of work. For offline downgrading of etcd from v3.5 to v3.4, you can just pick the etcdutl from v3.6 without a problem. It's just one PR, https://github.com/etcd-io/etcd/pull/15994, and we have still been working on it for almost a month now. Compare that to online support, which requires backporting multiple months of work.
My view is that thanks to the uptake of etcd 3.5.6+ in platforms like EKS, OCP and TKG and elsewhere we can draw some confidence from the hundreds of thousands of clusters that have been running successfully for long periods of time now with these versions without issues.
So my preference fwiw is to avoid any pathway involving extensive backports to 3.4 and focus on solid offline downgrade procedure.
Talked with @logicalhan; I understand his argument that offline downgrade is not viable on a large fleet of etcds. It would be a disaster-recovery-level operation. The fact is that downgrades were implemented in a broken way in v3.5, and it took a big redesign to fix them for v3.6. This, however, means that we have left a broken API in v3.5. Online downgrades in v3.6 were implemented as a bare-bones feature; there are still a lot of places the downgrade mechanism needs to be plugged into. Having v3.5 -> v3.4 online downgrade could help us finish the work.
I would be supportive of fixing online v3.5 -> v3.4 downgrades as:
> large fleet of etcds
I was thinking etcd 3.4 was only used by a minority of K8s clusters at each cloud vendor, including private and public vendors, based on the feedback and my investigation. But that isn't the case for GKE: based on the feedback from @logicalhan a couple of days back, ALL existing K8s versions in GKE are using etcd 3.4.x. I was shocked. It's already 2+ years since the release of 3.5.0, and also 1+ years since the community fixed all known data inconsistency issues.
> it will be fully funded by @logicalhan.
I am curious how?
We're hiring a person who will work on etcd (at least partially).
The current version of the PR works fine, with the limitation that one has to use a snapshot to downgrade or remove the WAL files. See https://github.com/etcd-io/etcd/pull/15994#discussion_r1270488475 This means that downgrade will require cluster downtime and potential data loss of WAL entries that aren't in the snapshot yet.
The problem is that the version is recorded in the WAL and has to be removed from it. We don't have a mechanism to do that. Adding this mechanism is possible, but it increases the complexity of this change.
@serathius @ahrtr Do you both support adding wal manipulation as part of migrate command? Is the PR still relevant without online downgrade?
For GKE, @logicalhan @serathius, I'm going to call out https://github.com/etcd-io/etcd/pull/15990/ again. You can have an internal 3.4 build that you can roll back to, as long as the WAL doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, or AuthStatusRequest. I don't think this should be merged, but it can be a tradeoff if you want to do your 3.4 -> 3.5 upgrade sooner.
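Before relying on such a build, an operator would need to know whether the WAL holds any of those entries. A sketch of that check, assuming the WAL entries have already been decoded into request type names (decoding is out of scope here, and names other than the three listed above, such as PutRequest, are illustrative); blockingRequestTypes and rollbackBlockers are hypothetical.

```go
package main

import "fmt"

// blockingRequestTypes are the 3.5 request types listed above that a patched
// 3.4 build cannot replay. Names are plain strings here for illustration.
var blockingRequestTypes = map[string]bool{
	"ClusterMemberAttrSet":    true,
	"DowngradeInfoSetRequest": true,
	"AuthStatusRequest":       true,
}

// rollbackBlockers returns, in WAL order and without duplicates, the blocking
// request types found among the decoded WAL entry types.
func rollbackBlockers(walEntryTypes []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, t := range walEntryTypes {
		if blockingRequestTypes[t] && !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

func main() {
	wal := []string{"PutRequest", "AuthStatusRequest", "PutRequest", "AuthStatusRequest"}
	fmt.Println(rollbackBlockers(wal)) // [AuthStatusRequest]
}
```

An empty result would mean the rollback is safe under this build's assumptions; otherwise a snapshot restore is still needed.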
> For GKE, @logicalhan @serathius I'm going to call out #15990 again. You can have 3.4 internal build that you can rollback to as long as wal doesn't contain ClusterMemberAttrSet, DowngradeInfoSetRequest, AuthStatusRequest.
This seems to be the cheapest direction.
Downgrading 3.5 to 3.4 is a special case; we don't have to backport the complete downgrade feature to 3.5. It's risky to do that, and it would also complicate the 3.5 code base.
clusterVersion) on startup and on snapshot recovery, just as https://github.com/etcd-io/etcd/pull/15990 does.
ClusterVersionSetRequest, ClusterMemberAttrSetRequest, DowngradeInfoSetRequest, AuthStatusRequest). Recognise them but ignore them. Note: NO CHANGE/manipulation ON THE WAL FILES.
EDIT: We don't need to worry about ClusterVersionSetRequest, ClusterMemberAttrSetRequest, and DowngradeInfoSetRequest at all.
- ClusterVersionSetRequest is only used by updateClusterVersionV3 (in 3.5), which isn't called at all in 3.5.
- ClusterMemberAttrSetRequest is only used by publishV3, which again isn't called at all in 3.5.
- DowngradeInfoSetRequest is supported by etcd 3.5, but there is no client-side command. Downgrade isn't a completed feature in 3.5, so we don't need to worry about it for 3.5.
So we only need to take care of AuthStatusRequest in 3.4.
More references:
If they want to benefit from this solution, they can't upgrade from an old 3.4 directly to 3.5. Instead, they must first upgrade their clusters to a new 3.4.X version (which includes the change proposed above), and then upgrade to 3.5.x in the second step.
No, as long as the clusters were previously on a 3.4.x version with the change proposed above.
@ahrtr in principle I agree with your approach. Making changes to 3.4 to support online downgrade seems more practical.
I don't mind throwing away https://github.com/etcd-io/etcd/pull/15994, but it might be cleaner to perform a backend migrate instead of dealing with term and confState in 3.4. This way we also use the new migrate framework.
Then in 3.4 we can have a flag --experimental-downgrade-3-5 that allows 3.4 to start within a 3.5 cluster:
- capability https://github.com/etcd-io/etcd/pull/15990/files#diff-8c373ed6659c31f66e9815a4a17b60d32705f1d4f3f0d075b6c6f057df093390
- mustDetectDowngrade
- AuthStatusRequest
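One way the proposed flag could gate the startup check, sketched under the assumption that it tolerates only a cluster exactly one minor version ahead (a 3.4 server in a 3.5 cluster) and that the major version is 3 on both sides. config and allowStart are hypothetical; the capability and AuthStatusRequest relaxations could key off the same field.

```go
package main

import "fmt"

// config carries a hypothetical field for the proposed
// --experimental-downgrade-3-5 flag; it is not an existing etcd option.
type config struct {
	ExperimentalDowngrade35 bool
}

// allowStart decides whether a server with minor version serverMinor may run
// against a recovered cluster with minor version clusterMinor (major is
// assumed to be 3 for both). Without the flag, any lower server version is
// rejected; with it, only the 3.4-on-3.5 case is tolerated.
func allowStart(cfg config, serverMinor, clusterMinor int) error {
	if serverMinor >= clusterMinor {
		return nil
	}
	if cfg.ExperimentalDowngrade35 && serverMinor == 4 && clusterMinor == 5 {
		return nil
	}
	return fmt.Errorf("server version 3.%d is lower than cluster version 3.%d", serverMinor, clusterMinor)
}

func main() {
	fmt.Println(allowStart(config{}, 4, 5))
	fmt.Println(allowStart(config{ExperimentalDowngrade35: true}, 4, 5)) // <nil>
}
```

Scoping the exception to exactly 3.4 on 3.5 keeps the flag from silently allowing larger version gaps.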
Let's discuss during next community meeting, so everyone is in agreement on next steps. If there is more information/POC needed, let me know, I'll try to compose everything before the meeting.
As discussed in previous community meeting, the offline downgrade tool isn't the point. The point is [whether or not] or how to support online downgrade from 3.5 to 3.4.
It's common to make the new version (e.g. 3.6) backward compatible with the old version (e.g. 3.5), and that's exactly the principle the existing downgrade feature follows. For example, when downgrading from 3.6 to 3.5, the etcd 3.6 instance should migrate the data to be 3.5 compatible.
But online downgrade is a big and complicated feature; it isn't feasible or safe to backport the complete feature from 3.6 to 3.5.
Instead, we can treat the online downgrade from 3.5 to 3.4 as a special case. I think we can spend minor or moderate effort to make the old version (3.4) forward compatible with the new version (3.5). Specifically, we just need to ensure the 3.4 binary can run on the data generated by the 3.5 binary, roughly as I mentioned above in https://github.com/etcd-io/etcd/issues/15878#issuecomment-1653790303.
I have written a design doc regarding the path forward. Please take a look and provide feedback, thanks!
cc @ahrtr @lavacat @serathius @logicalhan @fuweid
Tracking work:
- etcdutl migrate command to clean up data fields in 3.5 for 3.4: https://github.com/etcd-io/etcd/pull/15994
- AuthStatusRequest in 3.4: https://github.com/etcd-io/etcd/pull/17330
- migrate tool to reflect 3.5 -> 3.4 downgrade process.
What would you like to be added?
I would like to be able to safely downgrade from 3.5 to 3.4, and then safely reupgrade back to 3.5.
Why is this needed?
Given the vast number of data correctness issues we've unearthed in etcd 3.5 (many of them fixed by @ahrtr and @serathius), I have personal reservations about upgrading my k8s clusters to use 3.5. If there was a working rollback strategy (tested of course, as well), then I would be much more inclined to update my etcds to a more recent version.