ahrtr / etcd-defrag

An easier to use and smarter etcd defragmentation tool
MIT License

etcd cluster with a learner is not supported #26

Closed. git-yww closed this issue 1 year ago

git-yww commented 1 year ago

We tried to use etcd-defrag to defragment our etcd clusters, and it failed almost immediately because the learner node in the cluster does not support the RPC used by the health check.

Here is the execution log:

Validating configuration.
Validating the defragmentation rule: dbQuotaUsage > 0.8 || dbSizeFree/dbQuotaUsage > 0.5 ... valid
Performing health check.
{"level":"warn","ts":"2023-10-12T17:51:18.358902+0800","logger":"client","caller":"v3@v3.6.0-alpha.0.0.20230803155134-cca200345ab2/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00030a000/11.11.11.11:2379","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: rpc not supported for learner"}
endpoint: https://11.11.11.11:2379/, health: false, took: 7.499546ms, error: etcdserver: rpc not supported for learner
endpoint: https://33.33.33.33:2379/, health: true, took: 7.733879ms, error:
endpoint: https://44.44.44.44:2379/, health: true, took: 9.555876ms, error:
endpoint: https://55.55.55.55:2379/, health: true, took: 10.164246ms, error:
endpoint: https://66.66.66.66:2379/, health: true, took: 9.741549ms, error:
endpoint: https://22.22.22.22:2379/, health: true, took: 43.014812ms, error:

So is this an ongoing issue?

ahrtr commented 1 year ago

The workaround for now is to remove the learner from the --endpoints. Eventually the learner will be promoted to a voting member, right?

git-yww commented 1 year ago

> The workaround for now is to remove the learner from the --endpoints. Eventually the learner will be promoted to a voting member, right?

This did not work in our tests. The health check still queries every member in the cluster, regardless of what --endpoints specifies.

ahrtr commented 1 year ago

Thanks for raising this issue. Learner members can only serve statusRequest and serializable read requests. Refer to util.go#L141-L150
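
To make the rule above concrete, here is a minimal Go sketch of that kind of check. It is a simplified model, not the etcd source: the request types are local stand-ins, and `servableByLearner` is a hypothetical name. It would be consistent with the log above, where a `/etcdserverpb.KV/Range` call is rejected, presumably because the health check issues a non-serializable read.

```go
package main

import "fmt"

// Stand-in request types for this sketch. The real ones live in etcd's
// generated protobuf package and carry many more fields.
type StatusRequest struct{}

type RangeRequest struct {
	Serializable bool // true = local read, false = linearizable read
}

type PutRequest struct{}

// servableByLearner models the rule quoted above: a learner answers status
// requests and serializable reads, and rejects everything else.
func servableByLearner(req any) bool {
	switch r := req.(type) {
	case StatusRequest:
		return true
	case RangeRequest:
		return r.Serializable
	default:
		return false
	}
}

func main() {
	fmt.Println(servableByLearner(StatusRequest{}))                   // true
	fmt.Println(servableByLearner(RangeRequest{Serializable: true}))  // true
	fmt.Println(servableByLearner(RangeRequest{Serializable: false})) // false
	fmt.Println(servableByLearner(PutRequest{}))                      // false
}
```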

So the solution is to programmatically remove learner members from the endpoint list. Would you be interested in delivering a PR?
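
A rough sketch of what that filtering could look like, assuming the member list has already been fetched from the cluster. The `Member` type here is a local stand-in whose `ClientURLs` and `IsLearner` fields mirror etcd's member message, and `votingEndpoints` and the member names are hypothetical. This is not the code that was eventually merged.

```go
package main

import "fmt"

// Member is a stand-in for the record returned by a member-list call.
// It is defined locally so the sketch runs without any dependencies.
type Member struct {
	Name       string
	ClientURLs []string
	IsLearner  bool
}

// votingEndpoints returns the client URLs of all non-learner members,
// so that health checks and defragmentation skip learners entirely.
func votingEndpoints(members []Member) []string {
	var eps []string
	for _, m := range members {
		if m.IsLearner {
			continue // learners reject the RPC used by the health check
		}
		eps = append(eps, m.ClientURLs...)
	}
	return eps
}

func main() {
	// Hypothetical cluster: the first member is still a learner.
	members := []Member{
		{Name: "m1", ClientURLs: []string{"https://11.11.11.11:2379"}, IsLearner: true},
		{Name: "m2", ClientURLs: []string{"https://22.22.22.22:2379"}},
		{Name: "m3", ClientURLs: []string{"https://33.33.33.33:2379"}},
	}
	fmt.Println(votingEndpoints(members))
	// prints: [https://22.22.22.22:2379 https://33.33.33.33:2379]
}
```

Filtering on the member list rather than on --endpoints matters here, because the report above shows the health check contacting every cluster member regardless of the flag.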

git-yww commented 1 year ago

Sure, I'll take care of it.

ahrtr commented 1 year ago

> Sure, I'll take care of it.

Thanks. Assigned to you.

ahrtr commented 1 year ago

Will release a new version today.

ahrtr commented 1 year ago

FYI. https://github.com/ahrtr/etcd-defrag/releases/tag/v0.7.0

docker pull ghcr.io/ahrtr/etcd-defrag:v0.7.0