[X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
I deployed a single-node etcd in a single-node k3s, and during normal operation, I used reboot to restart the host machine. After booting up, the etcd startup reported the following error:
{"level":"warn","ts":"2024-09-03T02:25:25.911Z","caller":"etcdserver/server.go:2075","msg":"failed to publish local member to cluster through raft","local-member-id":"7e5052d402a551ba","local-member-attributes":"{Name:apisix-etcd-0 ClientURLs:[http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379 http://apisix-etcd.pangu.svc.cluster.local:2379]}","request-path":"/0/members/7e5052d402a551ba/attributes","publish-timeout":"7s","error":"etcdserver: request timed out"}
The subsequent logs repeatedly print "failed to publish local member to cluster through raft". I have confirmed that the k3s DNS is functioning correctly, and the flannel network is also normal. I cleared the data in /bitnami/etcd/data within etcd, and after restarting etcd, it began to operate normally. The logs are as follows:
{"level":"info","ts":"2024-09-03T03:11:19.475Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["./etcd"]}
{"level":"warn","ts":"2024-09-03T03:11:19.477Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"member_id","data-dir":"/bitnami/etcd/data"}
{"level":"info","ts":"2024-09-03T03:11:19.477Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/bitnami/etcd/data","dir-type":"member"}
{"level":"info","ts":"2024-09-03T03:11:19.478Z","caller":"embed/etcd.go:124","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2024-09-03T03:11:19.479Z","caller":"embed/etcd.go:132","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2024-09-03T03:11:19.479Z","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.6","git-sha":"cecbe35ce","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"apisix-etcd-0","data-dir":"/bitnami/etcd/data","wal-dir":"","wal-dir-dedicated":"","member-dir":"/bitnami/etcd/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379","http://apisix-etcd.pangu.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2024-09-03T03:11:19.481Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/bitnami/etcd/data/member/snap/db","took":"816.749µs"}
{"level":"info","ts":"2024-09-03T03:11:19.511Z","caller":"etcdserver/server.go:530","msg":"No snapshot found. Recovering WAL from scratch!"}
{"level":"info","ts":"2024-09-03T03:11:19.576Z","caller":"etcdserver/raft.go:529","msg":"restarting local member","cluster-id":"22a17510595a1472","local-member-id":"7e5052d402a551ba","commit-index":23200}
{"level":"info","ts":"2024-09-03T03:11:19.584Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba switched to configuration voters=()"}
{"level":"info","ts":"2024-09-03T03:11:19.584Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became follower at term 5"}
{"level":"info","ts":"2024-09-03T03:11:19.584Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 7e5052d402a551ba [peers: [], term: 5, commit: 23200, applied: 0, lastindex: 23200, lastterm: 5]"}
{"level":"info","ts":"2024-09-03T03:11:19.640Z","caller":"mvcc/kvstore.go:393","msg":"kvstore restored","current-rev":36}
{"level":"info","ts":"2024-09-03T03:11:19.655Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
{"level":"info","ts":"2024-09-03T03:11:19.672Z","caller":"etcdserver/server.go:854","msg":"starting etcd server","local-member-id":"7e5052d402a551ba","local-server-version":"3.5.6","cluster-version":"to_be_decided"}
{"level":"info","ts":"2024-09-03T03:11:19.673Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba switched to configuration voters=(9101865917446705594)"}
{"level":"info","ts":"2024-09-03T03:11:19.673Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"22a17510595a1472","local-member-id":"7e5052d402a551ba","added-peer-id":"7e5052d402a551ba","added-peer-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"]}
{"level":"info","ts":"2024-09-03T03:11:19.673Z","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"22a17510595a1472","local-member-id":"7e5052d402a551ba","cluster-version":"3.5"}
{"level":"info","ts":"2024-09-03T03:11:19.673Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2024-09-03T03:11:19.674Z","caller":"embed/etcd.go:275","msg":"now serving peer/client/metrics","local-member-id":"7e5052d402a551ba","initial-advertise-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379","http://apisix-etcd.pangu.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[]}
{"level":"info","ts":"2024-09-03T03:11:19.674Z","caller":"etcdserver/server.go:738","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"7e5052d402a551ba","forward-ticks":9,"forward-duration":"900ms","election-ticks":10,"election-timeout":"1s"}
{"level":"info","ts":"2024-09-03T03:11:19.674Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T03:11:19.674Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/snap","suffix":"snap","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T03:11:19.674Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/wal","suffix":"wal","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T03:11:19.677Z","caller":"embed/etcd.go:586","msg":"serving peer traffic","address":"[::]:2380"}
{"level":"info","ts":"2024-09-03T03:11:19.677Z","caller":"embed/etcd.go:558","msg":"cmux::serve","address":"[::]:2380"}
{"level":"info","ts":"2024-09-03T03:11:20.197Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba is starting a new election at term 5"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became pre-candidate at term 5"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba received MsgPreVoteResp from 7e5052d402a551ba at term 5"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became candidate at term 6"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba received MsgVoteResp from 7e5052d402a551ba at term 6"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became leader at term 6"}
{"level":"info","ts":"2024-09-03T03:11:20.198Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 7e5052d402a551ba elected leader 7e5052d402a551ba at term 6"}
{"level":"info","ts":"2024-09-03T03:11:20.274Z","caller":"etcdserver/server.go:2054","msg":"published local member to cluster through raft","local-member-id":"7e5052d402a551ba","local-member-attributes":"{Name:apisix-etcd-0 ClientURLs:[http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379 http://apisix-etcd.pangu.svc.cluster.local:2379]}","request-path":"/0/members/7e5052d402a551ba/attributes","cluster-id":"22a17510595a1472","publish-timeout":"7s"}
{"level":"info","ts":"2024-09-03T03:11:20.275Z","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2024-09-03T03:11:20.275Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2024-09-03T03:11:20.275Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
{"level":"info","ts":"2024-09-03T03:11:20.278Z","caller":"embed/serve.go:146","msg":"serving client traffic insecurely; this is strongly discouraged!","address":"[::]:2379"}
I would like to understand why etcd got stuck in a "failed to publish local member to cluster through raft" loop, and how deleting the data restored normal functionality.
What did you expect to happen?
etcd started normally and is providing services.
How can we reproduce it (as minimally and precisely as possible)?
It's unclear how to reproduce this issue, as in the past, I have also rebooted the host machine without encountering this problem.
Anything else we need to know?
No response
Etcd version (please run commands below)
```console
$ etcd --version
# paste output here
etcd Version: 3.5.6
Git SHA: cecbe35ce
Go Version: go1.16.15
Go OS/Arch: linux/amd64
$ etcdctl version
# paste output here
etcdctl version: 3.5.6
API version: 3.5
```
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
```console
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints= endpoint status -w table
# paste output here
```
Relevant log output
{"level":"info","ts":"2024-09-03T02:19:35.882Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd"]}
{"level":"warn","ts":"2024-09-03T02:19:35.882Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"member_id","data-dir":"/bitnami/etcd/data"}
{"level":"info","ts":"2024-09-03T02:19:35.882Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/bitnami/etcd/data","dir-type":"member"}
{"level":"info","ts":"2024-09-03T02:19:35.882Z","caller":"embed/etcd.go:124","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2024-09-03T02:19:35.883Z","caller":"embed/etcd.go:132","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2024-09-03T02:19:35.884Z","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.6","git-sha":"cecbe35ce","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":true,"name":"apisix-etcd-0","data-dir":"/bitnami/etcd/data","wal-dir":"","wal-dir-dedicated":"","member-dir":"/bitnami/etcd/data/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379","http://apisix-etcd.pangu.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2024-09-03T02:19:35.887Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/bitnami/etcd/data/member/snap/db","took":"980.479µs"}
{"level":"info","ts":"2024-09-03T02:19:36.057Z","caller":"etcdserver/server.go:509","msg":"recovered v2 store from snapshot","snapshot-index":100001,"snapshot-size":"9.1 kB"}
{"level":"info","ts":"2024-09-03T02:19:36.057Z","caller":"etcdserver/server.go:522","msg":"recovered v3 backend from snapshot","backend-size-bytes":65536,"backend-size":"66 kB","backend-size-in-use-bytes":53248,"backend-size-in-use":"53 kB"}
{"level":"info","ts":"2024-09-03T02:19:36.285Z","caller":"etcdserver/raft.go:529","msg":"restarting local member","cluster-id":"22a17510595a1472","local-member-id":"7e5052d402a551ba","commit-index":135002}
{"level":"info","ts":"2024-09-03T02:19:36.288Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba switched to configuration voters=(9101865917446705594)"}
{"level":"info","ts":"2024-09-03T02:19:36.288Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became follower at term 18"}
{"level":"info","ts":"2024-09-03T02:19:36.288Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 7e5052d402a551ba [peers: [7e5052d402a551ba], term: 18, commit: 135002, applied: 100001, lastindex: 135002, lastterm: 18]"}
{"level":"info","ts":"2024-09-03T02:19:36.289Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
{"level":"info","ts":"2024-09-03T02:19:36.289Z","caller":"membership/cluster.go:278","msg":"recovered/added member from store","cluster-id":"22a17510595a1472","local-member-id":"7e5052d402a551ba","recovered-remote-peer-id":"7e5052d402a551ba","recovered-remote-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"]}
{"level":"info","ts":"2024-09-03T02:19:36.289Z","caller":"membership/cluster.go:287","msg":"set cluster version from store","cluster-version":"3.5"}
{"level":"info","ts":"2024-09-03T02:19:36.308Z","caller":"mvcc/kvstore.go:393","msg":"kvstore restored","current-rev":36}
{"level":"info","ts":"2024-09-03T02:19:36.322Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
{"level":"info","ts":"2024-09-03T02:19:36.338Z","caller":"etcdserver/server.go:845","msg":"starting etcd server","local-member-id":"7e5052d402a551ba","local-server-version":"3.5.6","cluster-id":"22a17510595a1472","cluster-version":"3.5"}
{"level":"info","ts":"2024-09-03T02:19:36.339Z","caller":"etcdserver/server.go:738","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"7e5052d402a551ba","forward-ticks":9,"forward-duration":"900ms","election-ticks":10,"election-timeout":"1s"}
{"level":"info","ts":"2024-09-03T02:19:36.339Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T02:19:36.339Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/snap","suffix":"snap","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T02:19:36.339Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/bitnami/etcd/data/member/wal","suffix":"wal","max":5,"interval":"30s"}
{"level":"info","ts":"2024-09-03T02:19:36.344Z","caller":"embed/etcd.go:275","msg":"now serving peer/client/metrics","local-member-id":"7e5052d402a551ba","initial-advertise-peer-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379","http://apisix-etcd.pangu.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[]}
{"level":"info","ts":"2024-09-03T02:19:36.344Z","caller":"embed/etcd.go:586","msg":"serving peer traffic","address":"[::]:2380"}
{"level":"info","ts":"2024-09-03T02:19:36.344Z","caller":"embed/etcd.go:558","msg":"cmux::serve","address":"[::]:2380"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba is starting a new election at term 18"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became pre-candidate at term 18"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba received MsgPreVoteResp from 7e5052d402a551ba at term 18"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became candidate at term 19"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba received MsgVoteResp from 7e5052d402a551ba at term 19"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"7e5052d402a551ba became leader at term 19"}
{"level":"info","ts":"2024-09-03T02:19:36.790Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 7e5052d402a551ba elected leader 7e5052d402a551ba at term 19"}
{"level":"warn","ts":"2024-09-03T02:19:43.340Z","caller":"etcdserver/server.go:2075","msg":"failed to publish local member to cluster through raft","local-member-id":"7e5052d402a551ba","local-member-attributes":"{Name:apisix-etcd-0 ClientURLs:[http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379 http://apisix-etcd.pangu.svc.cluster.local:2379]}","request-path":"/0/members/7e5052d402a551ba/attributes","publish-timeout":"7s","error":"etcdserver: request timed out, possibly due to previous leader failure"}
{"level":"warn","ts":"2024-09-03T02:19:50.341Z","caller":"etcdserver/server.go:2075","msg":"failed to publish local member to cluster through raft","local-member-id":"7e5052d402a551ba","local-member-attributes":"{Name:apisix-etcd-0 ClientURLs:[http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379 http://apisix-etcd.pangu.svc.cluster.local:2379]}","request-path":"/0/members/7e5052d402a551ba/attributes","publish-timeout":"7s","error":"etcdserver: request timed out"}
{"level":"warn","ts":"2024-09-03T02:19:57.343Z","caller":"etcdserver/server.go:2075","msg":"failed to publish local member to cluster through raft","local-member-id":"7e5052d402a551ba","local-member-attributes":"{Name:apisix-etcd-0 ClientURLs:[http://apisix-etcd-0.apisix-etcd-headless.pangu.svc.cluster.local:2379 http://apisix-etcd.pangu.svc.cluster.local:2379]}","request-path":"/0/members/7e5052d402a551ba/attributes","publish-timeout":"7s","error":"etcdserver: request timed out"}
Bug report criteria
What happened?
I deployed a single-node etcd in a single-node k3s, and during normal operation, I used reboot to restart the host machine. After booting up, the etcd startup reported the following error:
The subsequent logs repeatedly print "failed to publish local member to cluster through raft". I have confirmed that the k3s DNS is functioning correctly, and the flannel network is also normal. I cleared the data in /bitnami/etcd/data within etcd, and after restarting etcd, it began to operate normally. The logs are as follows:
I would like to understand why etcd got stuck in a "failed to publish local member to cluster through raft" loop, and how deleting the data restored normal functionality.
What did you expect to happen?
etcd started normally and is providing services.
How can we reproduce it (as minimally and precisely as possible)?
It's unclear how to reproduce this issue, as in the past, I have also rebooted the host machine without encountering this problem.
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output