etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

All or some etcd gateways shut down when one etcd gateway shuts down #15430

Open WilliamDEdwards opened 1 year ago

WilliamDEdwards commented 1 year ago

What happened?

When one etcd gateway is stopped or restarted, all or some of the other etcd gateways shut down as well.

This happens on most stops and restarts, but not all of them. So far, there is no visible pattern.

I run an etcd cluster with the following nodes: etcd0.cyberfusion.cloud, etcd1.cyberfusion.cloud, and etcd2.cyberfusion.cloud.

Each cluster node also runs an etcd gateway.

Example:

At 11:51:09, I stopped the etcd gateway on etcd2:

Mar 08 11:51:09 etcd2.cyberfusion.cloud systemd[1]: Stopping etcd gateway...
Mar 08 11:51:09 etcd2.cyberfusion.cloud systemd[1]: etcd-gateway.service: Succeeded.
Mar 08 11:51:09 etcd2.cyberfusion.cloud systemd[1]: Stopped etcd gateway.

At exactly this moment (11:51:09), etcd-gateway also stopped on etcd1. One second later (11:51:10), it also stopped on etcd0.
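
To line up the stop times across nodes, the journal output can be reduced to one line per stop event. The following is a minimal sketch and not part of the original report: `stop_times` is a hypothetical helper name, and it assumes journalctl's default `short` output format, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads journal lines on stdin and prints
# "<time> <host>" for every "Stopped etcd gateway." message from systemd.
# Assumes the default journalctl "short" format:
#   Mar 08 11:51:09 <host> systemd[1]: Stopped etcd gateway.
stop_times() {
  awk '/systemd\[[0-9]+\]: Stopped etcd gateway\.$/ { print $3, $4 }'
}
```

On each node, something like `journalctl -u etcd-gateway.service --since "2023-03-08 11:50" --until "2023-03-08 11:53" | stop_times` would then print the time and host of each stop, assuming the unit name from this report.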

What did you expect to happen?

When an etcd gateway is shut down, other etcd gateways should not be shut down.

How can we reproduce it (as minimally and precisely as possible)?

See 'What happened?'.
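
A reproduction attempt could be scripted roughly as follows. This is a sketch under assumptions not stated in the report: it assumes root SSH access to each node, and the helper names `gateway_state` and `unexpected_stops` are made up for illustration. The host names and the unit name are taken from the configuration and logs in this report.

```shell
#!/bin/sh
# Hosts and unit name as they appear in this report.
HOSTS="etcd0.cyberfusion.cloud etcd1.cyberfusion.cloud etcd2.cyberfusion.cloud"
UNIT="etcd-gateway.service"

# Print the systemd state of the gateway unit on one host.
# Assumption: root SSH access; replace with whatever access method applies.
gateway_state() {
  ssh -n "root@$1" systemctl is-active "$UNIT"
}

# Print every host, other than the one stopped on purpose ($1),
# whose gateway is no longer reported as "active".
unexpected_stops() {
  stopped_host=$1
  for host in $HOSTS; do
    [ "$host" = "$stopped_host" ] && continue
    # "systemctl is-active" exits non-zero for inactive units, so ignore the status.
    state=$(gateway_state "$host") || true
    [ "$state" = "active" ] || echo "$host"
  done
}
```

After stopping one gateway, for example with `ssh root@etcd2.cyberfusion.cloud systemctl stop etcd-gateway.service`, running `unexpected_stops etcd2.cyberfusion.cloud` a few seconds later lists any other node whose gateway is no longer active. Since the problem does not occur on every stop, several rounds may be needed.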

Anything else we need to know?

No response

Etcd version (please run commands below)

```console
$ etcd --version
etcd Version: 3.3.25
Git SHA: Not provided (use ./build instead of go build)
Go Version: go1.15.9
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.3.25
API version: 3.3
```

Etcd configuration (command line flags or environment variables)

```
root@etcd1:~# cat /etc/default/etcd-gateway
# Must specify port 2379, see: https://github.com/etcd-io/etcd/issues/15144
# Must use node name instead of IP address, see: https://github.com/etcd-io/etcd/issues/15143
ENDPOINTS="etcd0.cyberfusion.cloud:2379,etcd1.cyberfusion.cloud:2379,etcd2.cyberfusion.cloud:2379"
LISTEN_HOST=0.0.0.0
LISTEN_PORT=23790

root@etcd1:~# cat /etc/systemd/system/etcd-gateway.service
[Unit]
Description=etcd gateway
After=network.target etcd.service

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=/etc/default/%p
ExecStart=/usr/bin/etcd gateway start --endpoints=${ENDPOINTS} --listen-addr=${LISTEN_HOST}:${LISTEN_PORT}
User=etcd
Group=etcd
Restart=on-failure
RestartSec=60
```

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

etcd gateway does not log anything when shutting down. It simply exits. This is the full output of an etcd gateway (running in the foreground) from start to faulty stop:

```
root@etcd2:~# /usr/bin/etcd gateway start --endpoints=${ENDPOINTS} --listen-addr=${LISTEN_HOST}:${LISTEN_PORT}
2023-03-08 11:51:28.370787 E | etcdmain: forgot to set Type=notify in systemd service file?
2023-03-08 11:51:28.370973 I | proxy/tcpproxy: ready to proxy client requests to [etcd0.cyberfusion.cloud:2379 etcd1.cyberfusion.cloud:2379 etcd2.cyberfusion.cloud:2379]
root@etcd2:~#
```

This happens both when daemonized (using systemd) and when running in the foreground.

ahrtr commented 1 year ago

3.3 is out of support. Could you please check whether this is reproducible in 3.5 or main branch?

WilliamDEdwards commented 1 year ago

> 3.3 is out of support.

Perhaps it would be useful to get supported versions into Debian backports.

> Could you please check whether this is reproducible in 3.5 or main branch?

No, I will not be able to anytime soon.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.