etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Etcd loses track of leases #14019

Closed · cjbottaro closed this issue 1 year ago

cjbottaro commented 2 years ago

What happened?

Etcd thinks these leases still exist...

%{
   header: %{
     cluster_id: 8855822249941472443,
     member_id: 16980875726527512478,
     raft_term: 252,
     revision: 16834726
   },
   leases: [
     %{ID: 3492964341253159182},
     %{ID: 4584242825964808783},
     %{ID: 362681125277541271},
     %{ID: 3492964341253159049},
     %{ID: 362681125277541277},
     %{ID: 362681125277541327},
     %{ID: 362681125277541281},
     %{ID: 362681125277541309},
     %{ID: 3492964341253159261},
     %{ID: 362681125277541235},
     %{ID: 362681125277541359},
     %{ID: 362681125277541357},
     %{ID: 362681125277541263},
     %{ID: 362681125277541267},
     %{ID: 362681125277541319},
     %{ID: 3492964341253159252},
     %{ID: 4584242825964808733},
...

Each lease has a TTL of 5s. We shut down our system (so no new leases were being created), and these leases still existed more than 24 hours later.

Trying to revoke any of these leases results in an error saying they don't exist.

:eetcd_lease.revoke(Etcd, 3492964341253159182)
{:error,
 {:grpc_error,
  %{"grpc-message": "etcdserver: requested lease not found", "grpc-status": 5}}}

We had to shut down our system because asking Etcd for a lease would time out. After our system had been shut down for a few hours, Etcd stopped timing out on lease requests.

What did you expect to happen?

I expected Etcd to continue granting leases under heavy load and not to lose track of leases.

How can we reproduce it (as minimally and precisely as possible)?

Lock and unlock unique locks at a rate of about 80 per second, each using a lease with a 5s TTL, and let that run for a day or so.
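
A minimal sketch of that workload using the Go clientv3 concurrency package (an approximation of our eetcd-based code; the worker count, key prefix, and timeouts are assumptions, and the endpoints come from the member list below): each iteration opens a session backed by a 5s lease, locks a unique key, unlocks it, and closes the session, for roughly 80 lock/unlock cycles per second in total.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-0:2379", "http://etcd-1:2379", "http://etcd-2:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// 8 workers, each doing ~10 lock/unlock cycles per second, gives ~80 locks/s.
	for w := 0; w < 8; w++ {
		go func(w int) {
			for i := 0; ; i++ {
				// Each lock is backed by its own session, i.e. its own 5s-TTL lease.
				s, err := concurrency.NewSession(cli, concurrency.WithTTL(5))
				if err != nil {
					log.Printf("worker %d: session: %v", w, err)
					time.Sleep(time.Second)
					continue
				}
				m := concurrency.NewMutex(s, fmt.Sprintf("/locks/w%d-%d", w, i))
				ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
				if err := m.Lock(ctx); err != nil {
					log.Printf("worker %d: lock: %v", w, err)
				} else if err := m.Unlock(ctx); err != nil {
					log.Printf("worker %d: unlock: %v", w, err)
				}
				cancel()
				s.Close() // revokes the session's lease
				time.Sleep(100 * time.Millisecond)
			}
		}(w)
	}
	select {} // let it run for a day or so
}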

Anything else we need to know?

We use Etcd for distributed locking and nothing else. Each lease has a 5s TTL. We have a 3-node cluster running on a single Core i3-8100 machine, and we only request about 80 locks per second.

--auto-compaction-retention=5m

This setting seems reasonable given that our locks have a 5s TTL.

Etcd version (please run commands below)

quay.io/coreos/etcd:v3.5.3

Etcd configuration (command line flags or environment variables)

--auto-compaction-retention=5m

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ kubectl exec etcd-0-6776dd499d-k92mn -- etcdctl member list -w table
+------------------+---------+--------+--------------------+--------------------+------------+
|        ID        | STATUS  |  NAME  |     PEER ADDRS     |    CLIENT ADDRS    | IS LEARNER |
+------------------+---------+--------+--------------------+--------------------+------------+
| 554ff50439e23079 | started | etcd-0 | http://etcd-0:2380 | http://etcd-0:2379 |      false |
| 5ba2a9fe8d840508 | started | etcd-2 | http://etcd-2:2380 | http://etcd-2:2379 |      false |
| eba8308136b9bf9e | started | etcd-1 | http://etcd-1:2380 | http://etcd-1:2379 |      false |
+------------------+---------+--------+--------------------+--------------------+------------+

Relevant log output

No response

ahrtr commented 2 years ago

You are probably running into #13205. Could you try to reproduce this issue using the latest code on the release-3.5 or main branch?

Please also provide the following info so that others can take a closer look.

  1. The complete etcd command-line configuration;
  2. The detailed steps to reproduce this issue.

ahrtr commented 2 years ago

@cjbottaro any update on this? Have you enabled auth in the cluster?

cjbottaro commented 2 years ago

Ahh, I stopped using Etcd for locks and went back to a single-node Redis for now. I'll come back to Etcd if the need for HA ever arises, though.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.