etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.69k stars 9.75k forks source link

The key is not deleted when the bound lease expires #10686

Closed nolouch closed 5 years ago

nolouch commented 5 years ago

ETCD Version: master We occasionally encounter the same problem as https://github.com/etcd-io/etcd/issues/9935. As the origin issue says:

2018/07/17 21:19:47.943 log.go:82: [warning] etcdserver: [failed to revoke 093a64a83ac42057 ("context deadline exceeded"

If failed to revoke the lease, etcd will still keep this lease in leaseMap and no will try to delete it later. Like this:

bin git:(e68285b) ✗ ETCDCTL_API=3 etcdctl --endpoints=ip lease timetolive 93a64a83ac42057
lease 093a64a83ac42057 granted with TTL(60s), remaining(-68272s)

and the main problem may be the leaseQueue pop the expired lease and then fail to revoke the expired lease, so we will no longer to find this lease in the queue and retry to revoke it later.

// expireExists returns true if expiry items exist.
// It pops only when expiry item exists.
// "next" is true, to indicate that it may exist in next attempt.
func (le *lessor) expireExists() (l *Lease, ok bool, next bool) {
    if le.leaseHeap.Len() == 0 {
        return nil, false, false
    }

    item := le.leaseHeap[0]
    l = le.leaseMap[item.id]
    if l == nil {
        // lease has expired or been revoked
        // no need to revoke (nothing is expiry)
        heap.Pop(&le.leaseHeap) // O(log N)
        return nil, false, true
    }

    if time.Now().UnixNano() < item.time /* expiration time */ {
        // Candidate expirations are caught up, reinsert this item
        // and no need to revoke (nothing is expiry)
        return l, false, false
    }
    // if the lease is actually expired, add to the removal list. If it is not expired, we can ignore it because another entry will have been inserted into the heap

    heap.Pop(&le.leaseHeap) // O(log N)
    return l, true, false
}

cc @gyuho @xiang90 @jingyih

nolouch commented 5 years ago

I have injected a failpoint when got the expired lease, likes:

func (le *lessor) revokeExpiredLeases() {
    var ls []*Lease

    // rate limit
    revokeLimit := leaseRevokeRate / 2

    le.mu.RLock()
    if le.isPrimary() {
        ls = le.findExpiredLeases(revokeLimit)
    }
    le.mu.RUnlock()
    if len(ls) != 0 {
        // gofail: var beforeRevokeExipredLease bool
        //if beforeRevokeExipredLease {
        //  if le.lg != nil {
        //      le.lg.Info("got expired lease",
        //          zap.Any("leases", ls),
        //      )
        //  }
        //  return
        //}
        select {
        case <-le.stopC:
            return
        case le.expiredC <- ls:
        default:
            // the receiver of expiredC is probably busy handling
            // other stuff
            // let's try this next time after 500ms
        }
    }
}

and you can try this script to reproduce it.

#!/bin/bash
sudo docker pull nolouch/etcd-lease
sudo docker run -d --name lease-test -p 2379:2379 -p 2380:2380 -p 1234:1234  nolouch/etcd-lease

#test1
echo "Testing without failpoint"
curl http://127.0.0.1:1234/go.etcd.io/etcd/lease/beforeRevokeExipredLease -XDELETE
lease=$(sudo docker exec lease-test /etcdctl lease grant 3| awk '{print $2}')
sudo docker exec lease-test /etcdctl put k1 "v1" --lease=$lease
for i in {1..5}
do
    sudo docker exec lease-test /etcdctl lease timetolive $lease --keys
    sleep 1
done

#test2
echo "Testing with failpoint"
lease=$(sudo docker exec lease-test /etcdctl lease grant 3| awk '{print $2}')
sudo docker exec lease-test /etcdctl put k1 "v1" --lease=$lease
curl http://127.0.0.1:1234/go.etcd.io/etcd/lease/beforeRevokeExipredLease -XPUT -d"return(true)"
for i in {1..60}
do
    sudo docker exec lease-test /etcdctl lease timetolive $lease --keys
    sleep 1
    if [ $i -eq 5 ]
    then
        echo "Delete failpoint"
        curl http://127.0.0.1:1234/go.etcd.io/etcd/lease/beforeRevokeExipredLease -XDELETE
    fi
done

and the result is:

➜  etcd_test ./test.sh 
Using default tag: latest
latest: Pulling from nolouch/etcd-lease
5d20c808ce19: Already exists 
9394a16e8a66: Pull complete 
0c454d66fd46: Pull complete 
dfdab3df7b9a: Pull complete 
Digest: sha256:3fdde4fd4a704dd4cd75b8d46c5911e0b927f18522627cf4f163f77cb2694a0a
Status: Downloaded newer image for nolouch/etcd-lease:latest
49d8f4b957a14d75016a6aaf4c75679fa26f6d0e1c24d11948b5755f023a0876
========================================================================
Testing without failpoint
failed to delete failpoint failpoint: failpoint is disabled
OK
lease 694d6a684bb3cb05 granted with TTL(3s), remaining(2s), attached keys([k1])
lease 694d6a684bb3cb05 granted with TTL(3s), remaining(1s), attached keys([k1])
lease 694d6a684bb3cb05 granted with TTL(3s), remaining(0s), attached keys([k1])
lease 694d6a684bb3cb05 already expired
lease 694d6a684bb3cb05 already expired
========================================================================
Testing with failpoint
OK
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(2s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(1s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(0s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(0s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-1s), attached keys([k1])
Delete failpoint
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-3s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-4s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-5s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-6s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-7s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-9s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-10s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-11s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-12s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-13s), attached keys([k1])
lease 694d6a684bb3cb09 granted with TTL(3s), remaining(-15s), attached keys([k1])
j2gg0s commented 5 years ago

As a new comer of etcd, I can't guarantee that my opinion is completely correct.

Lessor pop expired lease from leaseHeap and send to expiredC. However, receiver of expiredC does not guarantee revoke lease success, Even, lessor don't guarantee send to expiredC success, no warning log under these circumstances?

lessor pop and send to expiredC

func (le *lessor) revokeExpiredLeases() {
    var ls []*Lease

    // rate limit
    revokeLimit := leaseRevokeRate / 2

    le.mu.RLock()
    if le.isPrimary() {
        ls = le.findExpiredLeases(revokeLimit)
    }
    le.mu.RUnlock()

    if len(ls) != 0 {
        select {
        case <-le.stopC:
            return
        case le.expiredC <- ls:
        default:
            // the receiver of expiredC is probably busy handling
            // other stuff
            // let's try this next time after 500ms
        }
    }
}

receiver of expiredC

case leases := <-expiredLeaseC:
    s.goAttach(func() {
        // Increases throughput of expired leases deletion process through parallelization
        c := make(chan struct{}, maxPendingRevokes)
        for _, lease := range leases {
            select {
            case c <- struct{}{}:
            case <-s.stopping:
                return
            }
            lid := lease.ID
            s.goAttach(func() {
                ctx := s.authStore.WithRoot(s.ctx)
                _, lerr := s.LeaseRevoke(ctx, &pb.LeaseRevokeRequest{ID: int64(lid)})
                if lerr == nil {
                    leaseExpired.Inc()
                } else {
                    if lg != nil {
                        lg.Warn(
                            "failed to revoke lease",
                            zap.String("lease-id", fmt.Sprintf("%016x", lid)),
                            zap.Error(lerr),
                        )
                    } else {
                        plog.Warningf("failed to revoke %016x (%q)", lid, lerr.Error())
                    }
                }

                <-c
            })
        }
    })

Maybe, sender of expiredC should record leases that have expired but are being revoke. And check these leases in next runLoop.

nolouch commented 5 years ago

@j2gg0s Yes, I think there is no guarantee.

jpbetz commented 5 years ago

Just to confirm, this issue exists on master only? Not 3.3 and earlier since the heap was introduced after they were released?

jingyih commented 5 years ago

Appears to me that this issue exists in master branch only.

Ref: the benchmark result with and without the heap: https://github.com/etcd-io/etcd/pull/9418#issuecomment-376672668

@nolouch Maybe use the same benchmark tests for #10693?

nolouch commented 5 years ago

Yes, we use the master.

mitake commented 5 years ago

Hi @nolouch , thanks a lot for the detailed testing and fix in https://github.com/etcd-io/etcd/pull/10693 !

Just curious: did you find the problem in long living cluster which caused connection failures or leader elections? I'm guessing the essential root cause would be that the management of lease lifetime depends on physical clock progress of a leader node and some states like leaseHeap aren't replicated with Raft.

It might be a good example of jepsen test planned in https://github.com/etcd-io/maintainers/issues/3 but not fully sure.

nolouch commented 5 years ago

@mitake The problem may be the network problem if propose the revokeLease failed. and I do not check if there is a leader election when the problem happens, but if the leader changed, the new leader will Promote the lessor to be primary and rebuild the leaseHeap, so leader changed will retry to revoke the lease if lease did not delete on the disk.