serathius opened this issue 1 year ago
Hi @serathius, will you be working on this issue, or is it open for contribution?
Opening for discussion. @ptabor @ahrtr @jmhbnz
cc'ing everyone involved with the lease work: @xiang90 @gyuho @hexfusion @yichengq @jonboulle @heyitsanthony
The lease depends on wall time; if a member's local clock drifts, then the lease will be affected. Simply put, all usages of time.Now() and time.After/time.Before may get invalid results when the clock drifts. So I suggest keeping the warning ("prober found high clock drift") as it is for now.
The existing lease implementation has a big design issue; we need to think about how to refactor it, which is already tracked in the roadmap.
The lease depends on wall time; if a member's local clock drifts, then the lease will be affected.
I don't think the impact of clock drift on wall time matters. By impact on leases I mean consistency issues: situations where clock drift could cause one member to consider a lease expired while another member doesn't.
To clarify, by clock drift I mean a small error in the physical clock that accumulates over time. It might be an error of 1 second per month that over a year accumulates to a couple of minutes. This is negligible for lease wall-time calculation, as leases are meant to be short; the Kubernetes lease for events is unusual because it lasts 2h. For a 2h lease the wall-time impact of clock drift should be negligible (below 1 second), which is acceptable.
Clock drift between members would matter if the TTL was calculated as an exact deadline, not as a time difference. For example, if we have a lease with a TTL of 1h, it doesn't matter if one member counts time from 18:00:00 to 19:00:00 while another member is 10 seconds ahead and counts from 18:00:10 to 19:00:10.
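To illustrate the distinction with a minimal sketch (hypothetical code, not etcd's actual lease implementation): a deadline compared against each member's own wall clock is sensitive to an offset between members, while a duration counted down on a single node is not.

```go
// Illustrative sketch only, not etcd's implementation: compares a
// deadline-based expiry check (sensitive to a member's clock offset)
// with a duration-based countdown on a single node (offset cancels out).
package main

import (
	"fmt"
	"time"
)

const ttl = time.Hour

// expiredByDeadline answers "is the lease expired?" by comparing an
// absolute deadline against the local wall clock. Two members whose
// clocks differ can disagree on the answer near the deadline.
func expiredByDeadline(grantedAt, localNow time.Time) bool {
	return localNow.After(grantedAt.Add(ttl))
}

// expiredByDuration answers the same question using only the elapsed
// time measured on one node (the leader). A constant offset of that
// node's clock shifts both the start and the end of the measurement,
// so it does not change the elapsed duration.
func expiredByDuration(elapsed time.Duration) bool {
	return elapsed > ttl
}

func main() {
	grantedAt := time.Date(2023, 9, 1, 18, 0, 0, 0, time.UTC)
	nearDeadline := grantedAt.Add(ttl - time.Second)

	// Member A is on time, member B's clock is 10 seconds ahead.
	fmt.Println("member A:", expiredByDeadline(grantedAt, nearDeadline))                     // false
	fmt.Println("member B:", expiredByDeadline(grantedAt, nearDeadline.Add(10*time.Second))) // true

	// Leader-side countdown: only the measured elapsed time matters.
	fmt.Println("leader countdown:", expiredByDuration(ttl-time.Second)) // false
}
```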
@serathius
Two things:
Yes. Clock drifts between different nodes should not affect etcd v3. We designed it that way and that is why all the leases get renewed/extended automatically once there is a leader switch. We can do some handshake and renew with a shorter period to make the lease more accurate even under a voluntary leader switch.
Keeping the clock in sync is still good... It makes debugging 100 times simpler. In any environment with NTP enabled, we should expect less than a 500ms clock difference. So warning on that is probably still OK... I would not suggest removing it. But it's really up to the current maintainers, since it is not a correctness issue.
Thanks @xiang90 for confirming there is no correctness issue. I can agree that having clock drift in a cluster is not great; however, this is a problem external to etcd. Many assumptions break with clock drift. Take centralized logging: it becomes useless for debugging if logs are collected from nodes with clock drift.
I don't think this is inherently an etcd problem. I had multiple scared users asking me what the impact on etcd is. Will it not perform well? Is the whole cluster's consistency at risk? Overall I think it's not a good idea to try to solve issues that users don't have. etcd is not a monitoring system, so it should not monitor or alert on clock drift. It just confuses users about why etcd cares about clock drift.
We can warn users about clock drift making etcd debugging hard. We can recommend that users run NTP and monitor their clock drift, even provide a link to external tools, but etcd should not take this responsibility on itself.
Hi @serathius, @xiang90, since the log is already at WARN level, I suppose there is no contribution expected for this issue?
I'm also +1 for removing it. WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.
@Aditya-Sood We haven't made a decision on how to proceed yet.
WDYT about adding a timestamp to endpoint/status? That way we can still have some indication when we go through (disconnected) customer logs.
I don't think it will help. Clock drift at the moment of the request can be totally unrelated to clock drift present at the moment the logs were written.
As confirmed in https://github.com/etcd-io/etcd/issues/16432#issuecomment-1710630587, I would like to repeat the proposal to remove the prober found high clock drift log. I think it causes confusion and doesn't help with debugging (more detail in https://github.com/etcd-io/etcd/issues/16432#issuecomment-1711288764).
ping @ahrtr @jmhbnz @wenjiaswe
Hey Team - I've been following this thread and held off commenting as I'm still not fully familiar with the underlying code in question. However, to answer the ping above, my vote would be to continue to inform users that clock drift over a certain threshold exists, but to ensure this is done in such a way that it is clear there is no impact on etcd consistency.
Situations where clock drift could cause one member to consider a lease expired while another member doesn't.
Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.
Overall, I don't think this ticket deserves much discussion time before the issue I mentioned in https://github.com/etcd-io/etcd/issues/16432#issuecomment-1685912014 is resolved, especially https://github.com/etcd-io/etcd/issues/15247.
Situations where clock drift could cause one member to consider a lease expired while another member doesn't.
Isn't this a problem caused by clock drift? Especially when the problematic member is the leader.
No, because:
- The decision that a lease has expired is made by the leader and executed via quorum.
FYI, it's being executed by quorum rather than agreed upon (via consensus) by quorum.
Also, per your logic, the issue https://github.com/etcd-io/etcd/issues/15247 should NOT happen, because the out-of-date leader (which gets stuck on writing for a long time) will never get consensus.
- Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be impacted by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. This is not enough for clock drift to matter.
Please do not assume any use cases.
FYI, it's being executed by quorum rather than agreed upon (via consensus) by quorum.
What I meant here is that the decision that a lease should be invalidated is made by the leader and then proposed to raft.
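As a rough conceptual sketch of that flow (all names are hypothetical; this is not etcd's actual code): only the leader runs the expiry check, and the expiry only takes effect as a revoke proposal that goes through raft and is applied by every member.

```go
// Rough conceptual sketch of the flow described above; all names are
// hypothetical and this is not etcd's actual implementation.
package main

import (
	"fmt"
	"time"
)

// proposer abstracts "submit an entry through raft"; conceptually this
// corresponds to the leader proposing a lease revoke that followers
// apply once it is committed.
type proposer interface {
	ProposeRevoke(leaseID int64) error
}

type fakeRaft struct{}

func (fakeRaft) ProposeRevoke(id int64) error {
	fmt.Printf("proposing revoke of lease %d through raft\n", id)
	return nil
}

type lease struct {
	id        int64
	remaining time.Duration // counted down only on the leader
}

// checkExpired is run only by the leader. Followers never decide
// expiry themselves; they just apply the committed revoke.
func checkExpired(leases []lease, r proposer) {
	for _, l := range leases {
		if l.remaining <= 0 {
			_ = r.ProposeRevoke(l.id)
		}
	}
}

func main() {
	leases := []lease{{id: 1, remaining: 0}, {id: 2, remaining: time.Minute}}
	checkExpired(leases, fakeRaft{})
}
```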
Also, per your logic, the issue https://github.com/etcd-io/etcd/issues/15247 should NOT happen, because the out-of-date leader (which gets stuck on writing for a long time) will never get consensus.
No, that's not true. My understanding (please correct me if any of these points is incorrect):
Issue https://github.com/etcd-io/etcd/issues/15247 is caused by point 2a not being properly executed. The old leader doesn't know that it should step down while the countdown clock is ticking. This results in the old leader executing point 3 even though it shouldn't. I pointed out those issues in https://github.com/etcd-io/etcd/issues/15944 a long time ago.
Clock drift doesn't influence any of those steps, as the cluster only depends on the countdown clock on the leader. If the leader changes, the TTL is reset. If the leader's clock is 10 seconds behind the other members', it doesn't matter; the time difference it counts will still be the TTL.
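A minimal sketch of the "TTL is reset on leader change" behaviour, assuming a hypothetical promote step that refreshes every lease's countdown on the new leader (again, not etcd's actual lessor code):

```go
// Hypothetical sketch of refreshing lease countdowns when a new leader
// is promoted; not etcd's actual lessor code.
package main

import (
	"fmt"
	"time"
)

type lease struct {
	id  int64
	ttl time.Duration // granted TTL
	// expiry is local to the current leader; time.Now() below gives it
	// a monotonic reading on that node.
	expiry time.Time
}

// promote is called when this member becomes leader: every lease gets a
// fresh countdown, so whatever the previous leader's clock did is irrelevant.
func promote(leases []*lease) {
	now := time.Now()
	for _, l := range leases {
		l.expiry = now.Add(l.ttl)
	}
}

func main() {
	leases := []*lease{{id: 1, ttl: time.Hour}, {id: 2, ttl: 2 * time.Hour}}
	promote(leases)
	for _, l := range leases {
		fmt.Printf("lease %d expires in %v\n", l.id, time.Until(l.expiry).Round(time.Second))
	}
}
```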
Clock drift doesn't impact the leader's decision, as lease durations are not long enough to be impacted by it. The longest leases we have (Kubernetes events) have a TTL of 2 hours. This is not enough for clock drift to matter.
Please do not assume any use cases.
We need to make assumptions, especially about leases, which were designed as short-lived leader-election tokens. Two-hour leases are already a big problem for Kubernetes due to the lack of checkpointing.
Imagine that you generate 1 GB of Kubernetes events within an hour and you have a one-hour TTL. Every time there is a leader change (which can easily happen multiple times in an hour), the TTL is reset. With one leader change you have 2 GB, with two leader changes you have 3 GB, and so on.
Hi, our PostgreSQL Patroni cluster is built on 3 etcd members. Due to the problem we experienced with NTP, one of the Patroni cluster nodes rebooted. Before the reboot, it was also giving the error below. Does the NTP problem affect etcd and cause the node to reboot?
Our etcd version: 3.5.13
Jun 25 14:10:06 lprdendvtb01 adclient[1681]: INFO AUDIT_TRAIL|Centrify Suite|Trusted Path|1.0|2700|Trusted path granted|5|user=lprdendvtb01$@DOM.LOCAL pid=1681 utc=1719313806836 centrifyEventID=23700 DASessID=N/A DAInst=N/A status=GRANTED server=ldap/aplan01.dom.local@DOM.LOCAL
Jun 25 14:10:06 lprdendvtb01 adclient[1681]: WARN
What would you like to be added?
I would like to propose removing the log
prober found high clock drift
as it incorrectly implies that etcd is impacted by clock drift. To my knowledge, there is no impact of clock difference on etcd version 3.
Raft itself doesn't depend on time in any way. It measures the passage of time for things like health probes, but doesn't compare time between members. The only part of etcd that could be impacted is leases; see https://github.com/etcd-io/etcd/issues/9768#issuecomment-391564180.
The connectivity monitor that reports the time drift was introduced for etcd v2 (https://github.com/etcd-io/etcd/pull/3210). In etcd v3, leases were rewritten to depend on time differences and thus should not be affected (https://github.com/etcd-io/etcd/pull/3834). Leases also use monotonic time (https://github.com/etcd-io/etcd/pull/6888, https://github.com/etcd-io/etcd/pull/8507), meaning time changes should not impact the TTL.
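For reference, this is how Go's monotonic clock behaves (standard library behaviour since Go 1.9): time.Now() carries both a wall and a monotonic reading, and Since/Sub use the monotonic one when both operands have it, so elapsed-time measurements are not affected by wall-clock adjustments. A small self-contained demonstration:

```go
// Demonstrates Go's monotonic clock support (time package, Go 1.9+):
// durations computed between two time.Now() values use the monotonic
// reading, so stepping the wall clock does not change measured elapsed time.
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now() // includes a monotonic reading
	time.Sleep(100 * time.Millisecond)

	// time.Since uses the monotonic readings of both timestamps.
	fmt.Println("elapsed (monotonic):", time.Since(start).Round(time.Millisecond))

	// Round(0) strips the monotonic reading; the remaining comparison is
	// wall-clock only and would be affected by clock adjustments.
	wallOnly := start.Round(0)
	fmt.Println("elapsed (wall clock):", time.Now().Round(0).Sub(wallOnly).Round(time.Millisecond))
}
```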
I expect the connectivity monitor stayed because etcd v3.5 still officially supports the v2 API. The next release, v3.6, removes the v2 API, so we can remove the clock drift detection too.
Please let me know if you are aware of any place where etcd could be impacted by clock drift.
Why is this needed?
It prevents user confusion about the impact of clock drift on etcd.