Closed AlexDCraig closed 3 years ago
I have the same problem in the West Europe and North Europe regions on two clusters. The Kubernetes version is 1.14.8.
We have the same problem in UK South. We run Kubernetes 1.14.8 on 3 clusters, and all of them report "clock skew detected for node(s):"
We're also frequently experiencing this in East Asia, but not on any of our South Central US clusters.
Seeing the same problem in east-us (single cluster, 4 nodes in one node pool). This started occurring after upgrading to 1.16.7 (from 1.15.5). Seems to flap fairly frequently.
Same here. wus-2, 1.15.10. I suspect this is caused not by AKS but by the Linux kernel or a lower layer.
I'm monitoring my Linux instances on Azure VMs with node_exporter, and the issue occurs there as well.
This is happening to our AKS clusters as well but not our on-prem Rancher K8s clusters.
Action required from @Azure/aks-pm
We're looking into this with the Azure Linux team in order to improve time sync reliability. I'll post here as soon as we have findings.
CC @juan-lee @xuto2 @qike-ms
Are there any mitigation steps?
Action required from @palma21.
You can use a DaemonSet to change the timesync servers in case that helps.
We found a few cases where the Ubuntu default servers might not be reliable across the world, so we'll be moving to host sync.
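For anyone trying the DaemonSet route, here is a minimal sketch of the kind of script such a DaemonSet could run on each node. It assumes the node uses systemd-timesyncd and that a drop-in under /etc/systemd/timesyncd.conf.d is acceptable. The function names, the drop-in file name, and the example server are illustrative assumptions, not something AKS ships.

```shell
#!/bin/sh
# Sketch only: render a systemd-timesyncd drop-in that overrides the NTP servers.
# Function names and the drop-in file name are hypothetical.

render_timesyncd_dropin() {
  # $1: space-separated list of NTP servers
  printf '[Time]\nNTP=%s\n' "$1"
}

write_timesyncd_dropin() {
  # $1: directory that holds the drop-in, $2: NTP servers
  mkdir -p "$1"
  render_timesyncd_dropin "$2" > "$1/10-ntp-servers.conf"
}

# On a real node (as root) the call would look like this, followed by a restart:
#   write_timesyncd_dropin /etc/systemd/timesyncd.conf.d "0.europe.pool.ntp.org"
#   systemctl restart systemd-timesyncd
```

Writing to a directory passed as an argument keeps the script easy to try out in a scratch directory before pointing it at a node.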
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
Is there any quick remediation or command to force a resync? We are currently affected in UKSOUTH.
I have the same problem in WESTEU. Maybe this can help with troubleshooting:
azureuser@aks-nodepool1-vmss0:~$ sudo service systemd-timesyncd status
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/systemd-timesyncd.service.d
           └─disable-with-time-daemon.conf
   Active: inactive (dead)
Condition: start condition failed at Mon 2020-11-16 10:56:27 UTC; 18s ago
           ConditionFileIsExecutable=!/usr/sbin/chronyd was not met
     Docs: man:systemd-timesyncd.service(8)
PS: @djsly I found a workaround. Run these commands on each node of the cluster:
sudo service chrony start
sudo chronyd -q 'server 0.europe.pool.ntp.org iburst'
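The status output above shows why systemd-timesyncd stays inactive: its start condition requires that /usr/sbin/chronyd is not executable, so on this image chrony is expected to do the syncing instead. A small sketch of that check follows. The function name is hypothetical, and the check only mirrors the condition shown in the output.

```shell
#!/bin/sh
# Sketch only: report which time daemon a node is expected to run, mirroring
# ConditionFileIsExecutable=!/usr/sbin/chronyd from the status output.
# The function name is hypothetical.

expected_time_daemon() {
  # $1: path to the chronyd binary (defaults to the path from the status output)
  chronyd_path="${1:-/usr/sbin/chronyd}"
  if [ -x "$chronyd_path" ]; then
    # chronyd is installed, so systemd-timesyncd refuses to start
    echo chrony
  else
    echo systemd-timesyncd
  fi
}
```

On a node, `systemctl is-active "$(expected_time_daemon)"` would then show whether the daemon that is supposed to be syncing is actually running.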
Can you please share your VMSS OS disk image version?
There was a known issue with 2020.10.28, which was patched afterward. You might just need to recreate your agent pool and you should be all set.
The version currently in use is 2020.10.28.
Replace your node pool; that should fix it.
Or do a node image upgrade to the latest VHD version, 2020.11.11, once it's available in this week's release.
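As a rough sketch of that upgrade path: the helper below flags a node pool whose image version ends in the known-bad 2020.10.28 and prints, without running it, an `az aks nodepool upgrade --node-image-only` command. The function names, the example version string format, and the resource names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: flag the image version named in this thread and print the
# node image upgrade command. Function names are hypothetical.

AFFECTED_VERSION="2020.10.28"

is_affected_image() {
  # $1: node image version string (assumed to end with the VHD date)
  case "$1" in
    *"$AFFECTED_VERSION") return 0 ;;
    *) return 1 ;;
  esac
}

upgrade_command() {
  # $1: resource group, $2: cluster name, $3: node pool name
  # Prints the command instead of running it so it can be reviewed first.
  printf 'az aks nodepool upgrade --resource-group %s --cluster-name %s --name %s --node-image-only\n' "$1" "$2" "$3"
}
```

For example, `is_affected_image "$version" && upgrade_command myRG myCluster nodepool1` prints the command only for pools still on the affected image. The current version string would have to be read from the node pool first.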
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. AlexDHoffer, feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question, issue, or suggestion.
Host sync and chrony are used as of the 2021-03-08 release.
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Here is a picture of the clock skew estimates across the three environments in question:
The legend has been omitted for confidentiality reasons.
We also host the same service set in the west-europe region, and have not seen clock skew impact resources in that region.
Environment:
Kubernetes version (use kubectl version): 1.15.5