coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

systemd-timesyncd not as precise as ntpd #391

Open croemmich opened 9 years ago

croemmich commented 9 years ago

I run Deis on CoreOS and recently made the switch to 681, which replaced ntpd with systemd-timesyncd as the default time sync daemon. Deis uses Ceph to create an HA filesystem within the cluster. Ceph expects clocks within the cluster to be very close, see: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#clock-skews
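For illustration, the kind of skew condition Ceph reports can be sketched as a pairwise comparison of node clock offsets. This is only a rough sketch, not Ceph's actual monitor logic: the node names and offsets are hypothetical sample values, and the 0.05 s threshold assumes the documented default of Ceph's `mon_clock_drift_allowed` option, which may be set differently in a given cluster.

```python
from itertools import combinations

# Assumed threshold: the documented default of Ceph's mon_clock_drift_allowed
# option, in seconds. Check your own cluster's configuration.
ALLOWED_DRIFT_S = 0.05


def skewed_pairs(offsets, allowed=ALLOWED_DRIFT_S):
    """Return node pairs whose clocks differ by more than `allowed` seconds.

    `offsets` maps a node name to its clock offset in seconds relative to
    any common reference.
    """
    return [
        (a, b, abs(offsets[a] - offsets[b]))
        for a, b in combinations(sorted(offsets), 2)
        if abs(offsets[a] - offsets[b]) > allowed
    ]


# Hypothetical offsets for three monitor nodes.
sample = {"mon-a": 0.002, "mon-b": 0.071, "mon-c": -0.004}
for a, b, skew in skewed_pairs(sample):
    print(f"{a} <-> {b}: {skew:.3f}s apart")
```

With a threshold this tight, a single node that drifts by a few tens of milliseconds is enough to put it out of tolerance with every other node.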

Before the switch, Ceph never reported any issues, but after the switch 2/3 of my Ceph monitors were reporting clock skew issues. I verified that systemd-timesyncd was indeed running, but I couldn't find any indication of when/how it was syncing.

Does systemd-timesyncd work differently? Does it sync less frequently, is it simply less accurate, or is there something I need to configure to get the nodes more closely in sync?

marineam commented 9 years ago

Well, unlike ntpd, which uses four upstream servers, timesyncd will only use one. You can try configuring timesyncd to use a local NTP server if you aren't already, or, if you are using a remote server, pick a single one to use for all your nodes. Maybe the issue is simply that the nodes are syncing against different reference times. Beyond that, I'm not sure. We do still ship ntpd, so you can switch back too.
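As a rough sketch of the "single server for all nodes" suggestion, the snippet below builds the text of a timesyncd configuration drop-in. The `[Time]` section and the `NTP=` and `FallbackNTP=` keys come from `timesyncd.conf`; the helper name `render_timesyncd_dropin` and the server `ntp.example.com` are placeholders invented for illustration.

```python
def render_timesyncd_dropin(servers, fallback=()):
    """Build the text of a timesyncd.conf drop-in with a [Time] section.

    `servers` and `fallback` are sequences of host names or addresses;
    timesyncd.conf expects each list to be space-separated.
    """
    if not servers:
        raise ValueError("at least one NTP server is required")
    lines = ["[Time]", "NTP=" + " ".join(servers)]
    if fallback:
        lines.append("FallbackNTP=" + " ".join(fallback))
    return "\n".join(lines) + "\n"


# ntp.example.com is a placeholder, not a recommended server.
print(render_timesyncd_dropin(["ntp.example.com"]), end="")
```

Assuming the image's systemd supports drop-in directories, the result could be written to a file such as `/etc/systemd/timesyncd.conf.d/10-ntp.conf` on every node (the file name is arbitrary), followed by a restart of `systemd-timesyncd`. Servers supplied by networkd via DHCP may still be used alongside this setting.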

https://coreos.com/docs/cluster-management/setup/configuring-date-and-timezone/

croemmich commented 9 years ago

Thanks @marineam, I'll give your suggestions a try. I have already switched back to ntpd which solved the issues, but if timesyncd is the future of CoreOS, I'd like to get it working. However, it would be ideal if it functioned the same without additional user configuration.

marineam commented 9 years ago

Thanks. One advantage of timesyncd is that it is integrated with networkd, so if DHCP provides a local NTP server it will be used. The issue may simply be a bug in timesyncd; it is relatively new code. If it turns out the issue is due to only using a single remote time server out of the default pool, we may need to revisit the choice of timesyncd.

croemmich commented 9 years ago

@marineam I'd prefer to not have to deploy a local ntp server. I tried specifying a single remote server for all of the nodes and they still fell out of sync. Regardless, neither option sits well with me, as my current deployment is completely HA and using a single ntp server breaks that.

marineam commented 9 years ago

Ok, I would recommend switching back to ntpd for now then. I'll dig more to see if timesyncd can be improved or if we need to change the default again. I promise ntpd will remain in the image. :)

https://coreos.com/docs/cluster-management/setup/configuring-date-and-timezone/

croemmich commented 9 years ago

Cool, thanks!

sitsofe commented 9 years ago

You can check whether systemd-timesyncd has synchronized by looking at the "NTP synchronized:" line of the timedatectl output. This assumes you are actually using it and not some other time sync server that implements the timedate D-Bus interface, such as chrony.
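For scripting that check across nodes, something like the following sketch could work. It assumes `timedatectl` is on the PATH. Newer systemd releases label the line "System clock synchronized:" rather than "NTP synchronized:", so both labels are accepted. The function names are made up for this example.

```python
import subprocess

# Labels differ between systemd versions: older releases print
# "NTP synchronized:", newer ones "System clock synchronized:".
SYNC_LABELS = ("NTP synchronized", "System clock synchronized")


def parse_sync_status(timedatectl_output):
    """Return True or False for the synchronized line, or None if absent."""
    for line in timedatectl_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in SYNC_LABELS:
            return value.strip().lower() == "yes"
    return None


def clock_is_synchronized():
    """Run timedatectl on this host and parse its synchronized line."""
    result = subprocess.run(
        ["timedatectl"], capture_output=True, text=True, check=True
    )
    return parse_sync_status(result.stdout)


# Parsing a canned sample avoids needing timedatectl just to try this out.
sample_output = "     NTP enabled: yes\nNTP synchronized: no\n"
print(parse_sync_status(sample_output))
```

Note that this only reports whether the clock is considered synchronized at all. It says nothing about how far apart two nodes are, which is what Ceph cares about.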

crawford commented 8 years ago

@croemmich have you seen any improvement in the later versions of CoreOS?

crawford commented 8 years ago

Closing due to inactivity.

ramayer commented 7 years ago

> Closing due to inactivity.

If more activity is created, would this be re-opened?

bgilbert commented 7 years ago

@ramayer If this is still causing problems, we can reopen. What behavior are you seeing?

nealey commented 5 years ago

@bgilbert I have been running ceph on CoreOS stable for over a year now and this problem has never gone away. I just now found this issue in a web search. I can provide whatever debugging would be helpful.

ramayer commented 5 years ago

As @croemmich described, systemd-timesyncd does not seem to set clocks correctly.

Like @croemmich I also switched back to ntpd (which made the problem go away) when I found this page with his workaround.