containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

OS X: podman machine time stops sometimes #11541

Closed. dm3ch closed this issue 1 year ago.

dm3ch commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description: The date output is wrong for both containers and the podman machine itself on OS X.

Steps to reproduce the issue (I'm not 100% sure about this reproduction guide):

  1. Install podman on OSX
  2. Create podman machine
  3. Wait a couple of days
  4. Get date from podman machine

Describe the results you received: OS X date:

$ date
Sun Sep 12 15:55:30 MSK 2021

Podman container and podman machine ssh date:

$ date
Fri Sep 10 19:47:08 UTC 2021

I ran the same command again and the time was exactly the same. After stopping and starting the machine, time started moving again.

Describe the results you expected: The date inside the podman machine should be correct.

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md) Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Client:
Version:      3.3.1
API Version:  3.3.1
Go Version:   go1.17
Built:        Mon Aug 30 22:15:26 2021
OS/Arch:      darwin/amd64

Server:
Version:      3.3.1
API Version:  3.3.1
Go Version:   go1.16.6
Built:        Mon Aug 30 23:46:36 2021
OS/Arch:      linux/amd64

OS X info: (screenshot taken 2021-09-12 at 16:00:11)

afbjorklund commented 3 years ago

I think KVM does this (~ntp~) in "hardware", but it seems to be missing from HVF?

We might have to run an ntpd in the VM until this is supported by the virtualization.

EDIT: I think I got RTC and NTP confused in my head there.

[    0.249404] PM: RTC time: 17:44:05, date: 2021-09-13
[    0.769992] rtc_cmos 00:00: RTC can wake from S4
[    0.773422] rtc_cmos 00:00: registered as rtc0
[    0.774279] rtc_cmos 00:00: setting system clock to 2021-09-13T17:44:06 UTC (1631555046)
[    0.775629] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
guillaumerose commented 3 years ago

There is a long blog post about this on the Docker blog. https://www.docker.com/blog/addressing-time-drift-in-docker-desktop-for-mac/

Docker Desktop runs an embedded NTP server on the host, which uses the host's system time as its source. An NTP client in the VM keeps it in sync.

dm3ch commented 3 years ago

JFYI: Maybe my case is related to that, but in my case it wasn't a slowdown; the time stopped completely.

But it looks like the NTP hack could work around this particular issue too.

rhatdan commented 2 years ago

@dustymabe Is this something we should turn on in Fedora CoreOS?

rhatdan commented 2 years ago

@ashley-cui PTAL

dustymabe commented 2 years ago

hmm.. NTP should be utilized on Fedora CoreOS by default

What's the output of the following commands run on the Fedora CoreOS host?

afbjorklund commented 2 years ago

My "podman machine" CoreOS seems synced to 2.fedora.pool.ntp.org
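
One way to check this from the host, assuming chronyd is the NTP client on the FCOS guest as it is on a default install, is something like:

podman machine ssh "timedatectl && chronyc sources"

timedatectl shows whether the system clock claims to be synchronized, and chronyc sources lists the servers chrony is actually using.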

guillaumerose commented 2 years ago

I think podman machine should be self-contained and not require Internet access. What if you're coding on a plane? The time should not drift.

guillaumerose commented 2 years ago

This situation can also happen after suspend/resume of the system. See a solution here: https://github.com/linuxkit/linuxkit/tree/master/pkg/host-timesync-daemon

github-actions[bot] commented 2 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 2 years ago

@dm3ch @baude @ashley-cui Is this still a bug?

ashley-cui commented 2 years ago

I think so

rhatdan commented 2 years ago

Should we grab the code from https://github.com/linuxkit/linuxkit/commit/d24d0bd559ad756314ecafb1d0ea2d6d4aef31ae and add it to podman system service? That way, if this device exists, we could grab the latest time and set it when the podman service starts up.

rhatdan commented 2 years ago

@guillaumerose @dustymabe WDYT?

vrothberg commented 2 years ago

@guillaumerose @dustymabe friendly ping

dustymabe commented 2 years ago

I'm no expert here. So mac/windows don't have a good way to keep VM clocks in sync with the host clock, but the code in https://github.com/linuxkit/linuxkit/commit/d24d0bd559ad756314ecafb1d0ea2d6d4aef31ae knows how to extract that information from the hypervisor and apply it to the VM?

If there is a way to extract the host time from the guest, I'm surprised Linux (or some built-in userspace daemon) doesn't already know how to get that information. Are there any open issues against some built-in Linux components that discuss this?

fingon commented 2 years ago

This is still an issue; here's an example from a podman machine started early yesterday:

[core@localhost ~]$ timedatectl
               Local time: Tue 2021-11-30 19:48:01 UTC
           Universal time: Tue 2021-11-30 19:48:01 UTC
                 RTC time: Wed 2021-12-01 03:58:20
                Time zone: UTC (UTC, +0000)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no
..
mstenber@kobuta ~>TZ=UTC date
Wed Dec  1 05:38:01 UTC 2021

Funnily enough, even the 'RTC' is lagging, though not as much as the system time. systemd's NTP service is clearly not working; I'm not sure why (it is supposed to have sane defaults, and running ntpdate inside a container works, but it cannot set the time, due to permissions I guess).

NOTE: The local time roughly corresponds to the clock not having moved while the machine was suspended; as for the RTC lag, I have no idea, I guess it just doesn't get updated from the host.

After sudo reboot it is fine, and I presume it will stay that way until I suspend it again:

[core@localhost ~]$ timedatectl
               Local time: Wed 2021-12-01 05:48:52 UTC
           Universal time: Wed 2021-12-01 05:48:52 UTC
                 RTC time: Wed 2021-12-01 05:48:53
                Time zone: UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
rhatdan commented 2 years ago

Run it privileged, and it should work.

acdha commented 2 years ago

I just ran into this and strongly suspect it's related to system sleep events, which suggests triggering a resync at wake in the VM should be enough, since the clock hardware on the host has very low drift and I saw a ~3.5 hour lag over the last 6 wall-clock hours.

One factor which is almost certainly exacerbating this: I'm on a network which firewalls arbitrary NTP traffic, and podman does not pass the host's NTP configuration (/etc/ntp.conf) through to the machine. I'm not sure it'd be worth spending time on that rather than improving synchronization from the host, though.
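
If outbound NTP is blocked, one possible workaround is to point chrony inside the machine at whatever internal time server the firewall does allow. A rough sketch (ntp.corp.example is a placeholder, and it assumes chronyd is the NTP client in the FCOS guest):

podman machine ssh

And then, in the machine shell:

echo 'server ntp.corp.example iburst' | sudo tee -a /etc/chrony.conf
sudo systemctl restart chronyd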

fingon commented 2 years ago

Run it privileged, and it should work.

How? At least with podman machine, it apparently does not help.

mstenber@kobuta ~>podman run --privileged -it fedora:latest
...
[root@af04e3c8dbd3 /]# sudo dnf install -y ntpsec
...
[root@af04e3c8dbd3 /]# ntpdate 0.pool.ntp.org
{"time":"2021-12-02T09:18:19.836900+0000","offset":-0.000026,"precision":0.009240,"host":"0.pool.ntp.org","ip":"95.216.150.202","stratum":2,"leap":"no-leap","adjusted":false}
CLOCK: adj_systime: Operation not permitted
[root@af04e3c8dbd3 /]# 
rhatdan commented 2 years ago

It does require a rootful container. Rootless users are not allowed to adjust the machine's time.

konstruktoid commented 2 years ago

Is there any way to easily configure the VM? There's no real need to install additional services, since one possible fix is to add NTP servers to the systemd-timesyncd service. I'll report back after the weekend if this hasn't worked.

[core@localhost ~]$ sudo sed -i 's/#NTP=.*/NTP=0.fedora.pool.ntp.org 1.fedora.pool.ntp.org/g' /etc/systemd/timesyncd.conf 
[core@localhost ~]$ grep -v '^#' /etc/systemd/timesyncd.conf 

[Time]
NTP=0.fedora.pool.ntp.org 1.fedora.pool.ntp.org
[core@localhost ~]$ sudo systemctl restart systemd-timesyncd.service 

This of course won't help if there are firewalls blocking things, as in https://github.com/containers/podman/issues/11541#issuecomment-984179128

github-actions[bot] commented 2 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 2 years ago

@ashley-cui any update?

arixmkii commented 2 years ago

On macOS I do this for rootless containers:

podman machine ssh

And then in the machine shell

sudo -i
hwclock --hctosys

Maybe this could become an optional auto-apply script in the desktop app companion for macOS, executed when the OS resumes from sleep/suspend.
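
The same thing can also be run non-interactively from the macOS side, which makes it easy to hook into a resume script or a shell alias (assuming the default machine connection):

podman machine ssh "sudo hwclock --hctosys && date -u"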

rhatdan commented 2 years ago

@flouthoc @baude @ashley-cui Should we do this in the ignition?

baude commented 2 years ago

seems reasonable @rhatdan

MB175 commented 2 years ago

When I start a container, it instantly jumps to an uptime of 30 minutes. I suppose this isn't normal as well?

dustymabe commented 2 years ago

Should we do this in the ignition?

Note that Ignition only runs once on first machine boot. If you want this to apply dynamically at various times during a machine's lifetime then you'd need to write a systemd unit with the smarts for that and deliver it via Ignition.

You probably knew all this already, just wanted to make sure.
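
For reference, delivering a unit like that through Ignition would look roughly like the following Butane snippet. This is only a sketch: the unit name and contents are placeholders, and how podman machine would merge it into the Ignition config it generates is not shown.

variant: fcos
version: 1.4.0
systemd:
  units:
    - name: sync-clock.service
      enabled: true
      contents: |
        [Unit]
        Description=Sync system clock from the hardware clock
        [Service]
        Type=oneshot
        ExecStart=/usr/sbin/hwclock --hctosys
        [Install]
        WantedBy=multi-user.target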

baude commented 2 years ago

no chrony or the like on fcos?

rhatdan commented 2 years ago

Why not add a service that runs the command once at boot time?

dustymabe commented 2 years ago

no chrony or the like on fcos?

Yep. If cron functionality is what you want I suggest using systemd timers.

Why not add a service that runs the command once at boot time?

A systemd service that did that should work fine, but I'm not 100% sure about the exact problem here. There is the guest (FCOS) and the host (Mac OSX). When you suspend OSX while a podman machine (FCOS) is running, what happens to that FCOS machine? Does it think it's been running the whole time? Does it get suspended/resumed too? Either way, I assume it doesn't go through a full boot cycle, so a service that runs once at boot time probably won't suffice.

fingon commented 2 years ago

Why not add a service that runs the command once at boot time?

A systemd service that did that should work fine, but I'm not 100% sure about the exact problem here. There is the guest (FCOS) and the host (Mac OSX). When you suspend OSX while a podman machine (FCOS) is running, what happens to that FCOS machine? Does it think it's been running the whole time? Does it get suspended/resumed too? Either way, I assume it doesn't go through a full boot cycle, so a service that runs once at boot time probably won't suffice.

It gets suspended and resumed. Host updates its time (presumably from RTC), but the FCOS is suddenly hours/days in the past.

Because of that, the ideal solution would be something that triggers on resume, but I'm not sure if that is visible within qemu; failing that, something that kicks NTP every now and then would work, but that is quite an ugly solution.

baude commented 2 years ago

Chrony is installed and activated on FCOS when running podman machine. It might catch the time difference in a reasonable amount of time, but it is not likely. If you google this problem with qemu, you'll find it is quite common, with no really solid solution.

On the host, we could listen on the QMP socket, watch for a resume event, and then take some form of action, but this comes with problems too ... mainly that, upon detection, action would need to be taken in some automated way, which usually smells like long-term problems.

baude commented 2 years ago

Maybe Podman Desktop could do this?

dustymabe commented 2 years ago

It gets suspended and resumed. Host updates its time (presumably from RTC), but the FCOS is suddenly hours/days in the past.

So you're saying FCOS goes through a full suspend/resume cycle? If that's true then I think there are ways to trigger this using systemd units https://unix.stackexchange.com/questions/124212/writing-a-systemd-service-to-be-executed-at-resume

baude commented 2 years ago

can someone try that ^^^ ?

konstruktoid commented 2 years ago

We'll see how it goes.

[Unit]
Description=Set clock after resume
After=suspend.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/hwclock --hctosys

[Install]
WantedBy=suspend.target

[root@localhost systemd]# systemctl enable hwc-resume.service
[root@localhost systemd]# systemctl stop hwc-resume.service
[root@localhost systemd]# systemctl start hwc-resume.service 
[root@localhost systemd]# systemctl status hwc-resume.service 
○ hwc-resume.service - Set clock after resume
     Loaded: loaded (/etc/systemd/system/hwc-resume.service; enabled; vendor preset: disabled)
     Active: inactive (dead)
Jan 18 20:34:01 localhost.localdomain systemd[1]: Starting Set clock after resume...
Jan 18 20:34:02 localhost.localdomain systemd[1]: hwc-resume.service: Deactivated successfully.
Jan 18 20:34:02 localhost.localdomain systemd[1]: Finished Set clock after resume.
Jan 18 20:35:07 localhost.localdomain systemd[1]: Starting Set clock after resume...
Jan 18 20:35:08 localhost.localdomain systemd[1]: hwc-resume.service: Deactivated successfully.
Jan 18 20:35:08 localhost.localdomain systemd[1]: Finished Set clock after resume.
dustymabe commented 2 years ago

@konstruktoid - does your output imply that it was successful?

konstruktoid commented 2 years ago

Sorry @dustymabe, that was just me showing how I enabled the service.

I've been running the machine and the service for the last 12 hours on my laptop (on battery), letting it go to sleep, closing the lid and so on, but I haven't been able to trigger the podman machine to suspend or resume.

konstruktoid commented 2 years ago

Not working. I'm running both timesyncd and the resume service, and it's very much out of sync.

Sat Jan 22 03:10:21 UTC 2022
up 1 day, 14 hours, 55 minutes

Mon Jan 24 09:21:22 UTC 2022
[core@localhost ~]$ grep -v '#' /etc/systemd/timesyncd.conf 

[Time]
NTP=0.fedora.pool.ntp.org 1.fedora.pool.ntp.org
[core@localhost ~]$ sudo journalctl -r -u hwc-resume
-- Journal begins at Tue 2022-01-11 15:19:18 UTC, ends at Sat 2022-01-22 03:10:15 UTC. --
Jan 18 20:35:08 localhost.localdomain systemd[1]: Finished Set clock after resume.
Jan 18 20:35:08 localhost.localdomain systemd[1]: hwc-resume.service: Deactivated successfully.
Jan 18 20:35:07 localhost.localdomain systemd[1]: Starting Set clock after resume...
Jan 18 20:34:02 localhost.localdomain systemd[1]: Finished Set clock after resume.
Jan 18 20:34:02 localhost.localdomain systemd[1]: hwc-resume.service: Deactivated successfully.
Jan 18 20:34:01 localhost.localdomain systemd[1]: Starting Set clock after resume...

[core@localhost ~]$ exit
logout
Connection to localhost closed.
$ podman machine ssh 'uname -a ; date -u && uptime --pretty && exit' 2>/dev/null; echo ; date -u && podman --version && uname -sri
Linux localhost.localdomain 5.15.10-200.fc35.x86_64 #1 SMP Fri Dec 17 14:46:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Sat Jan 22 03:10:21 UTC 2022
up 1 day, 14 hours, 55 minutes

Mon Jan 24 09:21:22 UTC 2022
podman version 3.4.4
Darwin 21.2.0 MacBookPro13,3
dustymabe commented 2 years ago

Right. IIUC, basically the Mac is doing suspend/resume but the VM (FCOS) isn't, which makes sense. It looks like a systemd timer that runs periodically would be best in that case.

konstruktoid commented 2 years ago

New test with timer.

/etc/systemd/system/hwc-resume.service:

[Unit]
Description=Set clock with hwclock

[Service]
Type=oneshot
ExecStart=/usr/sbin/hwclock --hctosys

/etc/systemd/system/hwc-resume.timer:

[Unit]
Description=Run hwclock on boot and hourly

[Timer]
OnBootSec=15min
OnCalendar=hourly
AccuracySec=5min
Persistent=true

[Install]
WantedBy=timers.target
$ systemctl list-timers hwc*
NEXT                        LEFT      LAST                        PASSED       UNIT             ACTIVATES         
Mon 2022-01-24 14:00:00 UTC 6min left Mon 2022-01-24 13:49:28 UTC 4min 30s ago hwc-resume.timer hwc-resume.service

1 timers listed.
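
For completeness: after dropping the two files in place, the timer still needs to be activated, presumably with something along these lines:

sudo systemctl daemon-reload
sudo systemctl enable --now hwc-resume.timer
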
konstruktoid commented 2 years ago

Hmmm. The time read from the Hardware Clock when using hwclock is incorrect? Shouldn't "Time read from Hardware Clock" be the laptop's time?

"Time read from Hardware Clock: 2022/01/24 14:20:55" vs "Mon Jan 24 14:51:01 UTC 2022", a 30 min diff.

hwclock from util-linux 2.37.2
System Time: 1643034061.948327
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Last drift adjustment done at 0 seconds after 1969
Last calibration done at 0 seconds after 1969
Hardware clock is on unknown time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2022/01/24 14:20:55
Hw clock time : 2022/01/24 14:20:55 = 1643034055 seconds since 1969
Time since last adjustment is 1643034055 seconds
Calculated Hardware Clock drift is 0.000000 seconds
Calling settimeofday(NULL, 0) to lock the warp_clock function.
Calling settimeofday(1643034055.000000, NULL) to set the System time.

Mon Jan 24 14:51:01 UTC 2022
roffelsaurus commented 2 years ago

This is still an issue with podman 3.4.4 on macOS.

When running podman build on a Dockerfile containing RUN apt-get update -q, it fails with exit status 100.

E: Release file for http://security.debian.org/debian-security/dists/buster/updates/InRelease is not valid yet (invalid for another 9h 11min 44s). Updates for this repository will not be applied.

I suspect this impacts many people, since this is a very common use case, so I added the error message here for discoverability.

Workaround:

The clock was somehow still off by 20 minutes after hwclock --hctosys, but sudo reboot fixed it. timedatectl now reports the correct time, and that also fixed the build error above.
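
Restarting the machine from the host should have the same effect as the in-VM reboot, if that is more convenient:

podman machine stop
podman machine start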

arixmkii commented 2 years ago

Unfortunately, I can confirm that hwclock can also drift in qemu on macOS, sometimes by hours. This is the current output of date and hwclock in the VM, and of date on the host (an M1 MBP, if that matters):

Last login: Mon Jan 24 21:13:13 2022 from 192.168.127.1
[core@localhost ~]$ date
Mon Jan 24 21:13:51 UTC 2022
[core@localhost ~]$ sudo -i
[root@localhost ~]# hwclock
2022-01-24 21:58:07.160065+00:00
[root@localhost ~]# exit
logout
[core@localhost ~]$ exit
logout
Connection to localhost closed.
% date
Tue Jan 25 12:52:40 EET 2022

Something else that could be explored is using the qemu guest agent to adjust the time from the host, but this is controversial due to security implications; the agent is just way too powerful for time correction alone.
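
For reference, the relevant guest-agent command would be QEMU's guest-set-time (with no argument it sets the guest's system time from the RTC; an explicit time in nanoseconds since the epoch can be passed instead). If the VM exposed a QGA socket, which podman machine does not set up as far as I know, the host side would be roughly the following (the socket path here is made up):

echo '{"execute":"guest-set-time"}' | socat - UNIX-CONNECT:/tmp/podman-machine-qga.sock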

arixmkii commented 2 years ago

Related discussions about a guest agent for time sync in the lima-vm project: https://github.com/lima-vm/lima/issues/355 and https://github.com/lima-vm/lima/pull/490

konstruktoid commented 2 years ago

With a timer I guess it's less bad...

$ podman machine ssh 'uname -a ; date -u && uptime --pretty && exit' 2>/dev/null; echo ; date -u && podman --version && uname -sri
Linux localhost.localdomain 5.15.17-200.fc35.x86_64 #1 SMP Thu Jan 27 16:29:05 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Mon Jan 31 16:51:10 UTC 2022
up 2 days, 16 hours, 42 minutes

Mon Jan 31 17:47:01 UTC 2022
podman version 3.4.4
Darwin 21.2.0 MacBookPro13,3
lvh commented 2 years ago

hwclock was not sufficient for me on my M1 MBP: I would still get some skew (not as much as the time the machine had been off, but enough to cause problems with e.g. certificate and signature validation). Instead, I tell chrony to do its thing:

podman machine ssh "sudo chronyc -m 'burst 4/4' makestep; date -u"

Note that this will step the clock instantly instead of slewing. Hence it is appropriate for running immediately on resume but it may impact existing containers negatively. See chrony's makestep and related documentation for details.

konstruktoid commented 2 years ago

@lvh have you succeeded in making it run periodically in the background?