canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.63k stars 635 forks source link

[qemu] catch possible errors when syncing hw clock #3396

Closed sharder996 closed 6 months ago

sharder996 commented 6 months ago

Fixes #3394

codecov[bot] commented 6 months ago

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (d28626a) 84.16% compared to head (2e027e7) 88.55%. Report is 182 commits behind head on main.

Files Patch % Lines
...rc/platform/backends/qemu/qemu_virtual_machine.cpp 20.00% 4 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #3396 +/- ## ========================================== + Coverage 84.16% 88.55% +4.38% ========================================== Files 251 251 Lines 13857 13866 +9 ========================================== + Hits 11663 12279 +616 + Misses 2194 1587 -607 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sharder996 commented 6 months ago

What is the output of:

multipass exec secure-platy -- sudo hwclock --hctosys --directisa --verbose

It looks like this is a hardware issue specific to your host machine which explains why Chris and I never experienced it.

ricab commented 6 months ago

Here's what I get when executing that right after exception is caught:

$ multipass exec secure-platy -- sudo hwclock --hctosys --directisa --verbose
hwclock from util-linux 2.37.2
System Time: 1706889438.995315
Using direct ISA access to the clock
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2024/02/02 15:57:19
Hw clock time : 2024/02/02 15:57:19 = 1706889439 seconds since 1969
Time since last adjustment is 1706889439 seconds
Calculated Hardware Clock drift is 0.000000 seconds
Calling settimeofday(NULL, 0) to lock the warp_clock function.
Calling settimeofday(1706889439.000000, NULL) to set the System time.

The time appears to be in sync after that.

ricab commented 6 months ago

Although sometimes, but not always, the clock gets back in sync on its own after waiting for a while.

ricab commented 6 months ago

When I execute the command that Multipass uses outside an instance, it immediately succeeds.

ricab commented 6 months ago

Well, I am not sure what the difference is in my case. QEMU has options related to the clock, but:

By default the RTC is driven by the host system time. This allows using of the RTC as accurate reference clock inside the guest, specifically if the host time is smoothly following an accurate external reference clock, e.g. via NTP.

I have tried disabling "automatic date & time" in gnome settings but it didn't make a difference.

sharder996 commented 6 months ago

While testing I found another problem that will show up in the future. The util-linux-extra package which contains the hwclock command is not included in base ubuntu from mantic onwards.

This command might be better then:

sudo timedatectl set-local-rtc 0 --adjust-system-clock

Does this work on your instance where hwclock did not? Or any insight into whether this is a better option? From the man page it looks like this command does what we want.

ricab commented 6 months ago

That worked fine for me! And it is part of the systemd suite, so it is pretty standard and probably a better choice.

sharder996 commented 6 months ago

I took a look at what it would take to get this covered in unit tests and I think there would be a bit of initial overhead. We don't have anything set up for mocking SSHSession and we also don't have anything set up for testing backend specific virtual machine classes. If it's not too big of a deal, I think we leave it for another time.

Leaving the try-catch block still seems like a good idea so that we don't crash the daemon if the command fails.

townsend2010 commented 6 months ago

Hey @sharder996,

Although we don't mock SSHSession directly since it's neither an injectable singleton nor a passed in mockable class, it is possible to mock the underlying libssh calls. We do this in many places in the tests, so there are many examples.

That said, we aren't testing any of the signal/slot connections in the QemuVirtualMachine ctor and it appears to be tricky to do so, so I'm ok with deferring any unit tests here.