Solo5 / solo5

A sandboxed execution environment for unikernels
ISC License

Clock drift #549

Closed reynir closed 1 year ago

reynir commented 1 year ago

Many of the backends including xen, hvt and virtio get the wall clock time only at boot and then compute an offset used for computing the wall clock time in solo5_clock_wall. This minimizes the number of hypercalls, but means the guest never synchronizes with the host's wall clock. It is most noticeable on a host such as a laptop where the computer is suspended, but can also happen due to natural clock drift.

https://github.com/Solo5/solo5/blob/a63a755d710a1286a0d0eea1253762ba25e866b7/bindings/hvt/tscclock.c#L110-L120

https://github.com/Solo5/solo5/blob/a63a755d710a1286a0d0eea1253762ba25e866b7/bindings/hvt/time.c#L33-L37
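To make the failure mode concrete, here is a minimal sketch of the boot-time offset scheme described above. It is not the Solo5 code: `sim_host`, `boot_clock` and the `sim_*` helpers are hypothetical names invented for illustration, and the sketch assumes the guest's monotonic clock does not advance while the host is suspended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated host; these are not Solo5 types. */
typedef struct {
    uint64_t host_wall_ns;  /* the host's wall clock */
    uint64_t guest_mono_ns; /* the guest's monotonic clock */
} sim_host;

/* Wall clock state that is sampled exactly once, at boot. */
typedef struct {
    uint64_t wall_offset_ns; /* host wall time minus monotonic time at boot */
} boot_clock;

/* The only point at which the host wall clock is consulted. */
static void boot_clock_init(boot_clock *c, const sim_host *h)
{
    assert(c != NULL && h != NULL);
    c->wall_offset_ns = h->host_wall_ns - h->guest_mono_ns;
}

/* Wall time is derived from the monotonic clock plus the boot offset. */
static uint64_t boot_clock_wall(const boot_clock *c, const sim_host *h)
{
    return h->guest_mono_ns + c->wall_offset_ns;
}

/* Normal operation: both clocks advance together. */
static void sim_run(sim_host *h, uint64_t ns)
{
    h->host_wall_ns += ns;
    h->guest_mono_ns += ns;
}

/* Suspend (assumption): only the host wall clock advances. */
static void sim_suspend(sim_host *h, uint64_t ns)
{
    h->host_wall_ns += ns;
}
```

In this model, after `sim_suspend(&h, n)` the value returned by `boot_clock_wall` lags the host wall clock by `n` nanoseconds, and nothing ever corrects it.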

There are a number of strategies imaginable to mitigate clock skew (e.g. recompute the wall clock offset every set interval according to the monotonic clock), but it is perhaps best decided in client code how to approach that.

A way forward could be to expose the hypercall and leave it to the client code (mirage-clock) to keep an offset in order to not make hypercalls as often.
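As a rough sketch of what such client code could look like, the following keeps a cached offset and only asks the host for the wall time again once a configurable interval has elapsed on the monotonic clock. Everything here is an assumption made for illustration: `cached_wall_clock`, the `cwc_*` functions and the `fake_host` test double are hypothetical names, and the `host_wall_ns` callback merely stands in for whatever interface an exposed wall time hypercall would have.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock source callback. */
typedef uint64_t (*clock_fn)(void *ctx);

typedef struct {
    clock_fn mono_ns;            /* cheap monotonic clock read */
    clock_fn host_wall_ns;       /* expensive: stands in for a hypercall */
    void *ctx;                   /* passed to both callbacks */
    uint64_t resync_interval_ns; /* maximum monotonic time between syncs */
    uint64_t offset_ns;          /* cached wall time minus monotonic time */
    uint64_t last_sync_mono_ns;  /* monotonic time of the last sync */
} cached_wall_clock;

/* Query the host and recompute the cached offset. */
static void cwc_sync(cached_wall_clock *c, uint64_t now_mono)
{
    c->offset_ns = c->host_wall_ns(c->ctx) - now_mono;
    c->last_sync_mono_ns = now_mono;
}

static void cwc_init(cached_wall_clock *c, clock_fn mono, clock_fn wall,
                     void *ctx, uint64_t resync_interval_ns)
{
    assert(resync_interval_ns > 0);
    c->mono_ns = mono;
    c->host_wall_ns = wall;
    c->ctx = ctx;
    c->resync_interval_ns = resync_interval_ns;
    cwc_sync(c, mono(ctx));
}

/* Return the wall time, resynchronizing only when the interval has passed. */
static uint64_t cwc_now(cached_wall_clock *c)
{
    uint64_t now_mono = c->mono_ns(c->ctx);
    if (now_mono - c->last_sync_mono_ns >= c->resync_interval_ns)
        cwc_sync(c, now_mono);
    return now_mono + c->offset_ns;
}

/* Test double for the host, counting how often the wall clock is queried. */
typedef struct {
    uint64_t mono_ns;
    uint64_t wall_ns;
    unsigned hypercalls;
} fake_host;

static uint64_t fake_mono(void *ctx)
{
    return ((fake_host *)ctx)->mono_ns;
}

static uint64_t fake_wall(void *ctx)
{
    fake_host *h = ctx;
    h->hypercalls++;
    return h->wall_ns;
}
```

One caveat with this design: because the resync is driven by the monotonic clock, a guest whose monotonic clock stops during a suspend will keep returning stale wall time after resume until the remainder of the interval has elapsed. A shorter interval narrows that window at the cost of more hypercalls.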

dinosaure commented 1 year ago

After a discussion with @reynir, we agreed on what the Solo5 clock is: a monotonic clock. That is, it is mainly used to track CPU time.

It is therefore expected that, when a unikernel is suspended, this clock no longer matches what we expect from a wall clock.

In other words, if you expect a clock that corresponds to the "real elapsed time", you should not use the clock provided by Solo5.

However, there are several solutions:

1) The first solution would be for Solo5 to manage a wall-clock (in addition to the monotonic clock) by regularly making a HYPERCALL_WALLTIME hypercall.
2) The second solution is to expose this hypercall and let the user manage his/her own wall-clock.
3) The third solution, which is more about the interest of having a POSIX clock, would be to make a regular NTP request and synchronize an internal POSIX clock.

Solo5 is designed to be as simple as possible, and integrating a new clock goes against this philosophy, especially since not every unikernel needs a wall-clock: some of my unikernels can work "in the past" without it disturbing their functioning.

However, a wall-clock may be necessary in other situations. In this respect, the second solution is interesting because it lets the user manage his/her own wall-clock.

Finally, the last solution is much more complex, since it requires a network stack. It is still interesting, but in our case this complexity belongs in MirageOS, through its "device" mechanism.

I will leave the issue open until I implement the second solution. We can then imagine a new device Mirage_clock.PCLOCK for MirageOS that resynchronizes a wall-clock in the scheduler of a MirageOS unikernel.