WebAssembly / wasi-clocks

Clocks API for WASI

The monotonic clock may be too underspecified #47

Open: CryZe opened this issue 1 year ago

CryZe commented 1 year ago

The monotonic clock as it currently stands is likely too underspecified in what it measures, and should possibly be split into two separate clocks. This problem is not widely known, but it has eventually bitten nearly every operating system, browser, and programming language, and I'd like to prevent WASI from running into the same issue before it's too late.

The thing is that many programs want to measure the time during which they are actively running, whereas many others want to measure the real-world time that passes. These sound like the same thing, but they differ whenever the host is suspended (phone screen turned off, laptop lid closed, operating system suspended, Wasm runtime suspending the guest module for a while, ...).
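To make the distinction concrete, here is a minimal sketch that models a host timeline as a list of intervals, each either running or suspended. The names `Interval`, `perceived_elapsed`, and `real_elapsed` are hypothetical and not part of any WASI API; the point is only that the two notions of elapsed time agree until a suspension happens.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    """A stretch of wall-clock time on the host (hypothetical model)."""
    seconds: float
    suspended: bool  # True if the host (or the runtime) was suspended


def perceived_elapsed(timeline: list[Interval]) -> float:
    """Time the guest was actually running: suspended intervals are skipped."""
    return sum(i.seconds for i in timeline if not i.suspended)


def real_elapsed(timeline: list[Interval]) -> float:
    """Real-world time that passed: suspended intervals still count."""
    return sum(i.seconds for i in timeline)


if __name__ == "__main__":
    # 10 s running, screen off for 20 minutes, then 5 s more running.
    timeline = [Interval(10, False), Interval(20 * 60, True), Interval(5, False)]
    print(perceived_elapsed(timeline))  # 15
    print(real_elapsed(timeline))       # 1215
```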

Often you want to measure the actual real-world time that has elapsed. For example, if you have a server API token that expires in 60 minutes, turning off my phone's screen for a while shouldn't throw off the time measurement.
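A minimal sketch of the token case, assuming a Linux host: the deadline is measured on `CLOCK_BOOTTIME`, which keeps counting while the system is suspended, so time with the screen off still counts toward the 60 minutes. `ExpiringToken` and `suspend_aware_now` are hypothetical names. On platforms without `time.CLOCK_BOOTTIME`, the sketch falls back to `time.monotonic()`, whose behavior across suspend depends on the platform. The clock is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable


def suspend_aware_now() -> float:
    """Seconds on a clock that keeps counting during suspend, where available."""
    if hasattr(time, "CLOCK_BOOTTIME"):  # Linux-only constant
        return time.clock_gettime(time.CLOCK_BOOTTIME)
    # Fallback (assumption): whether this counts suspended time is platform-dependent.
    return time.monotonic()


class ExpiringToken:
    """Hypothetical API token that expires a fixed duration after it was issued."""

    def __init__(self, lifetime_s: float, clock: Callable[[], float] = suspend_aware_now):
        self._clock = clock
        self._deadline = clock() + lifetime_s

    def is_expired(self) -> bool:
        return self._clock() >= self._deadline


if __name__ == "__main__":
    now = [0.0]  # simulated clock, in seconds
    token = ExpiringToken(60 * 60, clock=lambda: now[0])
    now[0] += 20 * 60  # 20 minutes of real time, e.g. with the screen off
    print(token.is_expired())  # False
    now[0] += 45 * 60  # 65 minutes in total
    print(token.is_expired())  # True
```

If the clock passed in excluded suspended time instead, the 20 minutes with the screen off would not count, and the token would be treated as valid for 20 minutes longer than the server actually allows.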

However, there are also cases where an application only wants to measure "its perceived time". The easiest example is a game: if I turn off my phone's screen and turn it back on 20 minutes later, the physics shouldn't freak out because a single frame took 20 minutes (velocity vectors are usually multiplied by the time a frame takes).
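The physics problem fits in a few lines: with a typical explicit update `position += velocity * dt`, the step size is proportional to the measured frame time, so a suspend-inclusive clock turns a 20-minute pause into one enormous step. This is a minimal sketch with a hypothetical `step` function; the frame times are passed in directly rather than read from a real clock.

```python
def step(position: float, velocity: float, dt: float) -> float:
    """Advance one frame with explicit Euler integration: x += v * dt."""
    return position + velocity * dt


if __name__ == "__main__":
    velocity = 2.0  # units per second
    frame = 1 / 60  # a normal frame at 60 FPS

    # Clock that excludes suspended time: the frame after resume is still ~1/60 s.
    print(step(0.0, velocity, frame))
    # Clock that includes suspended time: the same frame appears to last 20 minutes,
    # so the object jumps roughly 2400 units in a single step.
    print(step(0.0, velocity, frame + 20 * 60))
```

Games often also clamp `dt` to some maximum as a defensive measure, but that works around the clock's semantics rather than giving the application the "perceived time" it actually wants.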

Whether this second case is worth supporting in WASI is debatable, but it's important to at least state the difference very clearly in the documentation, so that runtime implementations don't accidentally implement the wrong thing. In fact, WebAssembly runtimes (such as wasmtime) already implement this inconsistently, as either "real time" or "perceived time".

References:
https://lwn.net/Articles/428176/
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6ed449afdb38f89a7b38ec50e367559e1b8f71f
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/a3ed0e4393d6885b4af7ce84b437dc696490a530
https://github.com/rust-lang/rust/issues/87906
https://github.com/golang/go/issues/24595
https://github.com/w3c/hr-time/issues/115
https://bugs.chromium.org/p/chromium/issues/detail?id=1206450
https://bugzilla.mozilla.org/show_bug.cgi?id=1709767
https://bugs.webkit.org/show_bug.cgi?id=225610

sunfishcode commented 1 year ago

Thanks for posting this! I'm not sure yet what I think. Here are some initial notes.

At first glance, this seems to align with the spirit of WASI: in general, we don't want guest code to be aware of "the system" unless it has a specific need to. So it's tempting to have a suspended system look, by default, the same as a system that was just running really slowly for a while, which could happen for any number of reasons the guest wouldn't know about.

Some time ago, Linux changed its CLOCK_MONOTONIC to behave like what it now calls CLOCK_BOOTTIME, and later reverted that change due to breakage; the programs reported as broken were systemd, NetworkManager, and screen savers. These are the kinds of programs that fundamentally do need a concept of "the system". WASI may some day grow to support such programs, but we'd need to add a lot of APIs for that, and it seems reasonable that if we do, we could also add a new clock for these programs at the same time.
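On Linux the two clocks can be compared directly: `CLOCK_BOOTTIME` is identical to `CLOCK_MONOTONIC` except that it also counts time the system spent suspended, so on a host without time-namespace offsets their difference approximates the total suspended time since boot. A minimal sketch, assuming a Linux host and Python 3.7+ (where `time.CLOCK_BOOTTIME` is available); `suspended_seconds` is a hypothetical helper name.

```python
import time


def suspended_seconds() -> float:
    """Approximate total time this Linux system has spent suspended since boot.

    CLOCK_MONOTONIC is read first, so the difference equals the suspended time
    plus the (tiny) gap between the two calls.
    """
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    boot = time.clock_gettime(time.CLOCK_BOOTTIME)
    return boot - mono


if __name__ == "__main__":
    print(f"suspended for about {suspended_seconds():.3f} s since boot")
```

A guest that reads only one of these two clocks cannot tell which behavior it is getting, which is why the distinction needs to be spelled out in the API documentation.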

So overall, this seems like it might be a good idea.

The main complication I see so far is that not all popular host OSes have a monotonic clock that counts time spent suspended, and even on platforms that do, not all APIs support it. For example, Linux has CLOCK_BOOTTIME, but APIs such as futex, poll, and epoll only support CLOCK_MONOTONIC. (For poll we could perhaps use timerfd instead, but that's Linux-only and takes extra system calls, so it could add overhead, though I haven't done any measurements.) If our monotonic clock counts time differently from our poll or synchronization timeouts, programs could break in different ways.
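One way to see the mismatch: suppose a deadline is kept on a suspend-aware clock, but the host can only wait with a timeout that does not count suspended time (as with a poll timeout measured on CLOCK_MONOTONIC). Re-checking the suspend-aware clock after every wakeup guarantees the wait never ends early, but if the host suspends mid-wait, the wait can end late by up to the suspended time, because the timeout itself did not advance while suspended. A minimal sketch with a hypothetical `wait_until` helper; the clock and the timed wait are injected so the behavior can be simulated deterministically.

```python
from typing import Callable


def wait_until(
    deadline: float,
    now: Callable[[], float],           # suspend-aware clock (e.g. CLOCK_BOOTTIME)
    wait_for: Callable[[float], None],  # timed wait whose timeout excludes suspend
) -> int:
    """Block until now() >= deadline; return how many waits were needed."""
    waits = 0
    while (remaining := deadline - now()) > 0:
        wait_for(remaining)  # may overshoot the deadline if the host suspends
        waits += 1
    return waits


if __name__ == "__main__":
    t = [0.0]  # simulated suspend-aware clock, in seconds

    def wait_with_suspend(timeout: float) -> None:
        # The host is suspended for 100 s during the first wait; the timeout
        # does not count that time, so real time advances by both amounts.
        t[0] += timeout + (100.0 if t[0] == 0.0 else 0.0)

    print(wait_until(60.0, lambda: t[0], wait_with_suspend), t[0])  # 1 160.0
```

Here a 60-second wait returns after 160 seconds of real time; the opposite pairing (a suspend-excluding deadline with a suspend-inclusive timeout) would instead wake early, which the re-check loop absorbs.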