Open mhofman opened 1 year ago
First, let's refine our terminology a bit. An "infinite timer loop" is when a vat schedules a wakeup, and then in response to the wakeup message, schedules another one. A "fast infinite timer loop" is when that series of wakeup messages starves other work, e.g. if the kernel is hosted in a block-based application, and the second/etc message occurs in the same block as the first. Eventually the `runPolicy` computron limit kicks in, ending the block and allowing the host to do its blockstuff, but then the next block will resume the loop, so all blocks will be filled with the loop messages. The only way out is if/when the vats involved reach some sort of accumulating meter limit and get killed (or, someday, paused).
A "slow infinite timer loop" is where each wakeup message is deferred to a later block, via the behavior changes @mhofman describes above. This gives us more options to interleave (i.e. not starve) other work. The loop might still be infinite, and the vats involved might still get terminated some day if/when their meters run out, but at least the loop is running in parallel with other work, so the overall system remains usable.
The behavior change would convert fast loops into slow loops, which are more survivable.
Note that any repeater is nominally an infinite loop, but repeating once an hour is more comfortable than repeating once per second. Given the quantized time resolution of a blockchain-embedded kernel, repeating more often than about 5-10 seconds is effectively the same as repeating instantaneously.
When @mhofman first proposed this, my original reaction was that it'd be arbitrarily delaying wakeup events "for no good reason", i.e. for a very good mitigate-fast-infinite-loops reason, but by having the kernel (device-timer/vat-timer) impose a policy that makes deep assumptions about the host application in which the kernel is embedded. The timer documentation would have a big footnote trying to justify the change without dictating something about the host's decisions. And I'd worry about how to make this work with a more real-time (solo-node) host, where time is not quantized by blocks, and the host's timer endowments could use `alarm(2)` or `select`/`poll` to wake up at the requested time, rather than be woken up periodically.
But I'm coming around to the idea. Starting from first principles, the timer service has two fundamental requirements.
The first is that wakeups must not happen too early. To be precise, when a vat asks for a wakeup at T=10s, the wakeup handler (either a `.wake()` message or promise resolution) must not be able to observe a current time of < 10. Vats do not get ambient access to a clock, so the only way for a vat to observe time at all is to either send `now = await E(timerService).getCurrentTimestamp()` (and wait for a response), or to do `now = D(timerDevice).getLastPolled()` (synchronous), assuming the vat has access to such objects in the first place. Or for them to receive a message from someone else which contains a timestamp. The causality light-cone descending from the wakeup message must not be able to observe any earlier timestamp than the requested wakeup time.

(Note that `D(timerDevice).getCurrentTimestamp()` would be a perfectly reasonable API for us to add: the "polled" in `getLastPolled` is an artifact of our current implementation, and should not be interpreted as limiting the API to sampled values. A properly real-time timer device would call `gettimeofday()` in every call to `getCurrentTimestamp()`.)
The second is that wakeups should not be arbitrarily delayed. They might suffer from quantization delay, and from delivery delays, but should not be dependent upon e.g. other IO that just happens to trigger a handler.
I've seen event-loop reactor systems (it might have been Gnome/GTK) with the bug I'm concerned about. The reactor offers "I want to read from this FD, run my callback when it becomes readable" APIs, and some kind of timer/wakeup API. Internally, it's got a `select()` loop, with file-descriptors being monitored, and a `timeout` argument. But, for whatever reason, the system doesn't try very hard to compute the next wakeup time (maybe it was too hard, maybe it was unreliable in the face of timer requests being added or removed). Instead, it uses a constant 100ms `timeout`, and just polls the timer queue every time `select()` wakes up. That gets you timer wakeups within 100ms of the desired time, but 1: it might happen faster if an FD becomes readable, and 2: the system is no longer quiescent, we're constantly waking up at 10Hz even if there are no wakeup requests that need it. In these systems, the 100ms wakeup seems to have been added as a safety mechanism, an afterthought, in response to a bug report that e.g. a calendar entry wakeup on an idle system (no extra IO triggering the poll) did not happen on time. Why not use 10s? Why not use 1ms? The tradeoff between extra power drain / CPU load and reducing wakeup delay is not something that should have to be decided at this level.
So that's the "arbitrary" delay (and non-quiescence) that I want to avoid.
The "quantization delay" that I am willing to accept is a consequence of the quantized `blockTime` available in a chain-based system, or the finite resolution of the `gettimeofday()` on a more real-time based host, or the 100Hz `CLK_TCK` speed on Linux. A vat might request a wakeup at T=10.0009 seconds, and a host with a 1ms-resolution clock cannot even attempt to wake them up at that moment: the closest it can get is either 10.000 (which might violate the "too early" rule), or 10.001. So it's ok for the timer service to pick 10.001. Vats don't get to know the quantization used by the underlying device (and it's not something they could rely upon anyway, given delivery delays), but in general they must be tolerant of these sorts of delays.
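That rounding rule can be sketched as a small helper. Times here are plain millisecond numbers (the real service uses branded `Timestamp` values), and `quantizeWakeup` is an invented name for illustration:

```javascript
// Invented helper, not part of the real timer device: round a requested
// wakeup time up to the clock's resolution. Rounding down could fire the
// handler before `whenMs`, violating the "too early" rule, so we always
// round up to the next representable tick.
function quantizeWakeup(whenMs, resolutionMs) {
  return Math.ceil(whenMs / resolutionMs) * resolutionMs;
}
```

A request for T=10.0009s (10000.9ms) on a 1ms-resolution clock yields 10001ms, matching the example above.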
The "delivery delay" is because computers are not infinitely fast. There is a non-trivial amount of code between the invocation of `D(timerDevice).getCurrentTimestamp()` and the result being provided, and that code takes time. The gap between `E(timerService).getCurrentTimestamp()` and the result promise being resolved is even larger. Every `Timestamp` a vat receives is already stale by some unavoidable amount.
In a heavily quantized block-based host, that amount might be zero (e.g. two calls to `getCurrentTimestamp()` inside the same block can return the same value). But vats cannot rely upon that: the `runPolicy` computron limit might kick in between the two calls, so the second check runs in a later block than the first. Vats must tolerate arbitrary delivery delays in any gap between one timestamp (e.g. the requested wakeup time) and a subsequent one (e.g. being woken up).
But, we should not impose additional delays beyond quantization and delivery. Delaying a wakeup until some unrelated IO took place would be wrong.
So, given that framework, let me evaluate the proposal. We would basically change `E(timerService).wakeup(when)` to never fire within the same block. The fastest/simplest way to implement a timer service in a less-constrained system would be to call `alarm()` with the requested wakeup time, or add it to a sorted queue and include the first wakeup time in our `select()`/`poll()` call.
There is an inherent race here: if a wakeup request for T=10.1 is received at T=10.0, and it takes 90ms to submit the `alarm()` call, then nothing will happen immediately. If it took 110ms to submit the call, the wakeup would happen instantly. In between the two, there is a knife edge, applied randomly depending upon process scheduling, CPU speed, etc. So there's an inherent uncertainty.
We can live within that uncertainty: the caller cannot know for sure which edge they fall on, so we can claim that the "arbitrary delay" we're imposing (to push wakeups into the next block) is a consequence of that race condition. Obviously this becomes less plausible if someone asks for a wakeup time in the distant past ("oh, sorry, I know you asked to be woken up a week ago, but it took us a week plus one second to insert your request into the queue, it just missed the cutoff for running 'now', maybe if you'd requested an even earlier wakeup.."). But basically any requested wakeup time of now+5s or more must obviously go into a future block, a wakeup at now+0.00001s is probably in a future block, a wakeup at now-0.00001s could conceivably wind up in a future block if it took us non-zero time to process, and we can extend that to pretend that every `<= now` timestamp is permitted to be serviced in a future block.
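A minimal sketch of that extended policy, with invented names (`nextBlockTime` stands in for however the host estimates the time of the following block):

```javascript
// Invented helper: any wakeup at or before `now` is treated as having "just
// missed the cutoff" and is deferred, so it fires no earlier than the next
// block; genuinely future wakeups are left alone.
function effectiveWakeup(when, now, nextBlockTime) {
  return when <= now ? nextBlockTime : when;
}
```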
Note: looking carefully at the POSIX `alarm(2)` and `select(2)` and `poll(2)` man pages, I notice they exclusively accept relative values for their timeout arguments, not absolute. So they've cleverly excluded the race condition; in fact they've excluded any way to even express it. And at this level, in the C mines, all calls are blocking, and not using callbacks. So there's no question of whether a `selectButWithCallbacks()` will invoke your callback within the `select` or later depending on the knife-edge of future-vs-not-future. You can't reliably distinguish the invocation delay of a `select()` that returns immediately from the slightly larger invocation delay of a `select()` that waits a moment longer.
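The relative-timeout shape can be sketched like this: converting an absolute deadline into the clamped relative value those POSIX calls accept erases the knife edge, because "just missed" and "just made it" both collapse into "return roughly immediately".

```javascript
// Sketch of the only thing a select()/poll()-style API can express: a
// relative timeout. A deadline at or before `now` becomes a zero timeout;
// the caller cannot distinguish "already expired" from "expired while we
// were asking".
function relativeTimeout(nowMs, deadlineMs) {
  return Math.max(0, deadlineMs - nowMs);
}
```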
So, in some sense, this whole question is a consequence of our decision to include absolute-time APIs in the first place (`setWakeup`/`wakeAt`), rather than only providing relative APIs (`delay` and most of the repeater forms). If we'd only given relative APIs, the callers could not distinguish our "never in the same block" delay from the delivery delays needed to get their request to vat-timer.
So, in conclusion, I'm ok with having the timer vat/device impose this extra delay.. I don't think it falls into the "no arbitrary delays" that I was worried about.
It would prevent the timer service from being used for a valid use case ("run this thing as soon as possible but in a later crank", my half-joking `target~~.method(args)` syntax, meaning "eventual eventual send"), but that's fine. Whatever the syntax, liveslots could do `syscall.send(myOwnVref, methargs)` to push something onto the run-queue: it doesn't need to involve the timer service.
An even longer `target~~~.method(args)` (meaning "as soon as possible but in a later block") should use some entirely different service: the timer might decide to defer things to a later block to achieve our "avoid fast infinite timer loops" goal, but that doesn't make the timer service a sanctioned tool to reason about blocks. If we wanted to provide that facility (which I doubt), we should have some device or service which explicitly uses `blockHeight`. Maybe just another instance of the timer service but parameterized by `blockHeight` instead of `blockTime`. Such an instance could only possibly exist on a chain, making it obvious that a solo/realtime node should not provide it.
So now, `poll` vs `wake`. The original intention of the timer device was to run on two different kinds of host systems. Our primary target is of course the chain, where blocks just happen, over and over, reliably, and independently of what the kernel wants. That means we don't need to request a wakeup at a specific time: we'll always be woken up on every block, whether some vat needs to know or not. The system is never quiescent (at least under our configuration: "no empty blocks" would change that). The simplest host API there is for the host to inform the device about the current `blockTime` inside every `START_BLOCK` or `END_BLOCK` (we originally used START, but now do it in END). That's where we got `timerDevice.poll(blockTime)`.
The other kind of host is a "real-time" solo node where you don't have these constant blocks. There, the host would have a `select()` loop that waits for either IO or a timer wakeup, then invokes `controller.run()`, commits the DB, dis-embargoes output messages, then returns to the loop. It might attempt to batch inputs (deferring the `controller.run()` for the sake of improving efficiency), but then it takes on the responsibility to avoid arbitrary IO-sensitive delays. If the runPolicy causes `run()` to finish with work still on the queue, the host should run again promptly, instead of waiting for more IO. But in any case, the timer device in this mode can do two new things: it can usefully request timestamps at any moment (every call to `E(timerService).getCurrentTimestamp()` can get a different value, even calls that happen in the same `controller.run()`), and it can request a wakeup for a specific time (and remain quiescent until then).
So in a real-time host, the timer device API should have 1: a way for the device to `gettimeofday()` from the host, 2: a way for the device to request a wakeup at future time X, and 3: a way for the host to inform the device that time X has arrived. The host should include time X into its `select()` timeout computation. To minimize the API surface, the timer device only gets one wakeup call (it must maintain an ordered queue internally). So the API for "2" reduces to "please wake me up at time X", "please change my wakeup time from whatever it was before to time X", and "please cancel my wakeup time", all of which can be expressed with `host.setWakeup(timestamp or undefined)`.
We didn't get around to building timerDevice support for the second kind of host, because the chain was our top priority. But I want to retain the structure to support it in the future. I think that means `device-timer.js` has two modes. In one (at least before the changes proposed by this ticket), the host just calls `poll(blockTime)` on a regular basis. That needs to fire any wakeups. In the other mode, the device needs to be handed a `getCurrentTime` endowment, and a `setWakeup` endowment, and we can continue to use `poll()` as the API for "3", called by the host with the current time, but only in response to a `setWakeup`. We could do that fairly cleanly by providing `getCurrentTime = setWakeup = undefined` for the first mode, and providing real functions for the second. The device would look something like:
```js
let latestTimestamp;

hostInputs.poll = now => {
  latestTimestamp = now;
  fireWakeups(now);
};

vatAPI.getCurrentTimestamp = () => {
  if (hostOutputs.getCurrentTime) {
    latestTimestamp = hostOutputs.getCurrentTime();
  }
  return latestTimestamp;
};

vatAPI.setWakeup = when => {
  insertIntoQueue(when);
  if (hostOutputs.setWakeup) {
    const next = firstInQueue();
    hostOutputs.setWakeup(next);
  }
};

vatAPI.cancelWakeup = cancelToken => {
  removeFromQueue(cancelToken);
  if (hostOutputs.setWakeup) {
    const next = firstInQueue(); // might be 'undefined' for "no wakeup needed"
    hostOutputs.setWakeup(next); // thus might cancel the one wakeup time
  }
};
```
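To make the shape concrete, here is a runnable toy of the first (poll-only) mode, with the queue helpers filled in as a sorted array. `makeToyTimerDevice` and its internals are invented for this sketch and do not match the real `device-timer.js`:

```javascript
// Toy poll-mode timer device: handlers are plain functions standing in for
// the real device's SO(handler).wake(time) sends.
function makeToyTimerDevice() {
  let latestTimestamp;
  const queue = []; // { when, handler } records, kept sorted by `when`
  const fireWakeups = now => {
    // Fire everything that is due; the real device must also tolerate
    // handlers scheduling new wakeups while we're draining.
    while (queue.length > 0 && queue[0].when <= now) {
      queue.shift().handler(now);
    }
  };
  return {
    poll: now => {
      latestTimestamp = now;
      fireWakeups(now);
    },
    getCurrentTimestamp: () => latestTimestamp,
    setWakeup: (when, handler) => {
      queue.push({ when, handler });
      queue.sort((a, b) => a.when - b.when);
    },
  };
}
```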
@mhofman's proposal is to always call `poll(now)`, to set the current timestamp, but make the wakeup invocations explicit with a separate `wake()` call. I assume that `wake()` would need to submit only one wakeup message to the run-queue, even if there are more pending, to maintain the separation of runs that the description calls for. And I assume that `wake()` needs to return a boolean that indicates whether a message was submitted or not, so the caller knows when to exit their loop and proceed to the next set of actions (the normal-priority action queue).
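Under those assumptions, the host-side drain might look like this sketch (`device` and `runOneSlice` are stand-ins, not real APIs):

```javascript
// Invented host loop for the proposed split: poll() only updates the clock,
// while wake() submits at most one wakeup and reports whether it did.
function serviceTimers(device, now, runOneSlice) {
  device.poll(now); // update current time; fires nothing under the proposal
  let slices = 0;
  while (device.wake()) {
    runOneSlice(); // let each wakeup's run finish before submitting the next
    slices += 1;
  }
  return slices; // wake() returning false means: move on to the action queue
}
```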
I see two issues. The first is with the need for `wake()` to submit only one event. For historical reasons, `device-timer.js` implements a full wakeup queue (badly), and `vat-timer.js` also implements a full wakeup queue (better), and vat-timer limits itself to using one wakeup event at a time from device-timer. That avoids the problems with device-timer's implementation, and conceals the device node from all other vats (preventing them from triggering the same problems).
But the consequence is that device-timer doesn't know how many vat-timer wakeups will be triggered when it does its `SO(handler).wake(time)`. If someone calls `E(timerService).wakeAt(tuesday)` four times, there will be four different promises getting resolved for a single `poll(tuesday)` call. That wouldn't satisfy the apparent goals for the "`wake()` one at a time" part of this issue's proposal.
(I think we could punt on that, and say that `wake()` fires the one device wakeup, which fires an unknown number of vat wakeups, because in practice I don't think it's common for several wakeups to be scheduled for the same timestamp).
The second issue is how to square `wake()` with the real-time host mode described above. I guess it's mostly the same: call `poll(now)` in response to a "please wake me at X" time (instead of periodically), but also always call `wake()` until it stops reporting that work has been done. `poll()` is not the best name for the only-update-`currentTime`-but-does-no-work API, we might change the name, but the functionality can be the same in both modes.
One other wrinkle: we had a consensus bug in this neighborhood a while back, which we need to avoid regressing upon. I think it involved the mailbox device, rather than timer, but the principle is the same. The device had an "add messages from the outside" API for the host to call, and the API returned a boolean to indicate whether the batch of messages included any new ones. If `true`, the host was obligated to call `controller.run()` and let the new messages get processed. If `false`, the host could skip the `run()` because it knew there was nothing new to do.
However, device state is not recorded as orthogonally as it would be for a vat. I think the "what's the highest seqnum" field was not restored after a kernel restart, so for the recently-restarted kernel, it would always return `true`. The extra `controller.run()` on the restarted kernel then caused a divergence. We fixed it by ignoring the return value, and always doing the `controller.run()`.
We need to make sure that the timer device can accurately signal the need for a `wake()`. The advantage of having `poll()` always process all queued events is that it doesn't give the host application any opportunity for divergent behavior. I think this is doable, but I'd want to carefully study what went wrong with the mailbox device, to make sure we don't repeat that mistake.
Thinking more about this: the proposal has three parts:

1. never trigger a wakeup callback in the same run that scheduled it
2. separate "update the current time (`getCurrentTimestamp()`)" from "run callbacks"
3. wake only one event at a time

I'm on board with the first part, using the justification from above. To implement this, we need to identify all places that short-circuit `wakeupTime <= now` and remove the short-circuit. The full API surface is defined by the `timerService` singleton:
The absolute APIs (`setWakeup` and `wakeAt`) can short-circuit when the requested wakeup time is earlier than or equal to `now`. The `setWakeup` clause that should be removed is:

and the matching clause for `wakeAt` is inside `wakeAtInternal`:
The relative-time APIs (`delay`, and the various repeater APIs) can be provoked into an immediate callback by providing a relative delay of `0`. (Changing this to pretend that the caller provided some small-but-non-zero `epsilon` is even more justified by the arguments in the earlier comment.) `delay` routes through `addDelay` to the same `wakeAtInternal` above. The repeater APIs all pass through `repeaterEventBehavior`, so we need to remove this clause:
Note that all these guards were helping to fulfill the expectation listed in the `reschedule` comment:

However the `reschedule` comment is merely a warning that if the first scheduled event is in the past, we won't fire right away, and will instead depend upon whatever `D(timerDevice).setWakeup()` does. And that's the new behavior that we want. So I'll update the comment to remove the "should" and merely remind the reader of the consequences when `newFirstWakeup` is in the past.
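Schematically (this is an illustration, not the actual `vat-timer.js` code), the change deletes the immediate-resolution branch so that even past-due times flow through the device:

```javascript
// Invented shape of the change: before, a past-due `when` was resolved
// immediately in the same run; after, it is always handed to the device,
// which defers servicing to a later run/block.
function wakeAtInternalSketch(when, now, resolveNow, deviceSetWakeup) {
  // Removed short-circuit (old behavior):
  //   if (when <= now) { resolveNow(now); return; }
  deviceSetWakeup(when); // past-due times included
}
```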
We also need to look at the timer device, but I think it's already doing the right thing:
Finally, I need to study `measureInterval` more carefully, specifically in the `now === start` case.
I'm -0 on this one, because it introduces an observable transition-ordering weirdness in the TimerService. SwingSet currently provides FIFO ordering of messages between any two vats, and this would effectively violate FIFO ordering.
A client vat can request a wakeup for e.g. time T=10 and then expect a wakeup message (either a literal `E(handler).wake()` callback, or a promise resolution). Call that message `wake10`.
The client vat can also send `E(timerService).getCurrentTimestamp()` and get back a response. Suppose the timestamp it samples is T=20, so let's call these two messages `getTime` and `timeIs20`. Note that both `wake10` and `timeIs20` travel the same path, from vat-timer to the client vat, so our usual FIFO ordering rules are relevant.
With the current behavior, where `poll()` both updates `now` and triggers the callbacks, the vat can expect that `wake10` arrives strictly before `timeIs20`. More precisely, the vat can expect that every wakeup message requested for time T will arrive before any `getCurrentTimestamp` response that announces a time > T. We've got wiggle room to choose the relative ordering when time = T (i.e. if the `getCurrentTimestamp()` itself, in the act of querying the device, discovered that some pending wakeups could be fired, and then it gets to decide whether to send/resolve the wakeups first, or respond to the `getCurrentTimestamp` first).
The current behavior, where cosmic-swingset tries to finish old work before touching the timer at all, effectively says that time does not pass at all until the run-queue is drained. There's no way to observe changes in `getCurrentTimestamp()`, or synchronous device queries (if those were made available to clients, which they aren't), or the sequence of wakeup messages, unless "time passes" (by getting past the "finish old work" phase), at which point all sources of time update the same way. This is a pretty straightforward model for client authors to reason about.
With the proposed behavior, where `poll()` updates `now` but we don't trigger callbacks until later (perhaps many blocks later, if high-priority items take precedence and we never reach the timer-servicing line before we exhaust the `runPolicy` computron limit), then any action (with sufficient priority to run now) could query the TimerService as it runs, and observe `timeIs20`, even though it had scheduled a wakeup for T=10 and had not yet seen the message. There's no FIFO ordering that could produce this sequence of events, and we have to explain that some ways of asking about time will show it passing, but others will not, which doesn't sound straightforward at all.
Some day we'll have a non-FIFO scheduler which might admit this sequencing. But the non-FIFO-ness of that scheduler will probably come from priority ordering, which will solve the "we want to service the high-priority action queue before we service any timer messages" problem more directly. In that scheduler, I expect that we'll trigger all timer events promptly, and rely upon the scheduler to delay the delivery of the resulting messages, in favor of the higher-priority traffic that comes from the actions. So we'll accomplish the same goal without allowing some actions to sometimes "see the future", or introducing an inconsistent model of time.
@mhofman notes the following positive benefit of making this change:
> The new block time will be available when processing high priority actions
I'm trying to think about how high-priority actions would benefit from this, and whether it justifies breaking FIFO order. I expect high-priority actions to be injected manually, in response to some emergency, so the author of the action will know "when" it should be executing, and their code will have less need to know some newer/better timestamp. However, maybe the high-priority action wants to perform a liquidation or do some economic thing which changes depending upon the current time (reducing an auction price?), and so there would be a difference between "inject a high-priority event into the past" and "inject a high-priority event that gets to see the future".
My hunch is that it's not worth it, but if someone shows me a use case, I'll reconsider.
I'm -1 on this one. First off, we can't even implement it today, because of how device-timer and vat-timer interact. The vat only submits a single wakeup request to the device, and maintains its own internal queue of what wakeups will be performed when it fires. The device doesn't know about that list. So to build a `wakeOnlyOne()` API on the device, we'd need to:
Secondly, when the scheduler grows to provide non-FIFO priority ordering, the need for this will go away: we can trigger all callbacks at the same time, their messages will all get dropped into the run-queue, and the scheduler will sort things out. So it doesn't feel appropriate to perform short-term architecture contortions, when the long-term plans include a better fix.
So my plan is to implement the first part, removing the `if (now >= when)` short-circuiting of immediate callbacks, along with a whole bunch of test updates. But I'm not inclined to do the second (splitting `poll()` from `wake()`), and I don't even see how to do the third (wake one thing at a time).
I'm diving into the implementation, and found that repeaters need special handling. If you create a repeater with `delay = 0` (meaning the first wakeup is "now", the second is "now + 1*interval", etc), I'm going to need to make the first wakeup be "now + 1*interval".
The function that computes the next wakeup time (`measureInterval`) is used for every wakeup: both to compute the initial wakeup time, and then again at each wakeup time to compute the subsequent wakeup time. For `start=0` and `interval=10`, you can call this function with various values of `now`, and it tells you when the next wakeup needs to be. If `now` is 10 or 20 or 30, it needs to tell you 20 or 30 or 40: the next interval, so that you sleep for `interval` and not for `0`.
If someone uses `start=0`, and we use the initial policy (described above) of responding by telling the timer device to wake us up `now` (which really means wake us up soon), then the host is free to perform that wakeup with the same timestamp that we requested: time does not necessarily pass between the creation of the repeater and the first wakeup, even though blocks/runs do pass. Which means that `measureInterval()` may be called with the same `now` value twice: once when the repeater is first scheduled (telling us the first wakeup time, e.g. `now`), then again during the first wakeup (where we need it to tell us the second wakeup: `now + 1*interval`).
I don't want to add an option to `measureInterval` to say whether this is the initial scheduling or a wakeup schedule-the-repeat call, and I don't think it's important to provide a `makeRepeater` which can start "soon" instead of at the first interval. You can get a one-shot "wake me up soon" with `delay(0n)`, but I don't think "soon and also repeat" is so important.
So I'm going to change the policy described above: when `makeRepeater` (and the other repeater APIs) are called with `delay=0`, that means "wake me up at `now + 1*interval`", instead of "wake me up soon". This is a visible behavior change, so if someone disagrees, please let me know.
although.. grepping through the source code, I see one real use of `repeatAfter`, in inter-protocol:

and Zoe's `priceAggregator` contract:

and both of them might be called with `delay=0`, so maybe this "soon and also repeat" is more important than I thought.
@Chris-Hibbert any thoughts?
> I'm trying to think about how high-priority actions would benefit from this, and whether it justifies breaking FIFO order. I expect high-priority actions to be injected manually, in response to some emergency, so the author of the action will know "when" it should be executing, and their code will have less need to know some newer/better timestamp. However, maybe the high-priority action wants to perform a liquidation or do some economic thing which changes depending upon the current time (reducing an auction price?), and so there would be a difference between "inject a high-priority event into the past" and "inject a high-priority event that gets to see the future".
It doesn't seem to me that high-priority actions should have a need to be time-dependent. They should all be of the form "get this done stat!". Using them to schedule new repeating events, or expecting them to act differently depending on how quickly they broke through the log-jam isn't useful.
> If you create a repeater with `delay = 0` (meaning the first wakeup is "now", the second is "now + 1*interval", etc), I'm going to need to make the first wakeup be "now + 1*interval".
As I wrote in Slack, the auction scheduler has a `repeatAfter` with a computed delay. When the delay bottoms out to zero, the action should be executed soon, and at repeated delays of interval. If the scheduler is going to skip over delays of zero, the code would have to check for the zero case and invoke the action explicitly.
It's fine if "soon" means "potentially after a delay". The code should be checking when time-sensitive actions actually get invoked, and sometimes saying "it's too late. I missed my window of opportunity, and should just abort the current round". This is never tight enough that we need sub-block timing. Presumably when we have time-quakes we won't continue processing until the next block and use a later timestamp.
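The "missed my window of opportunity" check described here might look like the following sketch, where `windowAllowance` is an invented parameter for how late is still acceptable:

```javascript
// Invented guard: a time-sensitive wakeup compares when it actually fired
// against when it was scheduled, and aborts the current round if it arrived
// too late to be useful.
function shouldRunRound(scheduledAt, firedAt, windowAllowance) {
  return firedAt - scheduledAt <= windowAllowance;
}
```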
Ah, I see you found the `repeatAfter()` call. The distinction between a computed `delayFromNow` of 0 and 1 should be small.
> To minimize the API surface, the timer device only gets one wakeup call
I would like to challenge this as a requirement.
> in practice I don't think it's common for several wakeups to be scheduled for the same timestamp
I would like to challenge that assumption as well
> travel the same path, from vat-timer to the client vat, so our usual FIFO ordering rules are relevant.

> With the current behavior, where `poll()` both updates `now` and triggers the callbacks, the vat can expect that `wake10` arrives strictly before `timeIs20`.

> There's no FIFO ordering that could produce this sequence of events, and we have to explain that some ways of asking about time will show it passing, but others will not, which doesn't sound straightforward at all.

> Some day we'll have a non-FIFO scheduler which might admit this sequencing. But the non-FIFO-ness of that scheduler will probably come from priority ordering
This is not entirely accurate. Our only ordering guarantee is that two message sends from one vat to the same object in another vat are delivered in that order. Not all pairs of deliveries between two vats have strict ordering guarantees, especially if promises are involved. Also, nothing strictly requires the timer vat to resolve a wake promise before it updates or answers with the current time.
Furthermore there is a plethora of ways a vat could discover new time before its wake callbacks are invoked, for example if a 3rd vat gets woken up first and sends a message to the vat with the new time, which gets delivered before the direct wake message from the timer vat.
That said, the timer vat could try to attempt maintaining that illusion by having a different timer facet for each vat, which would be beneficial for other reasons. But that illusion will only ever be partial, and a vat may discover time has advanced on its own.
> Secondly, when the scheduler grows to provide non-FIFO priority ordering, the need for this will go away: we can trigger all callbacks at the same time, their messages will all get dropped into the run-queue, and the scheduler will sort things out. So it doesn't feel appropriate to perform short-term architecture contortions, when the long-term plans include a better fix.
That is not entirely accurate either. Non-FIFO scheduling actually necessitates associating each timer wake with the context that registered the wakeup in the first place, which cannot be done by the timer vat until we have a mechanism to expose these contexts to vats, and for vats to select a context when sending a message. I expect such a mechanism to come much later than a non-FIFO scheduler. The solution for that is to escape to the host for each timer registration and wake, so that the host can set the correct context when triggering the wakeup.
Regardless in general it's good to provide visibility to the scheduler of how much work may be pending for a given context. At the end of the day, timers are just another kind of queue.
> in practice I don't think it's common for several wakeups to be scheduled for the same timestamp
I echo @mhofman's disbelief. Within a vat, I try to keep wakeups unique, but multiple vats might all have things to do when an auction is scheduled to start, and they'd probably separately schedule wake-ups rather than waiting on a common promise.
## What is the Problem Being Solved?
Currently the vat-timer, which implements the timer service, will trigger a timer event itself if the wakeup time is the current time or in the past. Given the current run-to-completion semantics in cosmic-swingset, this means that whatever action caused the timer to be scheduled will continue its processing through these timer events. This is different from events which are scheduled in the future, which go back through the host (cosmic-swingset) scheduler, and may process external actions before triggering timer events (see https://github.com/Agoric/agoric-sdk/issues/6964). Effectively this enables buggy contracts to result in runaway execution that preempts processing of any other action, including high-priority activity.

Furthermore, polling the timer will currently queue for execution all the timer events that are due, resulting in their interleaved execution. The current cosmic-swingset run-to-completion semantics try to avoid any interleaving of independent actions, as that usually results in less efficient processing given the simple FIFO run queue of cosmic-swingset.
## Description of the Design

Separate `poll` from `wake`.

This results in the following behavioral changes:
This also opens the door to support explicit high priority timer events in the future.
## Security Considerations

While this prevents runaway timer loops from preventing processing of high-priority activity, it makes it more difficult to identify and throttle these runaway timer loops. One mitigation may be to clamp these events to always execute in the next block (which may not be automatic if that immediate timer event is scheduled during the processing of a timer event).
## Scaling Considerations

This should not have any negative impact on chain execution, and actually enables fairer execution.

For non-chain hosts, the timer device interface should expose sufficient information to let the host schedule a system wake-up for the next timer event if that event is scheduled for the future.
## Test Plan

TBD. Timer event scheduling should be unobservable from the vat's perspective. However, changing the scheduling will result in a different execution order, and given our current very constrained scheduler, some contracts may implicitly rely on some specific ordering.