Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

Implement basic "flow escalators" #3517

Open warner opened 3 years ago

warner commented 3 years ago

What is the Problem Being Solved?

In #3465 we outlined a plan for a basic pair of swingset run-queues ("high priority" and "low priority"), as a starting point for more sophisticated scheduling. Today @michaelfig and I sketched out a slightly more interesting scheme that could give users some meaningful control. The basic user story is:

Alice wants to make an AMM trade, but the chain is very busy, and the run-queue is never empty. She submits her transaction, but it gets stuck at the back of the line for a long time. She gets impatient, and is willing to pay more for expedited service. She needs a way to 1: learn about the state of her request, 2: influence it.

The main problems with #3465 are:

Description of the Design

The basic idea is the "Flow", following @dtribble's original formulation. Within a swingset machine, all messages sent on a single Flow are FIFO-ordered, so any related messages that depend upon ordering must be sent on the same Flow. As a starting point, each client machine will have a single Flow. On the chain, a new Flow will be allocated within the kernel during the provisioning process. The kernel uses a flowid to name the Flow. Each Flow is a FIFO queue of messages, possibly (usually) empty. Each Flow has a number of "boost points" (usually zero). Each point buys you one high-priority message delivery. The kernel exposes APIs (to the host) to create/allocate a Flow, and to add boost points to a flow.
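As a rough illustration of the bookkeeping described above, a per-Flow record and the two host-facing operations might be sketched as follows. The names `makeFlowTable`, `allocateFlow`, `addBoostPoints`, and `getFlow`, as well as the `flow-N` id format, are assumptions made for this example and are not the kernel's actual API.

```javascript
// Hypothetical sketch: none of these names come from the swingset kernel.
function makeFlowTable() {
  let nextID = 1;
  const flows = new Map(); // flowid -> { flowid, messages, boostPoints }

  // Called (in this sketch) during provisioning to create a new, empty Flow.
  function allocateFlow() {
    const flowid = `flow-${nextID}`;
    nextID += 1;
    flows.set(flowid, { flowid, messages: [], boostPoints: 0 });
    return flowid;
  }

  // Each boost point pays for one high-priority delivery.
  function addBoostPoints(flowid, count) {
    const flow = flows.get(flowid);
    if (!flow) {
      throw Error(`unknown flowid ${flowid}`);
    }
    if (!Number.isInteger(count) || count <= 0) {
      throw Error(`boost point count must be a positive integer`);
    }
    flow.boostPoints += count;
    return flow.boostPoints;
  }

  function getFlow(flowid) {
    return flows.get(flowid);
  }

  return { allocateFlow, addBoostPoints, getFlow };
}
```

A freshly allocated Flow in this sketch starts with an empty message list and zero boost points, matching the "usually empty, usually zero" description.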

The old run-queue of messages is replaced by two queues (high and low priority) of Flows. The kernel services the high-priority queue before the low-priority queue. The kernel services all messages from the first flow on a queue before the next flow on a queue (subject to the "boost points" limit, below).

When a message is added to an empty flow with zero boost points, the flow is placed at the back of the low-priority queue. If the flow has any boost points, it is instead placed at the back of the high-priority queue. If boost points are added to a flow on the low-priority queue, that flow is moved to the back of the high-priority queue. When a flow becomes empty, it is removed from the queues.
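The placement rules in this paragraph can be sketched as three small operations over two arrays. This is only an illustration: `makeFlowQueues`, `addMessage`, `addBoostPoints`, and `removeIfEmpty` are hypothetical names, and a flow is assumed to be a plain `{ flowid, messages, boostPoints }` record.

```javascript
// Hypothetical sketch of the queue-placement rules; not kernel code.
function makeFlowQueues() {
  const high = []; // flows that currently have boost points
  const low = []; // everything else that has messages waiting

  const remove = (queue, flow) => {
    const index = queue.indexOf(flow);
    if (index >= 0) {
      queue.splice(index, 1);
    }
  };

  function addMessage(flow, message) {
    // A flow is queued only while it has messages, so an unqueued flow
    // is an empty one.
    const queued = high.includes(flow) || low.includes(flow);
    flow.messages.push(message);
    if (!queued) {
      (flow.boostPoints > 0 ? high : low).push(flow);
    }
  }

  function addBoostPoints(flow, count) {
    flow.boostPoints += count;
    // Only a flow waiting on the low-priority queue changes position.
    if (low.includes(flow)) {
      remove(low, flow);
      high.push(flow);
    }
  }

  function removeIfEmpty(flow) {
    if (flow.messages.length === 0) {
      remove(high, flow);
      remove(low, flow);
    }
  }

  return { high, low, addMessage, addBoostPoints, removeIfEmpty };
}
```

Note that adding boost points to an empty flow does not queue it in this sketch; the points simply wait until the next message arrives, at which point the flow goes straight to the high-priority queue.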

When executing messages from the high-priority queue, one boost point is deducted for each delivery. If the boost point count drops to zero, the flow is moved to the back of the low-priority queue, even if it still has messages waiting to be delivered. New messages may be added to the flow during that delivery, and as long as there are still boost points left, they will be executed before anything from the next flow on the high-priority queue.

When executing messages from the low-priority queue, the kernel will keep processing messages from the first flow until 1: the flow becomes empty, or 2: a message is added to a flow with boost points, or 3: boost points are added to a flow with messages. The kernel will then service the high-priority queue until it is empty, before moving back to the low-priority queue.
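To make the servicing order concrete, here is a minimal single-step scheduler sketch that includes the placement rules from the preceding paragraphs. It is an illustration under assumptions, not the kernel implementation: `makeScheduler`, `send`, `addBoostPoints`, `step`, and the `deliver` callback are all hypothetical names, and flows are created lazily on first use for brevity.

```javascript
// Hypothetical sketch of the two-queue flow scheduler; not kernel code.
function makeScheduler(deliver) {
  const flows = new Map(); // flowid -> { flowid, messages, boostPoints }
  const high = [];
  const low = [];

  const remove = (queue, flow) => {
    const index = queue.indexOf(flow);
    if (index >= 0) {
      queue.splice(index, 1);
    }
  };

  function getFlow(flowid) {
    if (!flows.has(flowid)) {
      flows.set(flowid, { flowid, messages: [], boostPoints: 0 });
    }
    return flows.get(flowid);
  }

  function send(flowid, message) {
    const flow = getFlow(flowid);
    const queued = high.includes(flow) || low.includes(flow);
    flow.messages.push(message);
    if (!queued) {
      (flow.boostPoints > 0 ? high : low).push(flow);
    }
  }

  function addBoostPoints(flowid, count) {
    const flow = getFlow(flowid);
    flow.boostPoints += count;
    if (low.includes(flow)) {
      remove(low, flow);
      high.push(flow);
    }
  }

  // Deliver one message. Returns false when both queues are empty.
  function step() {
    const fromHigh = high.length > 0;
    const flow = fromHigh ? high[0] : low[0];
    if (!flow) {
      return false;
    }
    const message = flow.messages.shift();
    if (fromHigh) {
      flow.boostPoints -= 1; // one point per high-priority delivery
    }
    deliver(message, flow.flowid); // may call send() or addBoostPoints()
    if (flow.messages.length === 0) {
      remove(high, flow);
      remove(low, flow);
    } else if (fromHigh && flow.boostPoints === 0) {
      // Out of points: demote, even though messages are still waiting.
      remove(high, flow);
      low.push(flow);
    }
    return true;
  }

  return { send, addBoostPoints, step };
}
```

Because `step` always checks the high-priority queue first, the "keep processing the first low-priority flow until something becomes boosted" behaviour falls out without a separate loop. For example, with flows `a` (messages a1, a2, a3) and `b` (messages b1, b2) both on the low-priority queue, delivering one message and then calling `addBoostPoints('b', 1)` gives the delivery order a1, b1, a2, a3, b2: `b` jumps ahead for one delivery, runs out of points, and rejoins the low-priority queue behind `a`.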

Each delivery comes from a flow, which establishes a default flowid, which will be inherited by events created during that crank. This ensures that the priority of the request is shared with the response. A future mechanism will enable vats to select an alternate flowid for some deliveries (the illustrative use case is a high-priority AMM trade, whose response gets the benefit of the client's prioritization, but the price-change notification that results should not), but for now everything gets the inherited value.
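The inheritance rule can be illustrated with a deliberately simplified sketch that uses a single FIFO instead of the two flow queues, so that only the flowid propagation is visible. Everything here (`makeKernelSketch`, `addVat`, `enqueue`, `crank`, and the shape of the `syscall` object) is hypothetical and chosen for the example.

```javascript
// Hypothetical sketch of flowid inheritance during a crank; not kernel code.
function makeKernelSketch() {
  const runQueue = []; // simplified: one FIFO of { flowid, target, body }
  const vats = new Map(); // vat name -> dispatch(body, syscall)
  let currentFlowid; // defined only while a crank is running

  // Outside a crank the caller must name a flowid; inside a crank the
  // default is the flowid of the delivery being processed.
  function enqueue(target, body, flowid = currentFlowid) {
    if (flowid === undefined) {
      throw Error('no flowid given and no crank in progress');
    }
    runQueue.push({ flowid, target, body });
  }

  function addVat(name, dispatch) {
    vats.set(name, dispatch);
  }

  function crank() {
    const entry = runQueue.shift();
    if (!entry) {
      return undefined;
    }
    const dispatch = vats.get(entry.target);
    if (!dispatch) {
      throw Error(`unknown vat ${entry.target}`);
    }
    currentFlowid = entry.flowid;
    try {
      // Events created through this syscall object inherit the flowid.
      const syscall = { send: (target, body) => enqueue(target, body) };
      dispatch(entry.body, syscall);
    } finally {
      currentFlowid = undefined;
    }
    return entry;
  }

  return { addVat, enqueue, crank };
}
```

In this sketch a message injected on `flow-7` and forwarded by one vat to another stays on `flow-7` for the whole chain, which is how the response shares the request's priority. The future "alternate flowid" mechanism would amount to letting a vat pass an explicit third argument.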

The Mailbox device will be the one place where the flowid can be set. The host currently calls this with (effectively) a list of (remote name, sequence number, message body) items, plus a single ack number, and the Mailbox device uses syscall.sendOnly() to enqueue each message to vat-vattp, from which they travel to vat-comms, and then on to Zoe and contract vats.

We augment these items to include a flowid. The Mailbox device will use a new argument to syscall.sendOnly to specify the flowid. For now, this will be the only control over flowids. In the future, we'll want to track flows through a c-list, but for now they'll be widely held. Any low-level vat code able to invoke syscall.sendOnly could use any existing flowid it likes. Note that liveslots does not currently expose sendOnly to userspace, so for now only devices can use it.
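A sketch of what the augmented Mailbox path might look like follows. The form of the new `syscall.sendOnly` argument is not specified in this issue, so the trailing `{ flowid }` options bag is purely an assumption, as are the names `makeMailboxHandler` and `deliverInbound` and the vattp method name used here. Ack handling is omitted.

```javascript
// Hypothetical sketch of a flowid-aware Mailbox handler; not device code.
// `syscall.sendOnly(target, method, args, options)` is an assumed shape.
function makeMailboxHandler(syscall, vattpTarget) {
  // items: array of { remoteName, seqnum, body, flowid }
  return function deliverInbound(items) {
    for (const { remoteName, seqnum, body, flowid } of items) {
      if (flowid === undefined) {
        throw Error(`message ${seqnum} from ${remoteName} has no flowid`);
      }
      // Each inbound message is enqueued to vat-vattp on its own flow.
      syscall.sendOnly(
        vattpTarget,
        'deliverInboundMessages', // assumed method name
        [remoteName, [[seqnum, body]]],
        { flowid },
      );
    }
  };
}
```

The host would be responsible for mapping each sender to the flowid allocated during provisioning before calling the handler; the device itself only passes the value through.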

All immediate consequences of the vattp-bound message will use the same flow. If the trade or other transaction can be completed without leaving the kernel (as e.g. IBC would require) or waiting for a timer event (e.g. it could all happen within the same block, if blocks had infinite capacity), then all these consequences will happen before any other flows are serviced. This is probably better than:

The kernel will also expose a getSchedulerState API to the host. This will return a serializable data structure that details the ordered list of { flowid, boostPoints, numMessages } for each of the two queues. The full external kernel API is:

This ticket is mostly about the kernel facilities, but @michaelfig will have a separate ticket for the cosmic-swingset code to match. This code will:

Security Considerations

Test Plan

kernel-side unit tests

warner commented 3 years ago

cc @rowgraus to think about the UX of this approach

dckc commented 3 years ago

... inherited by events created during that crank ...

"events" means send and resolve syscalls? I don't remember "events" in kernel-speak before.

warner commented 3 years ago

I've been waffling on the terminology, but the issue is that there isn't a 1:1 relationship between syscalls and run-queue entries:

It's those two kinds of events that need to inherit a flow, from those three kinds of syscalls.

warner commented 3 years ago

@dtribble was -1 on using "flow" to describe these (I think it doesn't sufficiently match his original definition), and also thought "streams" should be held in reserve for something else (although I need to understand what he has in mind), and suggested "activities". That feels a bit off to me, so I'm thinking of using "activity stream" for this, at least for now.

michaelfig commented 3 years ago

@warner: @JimLarson and I were discussing the upcalls needed for the lien mechanism, and we noticed that there would need to be a special "immediate-priority" queue so that synchronous Golang calls would resolve. This could happen during other JS downcalls (such as a transfer caused by vat-bank.js). Such upcalls need to be scheduled and resolve their promises within the same chain context, without expecting the chain to make any further progress.

Otherwise, our liens would deadlock in trying to get a result from JS if users or other calls are pushing themselves onto the same queue and the upcall scheduling is deferred to another block.

warner commented 3 years ago

Hm, it sounds like that queue needs to bypass everything, even the #3582 run-policy that could end the block early. We should talk more, I'm not sure that a queue of any sort is the right mechanism for this. And if vat-bank is waiting for a syscall.callNow() (i.e. device invocation) to return, we should not be allowing anything else to run within the kernel, and certainly not allowing other vats to get time.

What's the nature of the upcall? What swingset/vat-side activity does it need to trigger, and what sort of return data is it expecting?

michaelfig commented 3 years ago

What's the nature of the upcall?

An attempt by cosmos to transfer tokens must first check with the attestation contract how much is locked up in a lien.

This is a bridge device message that must wait on the resolution of a promise to a value before returning that value to cosmos. In short, that delivery and all of its consequences must run immediately. The vat will probably signal its completion by sending back a call over the bridge.

What swingset/vat-side activity does it need to trigger

Messages via the bridge vat, a middleware vat, and the attestation contract vat.

... and what sort of return data is it expecting?

Purely jsonable data is the resolution.