per-crank execution fees, meters/keepers

What is the Problem Being Solved?

We expect to charge a fee for execution time (independently of charges for priority of execution). This provides economic backpressure on platform usage, and incentivizes more efficient code.

The basic notion is that each vat is associated with some source of execution credits ("ticks" or "computrons"), the execution of cranks deducts from this source, and platform-level RUN currency is used to replenish the source. The RUN spent to buy computrons eventually goes into a platform stability pool, which is distributed in some economically-interesting way.

Description of the Design

@dtribble and I spent the afternoon brainstorming on this. We've iterated on the topic in the past (#23, for starters, although it talks more about escalators/scheduling than how to charge something once a crank has been selected), so this is one step closer to a coherent design.

The existing codebase gives us:

each time we deliver a message (or promise resolution notification) to a vat, we execute a single "crank"
the crank happens in an xsnap worker process
each process has a fixed per-crank limit of meter usage, established by a command-line argument when the process is first launched (although #2980 is about each crank getting a different limit)
if the crank isn't terminated by a metering fault, the delivery results tell the kernel how much was used

We expect (#2319) to change our block-scheduling algorithm to accumulate the number of computrons used by each crank, and stop executing cranks once the total has reached some threshold. We hope to pick a threshold that gives us a comfortable amount of runtime (good utilization of the available time, low-to-moderate risk of exceeding the block time). The threshold must be part of consensus (so the set of cranks executed is part of consensus), but could change over time if we find a way to steer it correctly.

Given, that, we start from the lowest levels:

Meters, metercaps

We'll introduce a kernel table that maps meterID to a value (in computrons). The meterID is a kref, and clists will be augmented to translate meters in the same way it currently translates objects and promises and devices. The big difference is that all meters are owned by the kernel, so all vats are importing meters, never exporting them. Within a vat, the meterID turns into a Presence-like Meter object, which has no methods or state, just identity.

Some sort of special device will have the ability to manipulate the balance of a meter, given its meterID. This will also enable the creation of a new meterID, or (eventually) the merging/deleting of meters.

We should consider how meters are destroyed. They'll be reference-counted, with references coming from vats that are running on the meter, as well as vats with Meter objects in their c-lists. Once all of these references go away, we should probably conserve the value it held, so maybe each meter should have a parent, and if/when the meter is deleted, the value is reabsorbed by the parent meter. Or maybe it just gets merged into a common stability pool.

Meters + Keepers, decrementing

Initially, we can start with the kernel associating each dynamic vat with a single Meter (stored in a DB key). Each time a crank finishes, the kernel examines the meter-consumed results, and decrements this Meter by the amount used. If the result goes below zero, the vat is killed.

Later, each dynamic vat will have an ordered list of (Meter, Keeper) pairs. Each Keeper is just an object kref. After the crank, the kernel deducts the consumed computron count from the first meter. If that underflows, the remainder is deducted from the next, etc. If the last meter is exhausted, the vat is killed. For each meter that gets exhausted, the kernel calls the associated Keeper, giving it a chance to replenish the meter if it wishes.

Each time the kernel decrements a meter, it will increment a "total execution" counter by the same amount. This counter will be made available through the metering device, as well as a means to clear it. The goal is to conserve computron credits: they're created when a Meter amount is manipulated by the device, transferred to the kernel when a crank is executed, and then returned to the device when it read+clears the counter.

Open questions:

How is the Keeper called? We want to invoke it right away (so the vat doesn't die because of a complete meter-stack underflow before it has a chance to refill at least one of the meters), so maybe the kernel should push a keeper-targeted message onto the front of the run-queue (rather than the back where they usually go). But we don't want to let this be abused to bypass the escalators for free.
What should we do when multiple meters are exhausted? Probably call all their Keepers, starting with the last.
If all meters are exhausted, should the last Keeper get a chance to save the vat, or does whoever set up the vat need to keep a reserve in that last Meter?

Vat Creation

The vatAdmin vat's createVat() API will be augmented to accept meters/keepers as options. Both are stored in the kernel's per-vat tables.

Meter Manager Vat

We'll associate the meter-manipulating device with a new vat, similar to (timer device, timer vat) pair, or the vatAdmin pair. The manager vat can provide a clean ocap API for doing things with meters (splits, balance queries, merges).

The meter-manager vat is also responsible for the conversion of RUN tokens to meter units. This is a bit beyond SwingSet's reach, so we need to design this feature to be optional. The cosmic-swingset host application will configure the RUN/computron relationship in its bootstrap process.

To support this conversion, the manager vat should provide a refill facet for each meter, to which a holder can send a RUN Payment to replenish the meter. The vat will deposit the RUN tokens into a locally-held Purse, figure out how many computron credits they're worth, then increment the meter's value by that amount.

Later (perhaps periodically), the manager vat will query and zero the kernel's total-execution counter. It will figure out how many RUN this this computron count is worth, and transfer that amount of RUN into some sort of stability-fee Purse.

Open questions:

What should the token-conversion setup API be? SwingSet doesn't know anything about ERTP or the specific token type to use, but I think we can give the manager vat a set of object references which are know to respond to ERTP-shaped calls. We'll need a Purse to collect the refill Payments, and maybe a control object to which a withdraw message can be sent, which returns a Payment with the contents of that Purse. We also need something to control the price ratio of RUN and computrons.
If the price of computrons can change over time, how do we conserve the RUN from refill time to distribute-collected-fees time?
- It'd be easier if there were a fixed price.
- How should we set the price? Probably some governance thing.
- Maybe price changes must be accompanied by more RUN (or can provide a refund) to keep the total computrons across all meters, plus the kernel count, times the new price, equal to the current amount of RUN being held (since changing the price is effectively increasing or decreasing the value of already-purchased metering units).
Do we need separate deposit facets?
Do we need the ability to convert metering units back into RUN? Probably easier if we do not.

Keepers

The role of a Keeper is to get informed when a meter is drained, and then take corrective action. Meters are accessed synchronously, at a low-level (by the kernel), so anything more sophisticated must live in a Keeper or in some vat's object that interacts with one. Keepers are like creditors: they provide funds to make sure an operation doesn't fail (the vat isn't terminated), but they'll have a policy of some sort, whether to refill a meter or let it remain drained (risking vat termination).

Eventually, Keepers might have more options, including suspending a crank (to be resumed later), or perhaps interacting with the scheduling of messages. If we were checking the meter before the crank is delivered, rather than afterwards, there would be a question of what to do if the meter was insufficient: a Keeper might be consulted at that point, and it could choose to refill the meter, drop the message, drop the entire Flow the message was on (if/when we implement Flows), push the message back onto the queue, maybe even suspend the vat until someone pays to thaw it out again.

Keepers and Purses

We're thinking that, for now, we implement Keepers as objects in the manager vat, and we give them a Purse to draw from. If/when their associated meter underflows, the Keeper withdraws enough RUN to fill it back up (if this takes place entirely inside the manager vat, maybe it can all happen in a single crank, just after the meter exhaustion and before the target vat receives any further messages). Whoever supplies this Purse still has access to it, so they aren't irrevocably committing their RUN for use as gas: they can withdraw the remainder at any time.

Perhaps we give the meter-manager vat a widely-held API object that can accept a Purse and create a Keeper around it, with some parameters to control how much it refills the meter. We might have it maintain both a "hot meter" and a "backup meter", filling both from the same Purse but with different refill- or notification- policies.

Initial Computrons

When we get to a proper scheduler, each message on the escalators (or maybe each escalator itself) will be associated with a Meter. When the message is delivered, we should transfer a fixed amount of units from the message/escalator's "scheduling meter" to the vat's "execution meter". This amount should be sufficient to let the vat examine the message and make a decision about whether to proceed or not, ideally after somehow switching to a different meter.

The goal here is to prevent a resource-exhaustion attack in which the attacker just sends a lot of useless messages to the victim vat. If the transferred units are enough to let the defender recognize the uselessness of the message and stop processing, then attacker loses tokens overall, but the defender does not (in fact they may come out ahead).

Vat code may be able to reason about incoming message sends, but cranks are also triggered by incoming promise resolutions (dispatch.notify). This may be difficult for programmers to visualize (do they include decision-making code just after an await too?). And in general, our nascent theories about escalator prioritization of messages are even less developed for promise resolutions.

Switching Meters

To support that "attacker pays" defense, we would like a way to switch meters mid-crank, but we don't have a good theory on it yet. Maybe each message could come with a meter to be pushed (for one crank only) onto the front of the meter stack, and we make an API in which a vat can send a message to itself with this extra meter attached.

Open questions:

What are the right runtime boundaries of a given meter? We can't really sense turn boundaries, and xsnap only currently gives us metering information on crank boundaries, so it'd be easiest if we could split things up on separate cranks.
- adversarial code can schedule as many turns as it likes, so any boundary smaller than "rest of crank" is unlikely to be secure
- Message deliveries which await or .then some Promise are likely to be split across crank boundaries, making it hard to provide a meter for the whole message delivery. Flows would help. I can't think of a way to associate the second (notify-resolution) crank with the first one in a way that lets them both use the same meter.

Zoe API

We might augment the Zoe "instantiate a contract" API to accept a Purse of RUN along with the other arguments. Zoe could then set up a Meter and Keeper, give the Purse to the Keeper, and call the vatAdmin createVat with the meter/keeper pair.

Since the goal is for the contract instantiator to pay the fees, but to earn enough from their own customers to cover them, one idea is to make the Keeper be a party to the contract (give it a Seat), that allows it to request a fee payout each time it needs to refill a meter.

We might augment the Zoe "instantiate a contract" API to accept a Purse of RUN along with the other arguments. Zoe could then set up a Meter and Keeper, give the Purse to the Keeper, and call the vatAdmin createVat with the meter/keeper pair.

I think this should be a RUN payment. Our APIs should never pass purses around.

I think this should be a RUN payment. Our APIs should never pass purses around.

That would be preferable, but

we want the authority to draw on a pool of resources shared among multiple contracts
we want an easy systemic way to keep the contract operating account "topped-up"

Both of those seem straightforward with shared purses.

Also, we must decide how/if refunds can happen. I think we decided that, at least initially, feeding a meter is a one-way street. But that doesn't mean feeding a keeper must also be like that. If the vat you're supporting is terminated, we don't want the funds to be entirely lost. Although I suppose we could make the keeper somewhat more sophisticated and give it a refund() -> Payment method.

we want the authority to draw on a pool of resources shared among multiple contracts

we want an easy systemic way to keep the contract operating account "topped-up"

Zoe already has a model of accepting payments and escrowing the assets. That easily satisfies 1. and 2. is easily satisfied by sending another payment whenever it is needed. I think there's no need to reinvent a new model for a use case that is already covered.

Although I suppose we could make the keeper somewhat more sophisticated and give it a refund() -> Payment method.

Zoe conveniently has a refund model too :)

...

Meters, metercaps

We'll introduce a kernel table that maps meterID to a value (in computrons). The meterID is a kref, and clists will be augmented ...

"augmented" presumes the reader knows the status quo of clist design. I'm a little fuzzy on that. I suppose it's documented in https://github.com/Agoric/agoric-sdk/tree/master/packages/SwingSet/docs , but I'm not sure where to start. Does any of the files in that directory serve as a starting point? Are docs on clists hopelessly out of date? (#2452)

This design seems to provide for fees for executing installed contracts. It doesn't seem to address clients of these contracts; for example, users making swaps. Is that on purpose?

p.s. @dtribble confirmed that yes, this is only one part and another part is still in progress.

@michaelfig and @mhofman had a neat idea to express the "switch to a different meter" operation, given a limitation of one meter per crank. We give vats some primitive that returns a Promise which will only be resolved by a new crank, and we arrange for that crank to be using the new meter instead of the original one. The simplest form would be like:

async getRequest(request) {
  // now on sender's just-enough-to-decide meter
  const { who, nice } = examineRequest();
  if (!nice) {
    return; // don't waste our time
  }
  const newMeter = customerMeters.get(who);
  await chargeTo(newMeter);
  doWork(); // now on per-customer meter
}

where chargeTo(newMeter) is the swingset-provided primitive. It would create a new vpid (promise vref), tell the kernel about it (getting it into the kernel promise table, with the vat as the decider), subscribe to it (which is weird, if you're deciding a promise you don't usually subscribe to it, but the kernel should accept it because deciders can shift around anyways), do a syscall.resolve with some magic extra argument that includes the meter to use, and only then create and provide an actual Promise object to userspace.

That sequence is fussy enough that we might consider adding a new syscall just to establish a resolved promise with a different meter, all in one single event. Or we create a short-lived object, send a message to it (to ourselves) with a meter argument, and wait for the kernel to loop it back to us.

We discussed other ways to express the primitive. We could put a method on the object that represents the meter (so await newMeter.runOn()), or integrate it with E somehow (await E.chargeTo(newMeter)). We have the resolution slot to work with too: E.chargeTo(newMeter).then(xyz => doSomething()) and what should xyz be?

We've talked in the past about how whatever meter is active when a message is sent should be used by the recipient of that meter. In this initial approach (as designed above), each vat is associated with a meter, not each message. It would be lovely if we could say:

messages (things with a method name, as opposed to promise resolutions) are run on a meter provided by the sender
if that code waits on a promise, the meter that was active at the time of the .then will also be used for the callback

But.. we have no way to tell when the .then is called (unless we perform even deeper surgery on HandledPromise). I think the best we can currently do is to sample the meter at the time the (remote) Promise is created, which is either a turn after liveslots creates on to represent promise IDs within inbound arguments, or a turn after E() creates the result Promise for some outbound message send (during the handler invocation). I'm not sure if that's sufficient.

We have the resolution slot to work with too: E.chargeTo(newMeter).then(xyz => doSomething()) and what should xyz be?

The suggestion I was making (I think @mhofman was suggesting similarly) is that a method like E.when() would return something like a ChargablePromise (a platform promise that also has a chargeTo method, to indicate the meter should be switched after the promise resolves and before calling its callbacks):

const value = await E.when(myPromise).chargeTo(newMeter);
// value is the resolution of myPromise
doSomethingWith(value);

E.chargeTo(m) is then just a shorthand for E.when(undefined).chargeTo(m).

The .then usages are like:

// Fire off some promises under separate meters.
E.chargeTo(meter1).then(_ => doSomethingUnderMeter1With(lexicalVariable));
E.when(myPromise).chargeTo(meter2).then(res => doSomethingUnderMeter2With(res));
// Continue synchronously under the original meter.
...

But.. we have no way to tell when the .then is called (unless we perform even deeper surgery on HandledPromise)

As part of the eventual send proposal, we will definitely need the ability to track calls to .then. That's been scheduled for some time, and a partial shim of it may be both necessary for this particular application and useful for the proposal.

As part of the eventual send proposal, we will definitely need the ability to track calls to .then. That's been scheduled for some time, and a partial shim of it may be both necessary for this particular application and useful for the proposal.

I don't remember that. What were we thinking of proposing wrt .then?

I don't remember that. What were we thinking of proposing wrt .then?

IIRC, we needed delegated promises to be aware when they were subscribed to (maybe not precisely which .then there was).

Is this the same issue as why we can't get the ordering correct without platform support? My memory of that issue is that it's because we can't tell when a platform promise is forwarded to another promise. If it's not that, then it still does not ring a bell. Curious!

Is this the same issue as why we can't get the ordering correct without platform support? My memory of that issue is that it's because we can't tell when a platform promise is forwarded to another promise.

That's probably what I was confusing needing .then hooks with.

At one point I was interested in sensing .then so vats could avoid doing syscall.subscribe(). If a vat does E( E(x).foo() ).bar(), then it doesn't care about the resolution of foo(), it just wants to pipeline bar to it. That would remove a dispatch.notify delivery to this vat.

The syscall API still as room for this: liveslots automatically does subscribe on every exported promise, but the kernel is all set to do less work if liveslots stopped doing that.

Today's metering meeting (recorded) examined the idea that each message has a Meter associated with it, and delivery fees would be deducted from this meter. Instead of forwarding a number of tokens from input to output, the output messages would inherit the inbound delivery's meter. There would be limitations placed on the vat's ability to use this inheritance: vattp/comms/zoe would be allowed to inherit the meter, but not contract vats, so contract vats must pay for their own outbound messages. Message deliveries would deduct the message's Meter by some amount based on the size of the message, and then the vat's Meter based on computron usage and syscalls and space usage of the vat code.

Afterwards, @michaelfig and I sketched out an alternative approach:

The cosmos SwingSet module, before passing an inbound message to the mailbox device, computes a fee as a linear (A+Bx) function of the message size, denominated in RUN, and deducts this fee from the signing key's cosmos Bank module account. We call this the "Mailbox Fee". The mailbox device is informed of the fee amount.
Each SwingSet run-queue message has a value associated with it, nominally called the "tip" (however that's not quite the right name), denominated in RUN (although the kernel doesn't know that).
When a dispatch.deliver sends a message into a vat, that vat learns the amount of the tip: the API would be enhanced to dispatch.deliver(target, message, tipAmount).
- We're not quite sure what to do with promise resolutions and dispatch.notify
The vat gets to decide what to do with the tip. One option is to forward it on outbound messages: syscall.send(target, message, tipAmount). Another is for the vat to claim the tip itself (it gets added to the vat's configured meter). Another is to burn it / add to the stability pool. We'd need new syscalls for several of these options.
- To expose this to userspace, we're thinking vatPowers.setTipHandlingDefault(..), and E(target, { includeFullTip: true }).method(arg)
- VatTP and Comms would forward the tip in their (nominally single) forwarded message
- Zoe would forward this to the contract vat
- The contract vat would consume it.
This covers system-wide expenses incurred by the existence of the message, which are probably linear-ish in the size of the message (transcript copies, etc)
The contract vat knows that it will have received at least some minimum (A, in A+Bx) for each message, allowing it to safely perform up to some constant amount of work without fear of insolvency, regardless of whether the message came directly from the user or went through Zoe
To safely do more than A work, the contract should insist upon being called through Zoe, who can take an additional fee from the user's Charge Account
- I.e. any publically-available contract facets should not do more than constant work, and non-constant work should only be done from within Zoe-mediated (Charge-Account-billed) facets
We adjust the A+Bx prices to cover any additional costs, like c-list or kernel-object-table entries. We might use A+Bx+Cy (x= serialized message .body size, y= .slots length) to better capture the kref costs.
The contract vat's meter will be adjusted at the end of the delivery: incremented with the tip (assuming it doesn't forward or burn it with a syscall), and decremented by its consumption.
Syscalls decrement the contract vat's meter too, using some A+Bx conversion formula that includes the size of the syscall.
The vat's meter is denominated in a currency (RUN, but the kernel doesn't know that), and all other costs (bytes, priority fees, space consumption) are converted into this denomination by governance-configurable prices.

The nice thing about forwarding tokens, rather than a Meter from which tokens could be deducted, is that it puts a tighter bound on how much the original sender (user) can be charged (the user will always spend the same+predictable amount, a constant function of the current price and the size of the message). And if the user doesn't have enough to cover the fee, we find out about it much earlier (in cosmic-swingset). And the initial fee can be determined entirely by information available in the mempool, so block proposers could avoid including those messages in blocks in the first place, reducing the amount of wasted work and burned fee tokens.

The API that comes out of this is something like:

controller.setPrice('computron', price) (also slots, c-lists, kernel objects, message bytes, syscall bytes, maybe per-syscall fees??)
- From swingset's point of view, all currency amounts are denominated in an abstract METERING token. In the Agoric chain, this is the RUN token, however we might need higher precision because the kernel will be counting very small amounts: the 6-decimal precision of RUN is probably insufficient.
- This is a kernel-state-changing invocation, and needs to happen at a safe time, so it might want to go through the #720 kernel input queue. The state won't get committed for real until the host application flushes the block buffer.
mailboxDevice.addMessage(sender, body, value) (where value is in METERING)
run-queue entries for messages acquire a value (in METERING)
vat Meters are changed to be denominated in METERING, rather than computrons
dispatch.deliver(target, message, value)
syscall.send(target, message, value)
syscall.claim(value) and/or syscall.burn(value)
- System vats are non-profits: any funds which aren't meant to be forwarded should be added to the stability pool or burned as a way to return their value to the system (we'll ask the economists which is best). The kernel might need a configured account which counts the amount passed to .burn()
- value is conserved over the course of the delivery, with anything left over going to some default location, which depends upon what makes the API easiest
Promises: these are harder because they don't have the 1-to-1 or 1-to-0 fanout of normal messages. I think we'll aim for:
- promise resolutions do not carry value, so syscall.resolve() and dispatch.notify() do not carry value
- external (signed) messages still deduct the mailbox fee, however this is burned (or added to the stability pool) by comms, who is the vat that translates a subset of receive messages into syscall.resolve notifications
- to protect themselves, contract vats should not perform non-constant work in promise resolutions, at least not in resolutions which could be driven/amplified by the user (i.e. a Notifier driven by the user somehow)
Liveslots provides the bare minimum API necessary to expose its control over METERING tokens to userspace.
- Contract vats claim all fees for themselves, so ZCF can use its access to vatPowers to set that as a default, and then userspace doesn't need to interact with fees at all
- vattp/comms/zoe forward the fee for regular messages, so they need to tell liveslots which outbound message should get the value, for which an extra argument to E() could suffice. This will need enhancements to the eventual-send API and HandledPromise, but they're in line with our previous thinking.
- vattp/comms/zoe burn the fee for promise resolutions, and a vatPowers default setting should suffice

The main accomplishment I understood from this conversation with @warner, is that we could have an initial plan for spam prevention to get to Mainnet phase 1, and refine the plan as we go further.

With a minimal amount of cosmic-swingset work (#3752), we can achieve the bulk of the client spam prevention benefits by charging the same computed A + Bx (message size; the allocation+processing-time proxy) + Cy (slots; the clist-entry proxy) fee, and allowing A,B,C to be tunable by cosmos-level governance.

I think it's important to add that SwingSet doesn't need to be aware of or do anything with this fee until at least Mainnet phase 2 (non-Agoric contracts). For now, we could distribute the collected mailbox fees to stakers, as we already do with Cosmos-level, Zoe, Treasury, and AMM fees.

A lot of the other discussion that has happened so far is valuable in terms of suggestions to offer more granular support for balancing fairness between clients, service operators, and the stakers. Since in the mid-term, Agoric is the only service operator, that puts fewer constraints on proposing a relatively simple starting point that can evolve.

Agoric / agoric-sdk