spacebank: vat storage rental economics

warner commented 3 years ago

What is the Problem Being Solved?

Our metering plan includes counting allocations during a crank, and limiting them in the same way as (and perhaps sharing units with) computation. Code which doesn't take up much CPU, but does allocate a lot of memory, should be terminated before it can impact other vats/cranks.

But this doesn't account for the long-term cost of storing a vat's static state. Each validator must spend disk on a copy of the vat state, for as long as that vat is active. There are also some (larger?) number of follower/archiving nodes which spend the disk but who don't receive any of the execution fees (staking rewards). To enable efficient allocation of this relatively scarce resource, we'd like to subject it to a market mechanism.

Ethereum simulates this by charging the transaction that increased storage needs a signifcant amount of "gas": 20000 gas per non-zero SSTORE word (256 bits). At current average gas prices (maybe 100 Gwei/gas, and about $1600/ETH, so 160 u$/gas) this costs about $0.10 per byte stored. To capture the idea that freeing storage will reduce external costs, doing an SSTORE that makes the target become zero (compressed away by run-length encoding) will refund 15000 gas. This is a local refund: it reduces the gas consumption of the current transaction, but not past zero, and thus cannot actually increase the caller's balance. This feature is (ab)used by various "gas storage contracts" which fill storage with junk data when gas prices are low. Later, when a customer wants to execute an expensive transaction, they pay the storage contract to delete data within the same txn, and take advantage of the refund to fund their primary operation. While that's both a clever hack and a horrible pathlogy, it doesn't capture the time cost of the intermediate storage.

We haven't put a lot of thought into this yet, but we'll need something, and it clearly calls out for a descendant of the KeyKOS "Space Bank" metering mechanism. Vats could be charged some number of tokens per byte * second (or byte-cranks or byte-blocks, some notion of spacetime). At the end of each crank, when we force GC (to promptly release unused objects), we can ask the XS engine for the size of the remaining heap, in bytes. We then nominally deduct that value from some counter each unit of "time".

Three big questions that come out of this: who pays for it, when, and what happens when the counter underflows?

In one sense, the crank that caused the heap to grow should be responsible for the costs. But that crank doesn't know how long the space will remain allocated. Pre-paying for a subscription only makes sense when the payer is also benefiting from the subscription, and the crank which consumed the space might be operating on a different budget than the future one that benefits from it.

So it may be easier to think about if each vat is associated with a meter. Various parties could top up the meter, but each (nominal) block, the meter is decremented by the currently-used space. If the meter reaches zero, a strict interpretation would suggest the vat ought to be terminated immediately: no rent, no service, boom. In practice, this could be pretty traumatic, so it might work better to suspend operation of the vat (no deliveries) until someone refills the meter, and only delete the vat entirely if it remains unpaid for a significant amount of time.

An obvious optimization would be to only record the last-decremented-at time, and the last-measured heap space. Then, when a message is about to be delivered to the vat, multiply the two and decrement the meter, update the last-decremented-at time, and possibly suspend the delivery if it underflows. Once a month, trigger a sweep of all vats, terminate the ones that were underflowed last month, and update all their meters. We need to catch both active vats that are receiving messages and idle ones which have not been touched for a while.

Or, if the rates are low enough (and disk space is cheap enough that we can afford some coarse accounting), we don't do anything special per-crank. We just do the full sweep once a month, suspending or terminating vats when the "rent is due" and not in between.

cc @btulloh @dtribble @erights

warner commented 3 years ago

@dtribble pointed out that we should add the size of the vat's c-list to the rent, so retaining references to off-vat things has a non-zero cost

dckc commented 3 years ago

@phoddie I thought perhaps snapshot / restore would reclaim garbage space but a small experiment suggests not. Is that by design?

@warner you mentioned that the way XS doesn't free memory until the xsMachine goes away might have billing implications. That reminded me...

@erights writes May 2:

... if we snapshot and then restore from snapshot, does the restored one perform as badly as just continuing would have? I would suspect that snapshot and restore at least defragments memory.

In a simple test, it doesn't reclaim (much) memory:

    ℹ 42074144 initial allocation
    ℹ 172097664 allocation after 24 growth generations
    ℹ 172097664 allocation deleting growth, gc()
    ℹ 172097568 allocation after snapshot, restore

-- https://gist.github.com/dckc/07db03398ca408e457657aeba8773d54

phoddie commented 3 years ago

Creating a snapshot does not force a garbage collection. Doing so would be observable (WeakRef).

Garbage collection creates free space within the allocated XS heaps. It does not resize the XS heaps themselves. Consequently, the amount of host memory (malloc) will not change significantly, apart from garbage collection of SharedArrayBuffer instances, which uses memory allotted with malloc.

dckc commented 3 years ago

I guess I had in mind defragmentation of XS data structures so that less OS heap was used, not JavaScript level gc. In any case... another idea...

Could gc() return the number of bytes being used, modulo auxiliary host object data? Or can you think of some other way that we could measure that a bunch of JS objects were allocated but are no longer reachable?

@warner if the contract programmer let go of a bunch of objects but XS doesn't return the space to the OS, the validator still has to devote RAM to the contract. Is this a problem?

phoddie commented 3 years ago

Could gc() return the number of bytes being used, modulo auxiliary host object data?

Sure. To get an idea of what is possible there, try a couple things:

In xsMemory.c set mxReport to 1 (line 41). That will output some statistics each time the GC runs.
Look at what instrumentation outputs in xsbug for projects running under the Moddable SDK.

More-or-less the information shown in those two places could be propagated out to a script. Once we have an understanding of what you would find useful, we can think about how to provide it.

zarutian commented 3 years ago

Dunno if I should butt in or not. Here goes though: Some years ago I was looking at DonutLab and silkSec and thinking about the issue regarding compute cost, storage cost, and transmission cost. Decided on a small dual stackmachine with a c-list to others such as the basic component-unit of running computation, call it a thimble. Cost was then calculated in bytes (per) seconds as follows: c-list and memory were measured in byteseconds, one MebiBytesSecond could pay for one KibiByte of memory for one kibisec. Each c-list entry was counted as one byte for these purposes. The execution cost was calculated as bytes fetched from memory to feed the instruction register. Transmission cost counted the caps, bytes, and byteseconds. What happens to thimble that ran out of byteseconds? It gets terminated and deleted, which is perhaps too harsh. Another policy I thought of was akin to wills. Basically to reserve some byteseconds so that the entire memory and c-list could be transmitted to a hier.

Feel free to mine this for ideas.

warner commented 3 years ago

@FUDCo and I were talking today about the deterministic handling of space-usage metering. We're concerned about how the apparent memory usage of a vat could vary due to things beyond the declared deterministic inputs.

These days, we're trying to be tolerant of organic (non-forced) GC, and exclude the consequences of GC from metering. We're going to have an explicit dispatch.bringOutYourDead (#3767) delivery that allows/forces GC to happen, and if there are any memory-usage variations going on, this delivery would be a fine place to consolidate them. We're particularly worried about the memory consumed by GC operations.

Our current thought is:

Instrument XS to count slot allocations as they happen (each call to fxNewSlot). Call this the "increment". Each delivery will return the increment alongside the computron count.
Also instrument XS's garbage collector (probably fxSweep) to measure the number of used slots just after GC finishes. Call this the "usage". The gc() function that liveslots uses to force GC should return this number. The new dispatch.bringOutYourDead() delivery will return it to the kernel.
The kernel tracks a value named charged for each vat, which represents how much space the vat is being charged for. It is nominally denominated in bytes (although it is fed by a count of slots, and there are probably 32 or 64 bytes per slot). In the space * time equation, this provides space.
Each time the kernel calls dispatch.bringOutYourDead, charged is reset to the latest usage.
Each time the kernel performs other deliveries, charged is incremented by that delivery's increment.
We figure out something similar for slab (aka chunk) allocations, which XS uses for strings and arrays.

The increment should be deterministic (i.e. a deterministic function of userspace activity): organic GC does not (should not?) ever call fxNewSlot (TODO: look at the implementation of finalizers and make sure this is true). When we disable metering of GC-sensitive code within liveslots, we should also disable the fxNewSlot code which increments this counter, or record/reset its value around the "unmetered box".

The usage value should be deterministic, but only available immediately after a GC.

The actual number of used slots will vary: when e.g. an object is created, and it attempts to pull a slot from the free list, it might claim an existing one, or it might find none and trigger organic GC, allowing it to reuse a reclaimed one. The memory usage reported by malloc will change (increase) spontaneously when organic GC needs more slots, making it unsuitable for reporting to a consensus-sensitive spacebank metering algorithm.

The charged value produced by this approach will vary in a sawtooth pattern: growing with each delivery, then dropping down to the "correct" value after bringOutYourDead. The vat will be charged for more than their actual usage, especially if bringOutYourDead is called infrequently.

If we establish a hard ceiling on increment, we can protect against excessive memory usage in a single delivery. We're counting allocations, which is different than measuring currently-used, so e.g. a long loop which allocates and then immediately drops an object will look as expensive (in the moment) as a single large allocation which is retained beyond the end of the delivery. But a simple memory-exhaustion attack (doubling a string until it consumes all swap space on the computer, triggering the OOM Killer) would exceed the max-increment first.

We also establish a hard ceiling on charged (perhaps the same as increment). A vat which allocates and retains a buffer in each delivery, accumulating memory usage over several cranks, will hit the ceiling just as thoroughly as if it performed the whole allocation at once.

The colorful analogy that @FUDCo and I came up with was:

You store a lot of junk at a storage locker facility, and they want to charge you by the pound.
A lot of your junk is in the form of dry ice, which sublimates unpredictably.
The easiest way to manage their costs is to weigh your truck as you drive in, and increase your monthly bill by whatever you bring into the facility. There are multiple exits from the parking lot, so they don't bother weighing your truck on the way out.
You pay the bill every month. If you paid for 10 pounds in January, and added 5 pounds on Februrary 2nd, your February bill will be for 15 pounds. If you don't add anything else, your March bill will be for 15 pounds too, and every bill thereafter until December.
Once a year, they jack up the entire building and plop it into a gigantic scale, and weigh the current contents of your stuff. This is expensive but accurate. They reduce your January bill to the measured value, and you keep paying the new reduced+accurate monthly fee until you arrive to add more junk.

The vat owner might want to trigger a new measurement, to bring their charged down to a lower value. This is relatively expensive, like a delivery, so maybe the owner should be charged for it. They could use a simulation of the vat to determine the cost-optimizing frequency/schedule of bringOutYourDead calls.

FUDCo commented 3 years ago

A couple of additional thoughts:

increment, charged, and usage are denominated in abstract storage units of some kind. This might be bytes, or it might be something else, but all we need to do is connect these units to a price; we don't actually need to pin them down precisely as long as we know the upper bound sufficiently well to ensure that the process will never experience a malloc failure (or rather, that we will kill it before it reaches a point where actual memory exhaustion could happen non-deterministically). In particular, different sources of allocation (e.g., fxNewSlot vs. whatever is used for strings and such) might actually be inconsistent with each other in terms of actual quantity of RAM, but as long as the accumulation of charges is deterministic and reasonably well correlated with actual usage we don't really need to care that much. They are very analogous to computrons in this respect.
It strikes me that a vat could be configured to trigger forced GC (aka bringOutYourDead) at the end of any crank where charged exceeds some configured threshold. This could ensure deterministic, periodic-enough GC without requiring coordination via some kernel policy, which might be useful for vats whose allocate/release behavior deviates from the typical case.

warner commented 2 years ago

Incidentally, I found that XS tracks the number of "slots" in use, in the->currentHeapCount, which is updated (reset precisely) at the end of the GC sweep process, and also incremented during fxNewSlot() when a new one is pulled from the free list (both in xsMemory.c). This feels like just what we want to measure, but it may be tricky to decide when precisely to sample it. bringOutYourDead does a forced GC, but then does more JS-side work to drop/retire imports and exports, which will accumulate more garbage. Maybe we want bringOutYourDead to do two GCs, like this:

force GC
(let finalizers run)
compute drops and retirements, perform resulting syscalls
force GC, capture the->currentHeapCount before JS code gets to run again
JS regains control, reads captured currentHeapCount, return results

In this approach, the reported heap count would not include the space consumed by the last bit of (liveslots) code execution, which is probably better than reporting the stable value (immediately after GC before JS gets to run again) plus some extra number of allocations that depend precisely upon when the value gets sampled.

To implement this, we'd want to modify xsnap-worker.c in the xs_gc() function, to copy the->currentHeapCount into some file-local variable just after calling xsCollectGarbage(). Then we'd either add a new host function to expose this to the supervisor (where it could be incorporated into the return value of dispatch.bringOutYourDead()), or we modify fxWriteOkay to include the number into the returned metering results (next to the->allocatedSpace, which measures bytes and includes the headroom, unlike the->currentHeapCount which measures slots and does not). The drawback of the first approach is that it involves more JS-level code (and makes metering information visible to JS, which was not the case before). The drawback of the second approach is that the returned value is only meaningful when the vat has done an xs_gc() recently, and we only intend to do that within bringOutYourDead, not other deliveries.

dckc commented 2 years ago

combine approaches a little bit? have a flag on messages delivered to the worker process that says "I want currentHeapCount in the metering results for this crank"? or just have xs_gc() set such a flag?

dckc commented 2 years ago

currentHeapCount is a nice find! so much so that it makes me wonder: does the->allocatedSpace even belong in the metering results from fxWriteOkay? consensus is sensitive to everything recorded there, yes? And while the->allocatedSpace is currently deterministic, is this something we want to be part of the vat contract? It is directly relevant to the operator's cost to run the vat, but I don't think it decreases when contract code releases heap objects... does it go down on snapshot/restore? not any more, right? Do we want to penalize contracts for high water mark memory usage?

warner commented 2 years ago

Yeah, those are great questions. the->currentHeapCount grows with each tiny allocation, and shrinks back down to the actually-reachable size upon GC. So it will be sensitive to the timing of GC (organic or forced), plus every tiny bit of JS code that runs which might allocate space (and I'd bet stack frames fall into that category).

the->allocatedSpace shows the size of the mmap backing slab which the xsnap process has received from the host operating system, so it should track very closely to the /proc "VmSize" value (modulo some small overhead, plus whatever mechanism is used for the "chunks" that back Strings and Arrays). It can only grow, since XS doesn't ever give memory back to the OS. It can grow during any call to fxNewSlot that finds an empty free list, which means it can grow at all the same points where currentHeapCount could grow.

As you say, allocatedSpace is most directly relevant to the operators: if a vat consumes 100MB and then releases it all, allocatedSpace will remain at 100MB, the xsnap process will continue to mmap it, and it will continue to count against the OS's committed memory pages, so I can see an argument for charging the vat for 100MB even if the->currentHeapCount has dropped to something much smaller. OTOH if the vat isn't using all of that backing store, the OS could page it out pretty easily, which does effectively reduce its cost, and charging the vat according to currentHeapCount would provide an incentive to reduce memory usage, whereas using allocatedSpace obviously can't provide such an incentive.

The metering results are not part of consensus by default: we copy specific values out and use them in consensus-critical ways, but extra values are ignored (we were ignoring gcCount and the various "how deep are the hash buckets" metrics that have since been removed). I can imagine debugging/diagnostic uses for extra metering numbers that we don't copy into consensus-critical code, but certainly whatever form of memory tracking we use for spacebank rental will be part of consensus and needs to be something we can manage safely.

dckc commented 2 years ago

I just ran across an interesting precedent in Storage Rent Economics | Solana Docs:

Method 1: Set it and forget it

With this approach, accounts with two-years worth of rent deposits secured are exempt from network rent charges. By maintaining this minimum-balance, the broader network benefits from reduced liquidity and the account holder can rest assured that their Account::data will be retained for continual access/usage.

Method 2: Pay per byte

If an account has less than two-years worth of deposited rent the network charges rent on a per-epoch basis, in credit for the next epoch. This rent is deducted at a rate specified in genesis, in lamports per kilobyte-year.

For information on the technical implementation details of this design, see the Rent section.

Agoric / agoric-sdk

spacebank: vat storage rental economics #2631

What is the Problem Being Solved?