Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

support virtual objects that manage ephemeral state #5759

Open turadg opened 2 years ago

turadg commented 2 years ago

What is the Problem Being Solved?

Changing requirements should be efficient

When developing contracts, requirements may evolve, and it helps developers to minimize the effort needed to implement changes. One such requirement is the durability of data. It can be heap (H), virtual (V), or durable (D).

In the Virtual Object Manager (VOM), transitioning a value from V→D or D→V is now fairly declarative: change a function name or add an option flag. But moving a value to or from H (→H or H→) is laborious. See the commits in https://github.com/Agoric/agoric-sdk/pull/5736/ for an example.

The resulting code should also be clean.

The changes above turned the simple one-liner at https://github.com/Agoric/agoric-sdk/blob/39fe09285a230a33966b23f3ab3a1f61127f0a64/packages/run-protocol/src/vaultFactory/vaultManager.js#L529

into the multi-line version at https://github.com/Agoric/agoric-sdk/blob/e4c837a47ab31c31628ff8a1a736aa3a139f0044/packages/run-protocol/src/vaultFactory/vaultManager.js#L557-L561

Representatives cannot reveal GC

https://github.com/Agoric/agoric-sdk/pull/5758 tried to solve the above problems by letting some heap data onto the state object. The problem with this is that userspace could tell when GC happens, because one Representative would have those properties while later Representatives would not.
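To make the GC sensor concrete, here is a toy JavaScript model, not the VOM itself: durable data lives in a Map keyed by a made-up vref string, and each new Representative gets a fresh state object of getters/setters. The names durableStore, makeRepresentative, and heapCache are hypothetical. Heap data attached to one Representative's state is missing from the next one, and that difference is observable.

```javascript
// Toy model, NOT the real VOM: durable data lives in a Map keyed by vref,
// and every new Representative gets a fresh `state` accessor object.
const durableStore = new Map([['o+1', { count: 0 }]]);

function makeRepresentative(vref) {
  const record = durableStore.get(vref);
  // `state` exposes only the durable properties, via getters/setters.
  const state = {};
  for (const name of Object.keys(record)) {
    Object.defineProperty(state, name, {
      get: () => record[name],
      set: value => {
        record[name] = value;
      },
      enumerable: true,
    });
  }
  return { state };
}

// First Representative: userspace puts heap-only data onto `state`.
let rep = makeRepresentative('o+1');
rep.state.count += 1;
rep.state.heapCache = { expensive: true }; // not durable!
const hadCacheBefore = 'heapCache' in rep.state;

// Simulate GC dropping the Representative, then a later re-import.
rep = makeRepresentative('o+1');
const hasCacheAfter = 'heapCache' in rep.state;

// Durable data survives but the heap property does not: the difference
// reveals that the old Representative was collected (a GC sensor).
console.log(rep.state.count, hadCacheBefore, hasCacheAfter); // 1 true false
```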

Description of the Design

A design that seems to satisfy the above requirements is to continue to use the WeakMap pattern as in https://github.com/Agoric/agoric-sdk/blob/39fe09285a230a33966b23f3ab3a1f61127f0a64/packages/run-protocol/src/vaultFactory/vaultManager.js#L140-L147

But instead of making the module responsible for the map, and each method responsible for pulling the ephemera object out of it, have the VOM provide ephemera in the context, next to state. Then the ugly refactor above could go back to one line:

  getGovernedParams: ({ ephemera }) => ephemera.factoryPowers.getGovernedParams();

TBD how the ephemera object gets initialized. Some requirements:

Security Considerations

Test Plan

warner commented 2 years ago

For initialization, one option would be a breaking change to the API, in which the init function changes from (...args) => initialState to (...args) => ({ state, ephemera }). That's probably the most ergonomic to use when ephemera are in play, slightly worse when they are not (e.g. x => ({ x }) becomes x => ({ state: { x } })), but we'd have to change all the callers. I count 17 files (in zoe, ERTP, and run-protocol) which are likely clients.

Another is adding options.initEphemera = (...args) => ephemera, which gets the same arguments as init. I can imagine folks wanting access to the facets and/or state from that function, and we'd need to decide if it's called before or after the main init, both of which make it somewhat awkward.
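A rough sketch of the two init shapes from the first option, using the x => ({ x }) example. adaptInit is a hypothetical helper, included only to illustrate why updating the existing callers would be mechanical.

```javascript
// Current API shape: init returns the initial state record.
const initToday = x => ({ x });

// Proposed (breaking) shape: init returns { state, ephemera }.
const initWithEphemera = x => ({
  state: { x },
  ephemera: { cache: new Map() }, // heap-only, never serialized
});

// Kinds with no ephemera still have to wrap their state.
const initNoEphemera = x => ({ state: { x } });

// Hypothetical migration helper: wrap an old-style init in the new shape.
const adaptInit = oldInit => (...args) => ({ state: oldInit(...args) });

const adapted = adaptInit(initToday)(5);
console.log(adapted.state.x, adapted.ephemera, initWithEphemera(5).ephemera.cache.size); // 5 undefined 0
```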

mhofman commented 2 years ago

If we're going for breaking API changes, I'd much rather we go for what I proposed in https://github.com/Agoric/agoric-sdk/issues/5170, in which case we'd simply have something like init: ({ state, ephemera }, x, y) => { state.x = x; state.foo = makeStuff(y); ephemera.computedX = compute(x); };
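A minimal sketch of how a maker might drive this init signature. It assumes a hypothetical makeInstance harness (not an existing VOM API) that creates empty state and ephemera records and lets init fill them. makeStuff and compute are placeholders.

```javascript
// Hypothetical harness: the maker owns empty state/ephemera records and
// lets init populate them. Not an existing VOM API.
function makeInstance(init, ...args) {
  const state = {};
  const ephemera = {};
  init({ state, ephemera }, ...args);
  return { state, ephemera };
}

// Placeholder helpers standing in for real contract logic.
const makeStuff = y => `stuff:${y}`;
const compute = x => x * 2;

const init = ({ state, ephemera }, x, y) => {
  state.x = x;
  state.foo = makeStuff(y);
  ephemera.computedX = compute(x);
};

const inst = makeInstance(init, 3, 'a');
console.log(inst.state.x, inst.state.foo, inst.ephemera.computedX); // 3 stuff:a 6
```

Note that in a shape like this, ephemera is only populated when init runs, i.e. when the object is first made.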

mhofman commented 2 years ago

There is an argument to be made for having a separate initEphemera that is called lazily, exactly once, before first use in each version, so that the ephemeral data can be reconstructed from durable state when necessary.
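One way to picture "lazily called exactly once before first usage in each version": each vat version starts with an empty WeakMap, and initEphemera sees only durable state. In this hypothetical sketch, makeVersion stands in for a vat incarnation, and the plain objects repV1/repV2 stand in for the Representatives of one durable object in each version.

```javascript
// Each vat version gets a fresh, empty WeakMap; initEphemera runs lazily,
// at most once per object per version, and sees only durable state.
function makeVersion(initEphemera) {
  const ephemeraWM = new WeakMap();
  let calls = 0;
  const provideEphemera = (rep, state) => {
    if (!ephemeraWM.has(rep)) {
      calls += 1;
      ephemeraWM.set(rep, initEphemera(state));
    }
    return ephemeraWM.get(rep);
  };
  return { provideEphemera, calls: () => calls };
}

// Ephemeral data that can be rebuilt from durable state alone.
const initEphemera = state => ({ squares: state.items.map(n => n * n) });
const durableState = { items: [1, 2, 3] }; // survives the upgrade

const v1 = makeVersion(initEphemera);
const repV1 = {};
v1.provideEphemera(repV1, durableState);
v1.provideEphemera(repV1, durableState); // cached: no second call

const v2 = makeVersion(initEphemera); // after upgrade: empty WeakMap
const repV2 = {}; // new Representative for the same durable object
const e2 = v2.provideEphemera(repV2, durableState);
console.log(v1.calls(), v2.calls(), e2.squares.join(',')); // 1 1 1,4,9
```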

turadg commented 2 years ago

IBIS for the design options. Please edit this comment to add your points. Let's not bikeshed on the names yet, and stick with "ephemera" for any object that holds ephemera. The name can change once the semantics are figured out. args means the args to the kind constructor, and state means what's produced now by initState (@warner notes that initState() returns the initial value for state, but is not the same JS object that will be received as context.state when behavior methods are called).

Requirements:

Cases to handle:

? How should the ephemera object be created?

  1. initEphemera provided in options as (state, ...args) => E
     1.1 + can be provided as needed when lost
       1.1.1 - not without the constructor arguments

  2. existing initState changed to return ({ state, ephemera })
     2.1 - breaking change
       2.1.1 . mechanical fix, and if it's a better API, now's the time

  3. primary init changed to ({ state, ephemera }, ...args) => void
     3.1 - breaking change
       3.1.1 . mechanical fix, and if it's a better API, now's the time

  4. finish receives an ephemera to populate
     4.1 - shouldn't receive constructor arguments

  5. ephemeral state dependent on parent state comes in a stub durable object that holds it
     5.1 . vaultDirector creates factoryPowers as a durable object, and vaultManager holds that in durable state

  6. factoryPowers: ({ state, self }) => factoryPowersWM.get(self)
     6.1 . vaultDirector provides factoryPowersWM to vaultManager

erights commented 2 years ago

  factoryPowers: ({ state, self }) => provide(factoryPowersWM, self, () => makeFactoryPowers(state));

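Spelled out, the one-liner above relies on a provide (get-or-create) helper. This sketch hand-rolls one over a native WeakMap. It is an assumption for illustration, not necessarily the helper erights had in mind, and makeFactoryPowers and the params field are placeholders.

```javascript
// Hand-rolled get-or-create over a native WeakMap (hypothetical helper).
const provide = (weakMap, key, makeValue) => {
  if (!weakMap.has(key)) {
    weakMap.set(key, makeValue(key));
  }
  return weakMap.get(key);
};

const factoryPowersWM = new WeakMap();
let made = 0;
// Placeholder: builds the powers from the VaultManager's durable state.
const makeFactoryPowers = state => {
  made += 1;
  return { getGovernedParams: () => state.params };
};

const behavior = {
  factoryPowers: ({ state, self }) =>
    provide(factoryPowersWM, self, () => makeFactoryPowers(state)),
};

const self = {}; // stands in for the VaultManager Representative
const state = { params: { rate: 5 } };
const p1 = behavior.factoryPowers({ state, self });
const p2 = behavior.factoryPowers({ state, self });
console.log(p1 === p2, made, p1.getGovernedParams().rate); // true 1 5
```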
warner commented 2 years ago

@Fudco and I walked through the options this afternoon. We found problems with most of the proposals above, and came up with two new ones that we think might work.

The main constraint is that the ephemera needs to be created both in the first version of the vat (at about the same time that the durable object is created, i.e. when the vref is allocated) and in the second and subsequent versions of the vat (either when the durable object is first deserialized, or when a method that needs ephemera is first invoked). The first creation is associated with a call to the init() function that creates the initial state, but the second is not. So any proposal that attempts to create ephemera from init() is doomed. That takes out IBIS proposals 2, 3, and 4 (because finish() is called the same number of times as init()).

It also takes out the portion of proposal 1 that passes ...args to an initEphemera function, because those args (the maker args) are not available in the second and subsequent versions (even if they were durable, we wouldn't want to keep them around in durable state across version upgrades: they're initialization args, not state).

Proposal 5 is kinda flipped around; I was there when we came up with it, but I can't parse it well enough to consider it. Proposal 6 is close to the "open-coded" approach that Chip and I were using as a jumping-off point, which I'll continue here. The following pseudo-code is what a userspace author might write on their own if the VOM didn't provide any better tooling:


// Pseudo-code: handle, initialState, createEphemera, dostuff, and the
// doStuff* helpers are placeholders.
function createVaultDirector(VDstuff) {

  // One WeakMap per VaultDirector, keyed by the VaultManager Representative.
  const ephemeraWM = new WeakMap();
  function provideEphemera(vm, state) {
    if (!ephemeraWM.has(vm)) {
      // user-supplied; may use VDstuff, vm, and vm's state
      const ephemera = createEphemera(VDstuff, vm, state);
      ephemeraWM.set(vm, ephemera);
    }
    return ephemeraWM.get(vm);
  }

  const init = (...args) => initialState;
  const behavior = {
    doFoo({ self, state }, ...fooArgs) {
      const ephemera = provideEphemera(self, state);
      doStuffWithEphemera(ephemera);
    },
    doBar({ self, state }, ...barArgs) {
      doStuffWithoutEphemera();
    },
  };
  const options = {};
  const makeVaultManager = defineDurableKind(handle, init, behavior, options);

  function createVaultManager(VMstuff) {
    dostuff();
    const vm = makeVaultManager(VMstuff);
    return vm;
  }

  return { createVaultManager };
}

In that example, the parent code (VaultDirector) must create a WeakMap and a provide pattern that is keyed by the VaultManager instance (which is a Representative of a durable object). It can create ephemera with access to anything passed into createVaultDirector, stuff you create within the VaultDirector, the VaultManager instance itself (either used to interrogate the VaultManager, or to key some other Store or WeakMap), and contents of the VaultManager's state.

Then, inside every method that wants to use this ephemeral data, it must call provideEphemera(self, state) to get it. The first time this is called within version-1 of the vat, provideEphemera() will take the createEphemera() branch. Every subsequent time within version-1, this will fetch the stored copy from the WeakMap. (If the VaultManager durable object is GC-released within version-1, the WeakMap entry will go away, taking ephemera with it, but that doesn't happen just because any particular Representative for that underlying durable object is GC'ed, so the ephemera continues to consume RAM, making this only suitable for low-cardinality Kinds).
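The open-coded pattern can be exercised without SwingSet by swapping in a toy Kind maker. In this hypothetical sketch, defineToyKind stands in for defineDurableKind (nothing is durable, and no upgrade is modeled), and the ephemera contents are made up. It shows that doBar never triggers createEphemera and that repeated doFoo calls reuse the first result.

```javascript
// Toy stand-in for defineDurableKind (hypothetical): methods get a
// { self, state } context. Nothing is durable and no upgrade is modeled.
function defineToyKind(init, behavior) {
  return (...args) => {
    const state = init(...args);
    const self = {};
    for (const [name, method] of Object.entries(behavior)) {
      self[name] = (...rest) => method({ self, state }, ...rest);
    }
    return Object.freeze(self);
  };
}

function createVaultDirector(VDstuff) {
  let creations = 0;
  const ephemeraWM = new WeakMap();

  // Hypothetical createEphemera; vm could also key other tables.
  const createEphemera = (vm, state) => {
    creations += 1;
    return { label: `${VDstuff.name}/${state.id}`, cache: new Map() };
  };

  function provideEphemera(vm, state) {
    if (!ephemeraWM.has(vm)) {
      ephemeraWM.set(vm, createEphemera(vm, state));
    }
    return ephemeraWM.get(vm);
  }

  const makeVaultManager = defineToyKind(id => ({ id }), {
    doFoo({ self, state }) {
      return provideEphemera(self, state).label;
    },
    doBar({ state }) {
      return state.id; // never touches ephemera
    },
  });

  return {
    createVaultManager: id => makeVaultManager(id),
    getCreations: () => creations,
  };
}

const director = createVaultDirector({ name: 'VD' });
const vm = director.createVaultManager('m1');
vm.doBar();
const afterBar = director.getCreations();
vm.doFoo();
vm.doFoo();
console.log(afterBar, director.getCreations(), vm.doFoo()); // 0 1 VD/m1
```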

Then, after upgrade, a new WeakMap is created, initially empty. The first time someone talks to one of the old (durable, vref still exists) VaultManagers, a new Representative is created. Following that, the first time someone calls doFoo(), then the version-2 createEphemera will get called again, and create the ephemera for this VaultManager that will last for the rest of version-2 (or until the durable object is GCed, as before).

We think this pattern would work, and could be implemented without VOM changes. But it's not particularly ergonomic. The biggest pain points are:

Keeping in mind the requirement that "Kinds which don't want ephemera should not pay for it" (and high-cardinality Kinds in particular must not pay a RAM cost for it), here's a sketch of what VOM support could look like:

// Pseudo-code: handle, initialState, anything, and dostuff are placeholders.
function createVaultDirector(VDstuff) {

  function createEphemera(state) {
    // now use VDstuff and VaultManager's state to create
    // ephemeral stuff for each vaultManager

    // this ephemera can be anything, it doesn't even have to be an
    // object, and will not be hardened
    const ephemera = anything;

    return ephemera;
  }

  const init = (...args) => initialState;
  const behavior = {
    foo({ self, state, ephemera }, ...fooArgs) {
      // ephemera arrives in context: no provideEphemera() call needed
    },
    bar({ self, state }, ...barArgs) {
      // methods that don't need ephemera simply don't destructure it
    },
  };
  const options = { createEphemera };
  const makeVaultManager = defineDurableKind(handle, init, behavior, options);

  function createVaultManager(VMstuff) {
    dostuff();
    const vm = makeVaultManager(VMstuff);
    return vm;
  }

  return { createVaultManager };
}

The new options.createEphemera causes the VOM to create one internal WeakMap per Kind, keyed by the Representative (just like the open-coded form above, using the same VirtualObjectAwareWeakMap that userspace gets, which means it's really keyed by the durable object's vref). The VOM does the provide pattern, using the user-supplied createEphemera() function. Each time a Representative is created, we fetch (and/or create) that object's ephemera, and then add it to the context argument that gets bound into the methods.

By passing it in context to each method, it is available directly (without additional provideEphemera() boilerplate) to all code that wants it. Methods that do not need it (bar()) just don't destructure it in their { self, state } argument code.
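To make the proposed options.createEphemera behavior concrete, here is a toy model, not the VOM: defineToyKind and makeRepresentative are hypothetical, vrefs are made-up strings, and a plain Map stands in for the VOM-aware WeakMap (so, unlike the real proposal, nothing is ever released). It shows that the table only exists when the option is given, that createEphemera runs once, right after init(), and that a second Representative for the same vref receives the same ephemera.

```javascript
// Toy model of the proposed options.createEphemera (hypothetical; not the
// real VOM). A Map keyed by made-up vref strings stands in for the
// VOM-aware WeakMap, so nothing here is ever released.
function defineToyKind(init, behavior, { createEphemera } = {}) {
  const durable = new Map(); // vref -> durable state record
  // Only Kinds that opt in pay for an ephemera table.
  const ephemeraByVref = createEphemera ? new Map() : undefined;
  let nextId = 1;

  function makeRepresentative(vref) {
    const state = durable.get(vref);
    const self = {};
    const context = { self, state };
    if (ephemeraByVref) {
      // provide pattern: fetch, or create for the first Representative
      if (!ephemeraByVref.has(vref)) {
        ephemeraByVref.set(vref, createEphemera(state));
      }
      context.ephemera = ephemeraByVref.get(vref);
    }
    Object.freeze(context); // the ephemera object itself stays mutable
    for (const [name, method] of Object.entries(behavior)) {
      self[name] = (...args) => method(context, ...args);
    }
    return Object.freeze(self);
  }

  // The maker runs init(), then (via makeRepresentative) createEphemera().
  const maker = (...args) => {
    const vref = `o+${nextId++}`;
    durable.set(vref, init(...args));
    return makeRepresentative(vref);
  };
  // makeRepresentative is exposed only to simulate a later re-import.
  return { maker, makeRepresentative };
}

let created = 0;
const { maker, makeRepresentative } = defineToyKind(
  id => ({ id }),
  {
    foo: ({ state, ephemera }) => `${state.id}:${ephemera.hits++}`,
    bar: ({ state }) => state.id,
  },
  {
    createEphemera: state => {
      created += 1;
      return { forId: state.id, hits: 0 };
    },
  },
);

const rep1 = maker('vm1');
rep1.foo();
const rep2 = makeRepresentative('o+1'); // new Representative, same vref
console.log(rep2.foo(), created); // vm1:1 1
```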

ephemera can be anything the user code wants, even a non-object, and it is not frozen or hardened. So user code could create a mutable record and use the properties as volatile state.

In version-1, the user's createEphemera() method will be called during makeVaultManager() (after init() and before finish(), if any). In version-2, it will be called the first time the durable object is deserialized into a Representative (slightly earlier than in the open-coded example, where it doesn't get called until foo() needed it, and bar() did not).

Because createEphemera() is called before the Representative can be created, we cannot provide it with the VaultManager instance (the vm argument in the open-coded example). Imagine if createEphemera() called vm.foo(): what ephemera would that method get? This might be an imposition on Kind authors: any distinction between the ephemera provided to one instance versus another must come from differences in their respective state contents.

This is probably the biggest limitation of this approach, but it is the price paid for removing the boilerplate and receiving { ephemera } through the context argument.

Within the VOM, the context record ({ self, state, ephemera } or { facets, state, ephemera }) must be carefully constructed: ephemera is not hardened, but the context record is frozen, and all three properties are non-writable and non-configurable. The context object's lifetime is linked to the cohort of Representatives (to prevent a GC sensor). We don't need to establish such a link with ephemera because ephemera is already kept alive by the durable object's vref in the internal WeakMap, so ephemera cannot go away while the durable object exists. Userspace can sense when the first Representative (within any given vat version) is created: just wait for createEphemera to be called. But it cannot sense if/when a second Representative is created: createEphemera is never called again (within that vat version), which means this does not provide a GC sensor.
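A small sketch of the context-record rules above, assuming plain Object.freeze in place of SES harden (which is not available outside SES): the record itself is frozen, with non-writable, non-configurable properties, while the ephemera it points to stays mutable. makeContext is a hypothetical name.

```javascript
// Build a frozen context record whose properties are explicitly
// non-writable and non-configurable. Object.freeze stands in for harden.
function makeContext(self, state, ephemera) {
  const context = {};
  for (const [key, value] of Object.entries({ self, state, ephemera })) {
    Object.defineProperty(context, key, {
      value,
      writable: false,
      configurable: false,
      enumerable: true,
    });
  }
  return Object.freeze(context);
}

const ephemera = { seen: 0 };
const context = makeContext({}, {}, ephemera);
context.ephemera.seen += 1; // ephemera itself is not frozen
const desc = Object.getOwnPropertyDescriptor(context, 'ephemera');
console.log(context.ephemera.seen, desc.writable, desc.configurable, Object.isFrozen(context)); // 1 false false true
```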

(It would be slightly easier/safer to build context if we could harden ephemera. Userspace would then need to use a Map or Set instead of simple mutable records or arrays, but it could still hold Promises and objects/functions that close over other mutability.)

I'll describe the second proposal we came up with in a separate comment.

warner commented 2 years ago

Our second proposal is a bit more radical. We realized that we're currently providing three-ish tools, with various values of two orthogonal properties:

                  low-cardinality             high-cardinality
  non-durable     plain objects-as-closures   makeKind (virtual)
  durable         (none)                      makeDurableKind

The fourth corner wants a tool for data that is durable but of low-cardinality (so we can afford to spend RAM on each instance). A lot of the singleton Kinds we're building for contract upgrade (ZCF, the contract instance) fall into this category, but some of the friction is because our only durable tool is made for high-cardinality data.

We sketched out a fourth tool, with a strawman name of defineExpensiveDurableKind (or maybe defineUpgradableKind), that would be used like this:

function makeBehavior(state) { // called once per instance per version, during first unserialize
  // could mutate 'state' here
  let ephemera;
  return {
    doFoo({ self }, ...fooArgs) {
      // can read/write ephemera. 'self' has an identity.
    },
    doBar({ self }, ...barArgs) {
    },
    doBarMulti({ facets }) {
    },
  };
}

const maker = defineExpensiveDurableKind(handle, init, makeBehavior, options);

This approach would call makeBehavior at the same times that createEphemera was called in the previous proposal: during init() in version-1, and during first deserialization in version-2. However, the Representatives would be pinned in RAM, never released until upgrade causes the vat version to stop. This prevents userspace from sensing GC by counting calls to makeBehavior().

By calling a user-provided function once per instance, we could create ephemera as closed-over variables, available to all behavior functions, instead of passing it through context. state is also closed-over, but is implemented as the same "hardened record of getters/setters for known state properties" as before. init and options.finish behave as before.

Note that self must still be passed through context, because self is not yet defined within makeBehavior (the return value of makeBehavior is not self, instead it is a record of context-taking functions that must be copied/bound into a newly-synthesized object that curries context appropriately).
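A toy model of the strawman, under the hypothetical name defineExpensiveToyKind (no such API exists): Representatives are pinned in a strong Map, makeBehavior runs once per instance, ephemera is a closed-over variable, and self is bound through context because it does not exist yet inside makeBehavior. Only a single vat version is modeled.

```javascript
// Toy model of the strawman defineExpensiveDurableKind (hypothetical; no
// such API exists). Representatives are pinned in a strong Map for the
// life of the vat version, so makeBehavior runs once per instance.
function defineExpensiveToyKind(init, makeBehavior) {
  const durable = new Map(); // vref -> durable state
  const pinned = new Map(); // vref -> Representative, never released
  let nextId = 1;

  function provideRepresentative(vref) {
    if (!pinned.has(vref)) {
      const state = durable.get(vref);
      const behavior = makeBehavior(state); // may close over ephemera
      const self = {};
      for (const [name, method] of Object.entries(behavior)) {
        // self does not exist inside makeBehavior, so pass it via context
        self[name] = (...args) => method({ self }, ...args);
      }
      pinned.set(vref, Object.freeze(self));
    }
    return pinned.get(vref);
  }

  const maker = (...args) => {
    const vref = `o+${nextId++}`;
    durable.set(vref, init(...args));
    return provideRepresentative(vref);
  };
  return { maker, provideRepresentative };
}

let behaviorCalls = 0;
const { maker, provideRepresentative } = defineExpensiveToyKind(
  n => ({ n }),
  state => {
    behaviorCalls += 1;
    let ephemera = 0; // closed-over, per instance, per version
    return {
      bump: () => {
        ephemera += state.n;
        return ephemera;
      },
      whoAmI: ({ self }) => self,
    };
  },
);

const a = maker(10);
a.bump();
console.log(provideRepresentative('o+1') === a, a.bump(), behaviorCalls); // true 20 1
```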

Also note that makeBehavior does not get access to the args which init() receives, because makeBehavior is called in version-2 (not just version-1), by which time those args are long gone.

The makeBehavior has the opportunity to mutate state as the object is first created and/or unserialized. This might not be a good idea, but on the other hand it might be a great place for schema upgrade to happen.

warner commented 2 years ago

Argh, nope, both options.createEphemera and defineExpensiveDurableKind's makeBehavior run into a problem: we disable metering during deserialization, and both would run user code within that time, allowing userspace to cheat on metering.

We disable it because deserialization might encounter vrefs which refer to virtual/durable objects, whose Representatives may or may not already be in memory (they are tracked with a WeakRef). If we don't currently have a Representative, we must build one, which costs more meter usage than if we skipped it. That would give a GC sensor to anyone watching the meter. To avoid that, we sandwich the marshaller.deserialize() call in a disableMetering block. But that covers everything during deserialization, including the call to the user-provided createEphemera().

The number of times createEphemera() is called is not a GC sensor, but the flip side is that createEphemera() could do something really expensive, and it wouldn't be captured by the meter. And all userspace activity is supposed to be captured by the meter.
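The metering hole can be illustrated with a toy meter (hypothetical; not the SwingSet meter or its API): anything that runs inside the unmetered block, including a user-supplied createEphemera, is not charged, while the same work done after deserialization is.

```javascript
// Toy meter (hypothetical): work done while metering is disabled is
// simply not charged.
function makeMeter() {
  let used = 0;
  let enabled = true;
  return {
    charge(units) {
      if (enabled) used += units;
    },
    runUnmetered(fn) {
      enabled = false;
      try {
        return fn();
      } finally {
        enabled = true;
      }
    },
    get used() {
      return used;
    },
  };
}

const meter = makeMeter();
// A user-supplied createEphemera that does expensive work.
const createEphemera = () => {
  meter.charge(1_000_000);
  return {};
};

// Deserialization is wrapped in an unmetered block; if it invokes
// createEphemera, that cost escapes the meter.
meter.runUnmetered(() => {
  meter.charge(50); // VOM bookkeeping (intentionally unmetered)
  createEphemera(); // user code: should be charged, but isn't
});
const afterDeserialize = meter.used;

// Open-coded style: user code runs after deserialization, so it is charged.
createEphemera();
console.log(afterDeserialize, meter.used); // 0 1000000
```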

The open-coded approach doesn't suffer from this because provideEphemera is called after deserialization is finished, so it's all in userspace. But that boilerplate is pretty annoying.

mhofman commented 2 years ago

Why not make context.ephemera a getter? That would allow us to:

This getter would be roughly equivalent to a provideEphemera implemented in userland. I do think that we should find a way to make it easy for userland to implement provideEphemera, to get some experience before moving it into the VOM. One option would be to allow the state object to be used as a userland WeakMap key.

Regarding the makeBehavior idea, I am feeling somewhat uncomfortable with it, probably because it reverts back to a closure over state model, and enshrines using the state object outside of calls to a behavior method.

warner commented 2 years ago

Oh, that's clever. Yeah, I think making it a getter would address the problems I raised.

Defer call to initEphemera/createEphemera until the first time a facet is used in a given version (or until the ephemera is used by a method of a facet?)

Yeah, if the method is defined as foo: ({ state, ephemera }) => stuff then it'll get created as soon as that method is called, but if the method does foo: context => stuff then it waits until stuff does context.ephemera or { ephemera } = context which might only happen inside a conditional. But all of that is a deterministic function of userspace behavior, and all of it happens after any Representative-creating deserialization takes place, so it'll be metered along with the rest of userspace.
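A sketch of the getter idea, with hypothetical names (makeLazyContext, maybeUse): ephemera is an accessor on the frozen context that runs the provide pattern on first access, so a method that reads it only inside a conditional may never trigger creation, while destructuring { ephemera } does.

```javascript
// Hypothetical: ephemera is exposed as a getter that runs the provide
// pattern on first access, after deserialization, in metered userspace.
function makeLazyContext(self, state, ephemeraWM, createEphemera) {
  return Object.freeze({
    self,
    state,
    get ephemera() {
      if (!ephemeraWM.has(self)) {
        ephemeraWM.set(self, createEphemera(state));
      }
      return ephemeraWM.get(self);
    },
  });
}

let created = 0;
const wm = new WeakMap();
const context = makeLazyContext({}, { limit: 3 }, wm, state => {
  created += 1;
  return { remaining: state.limit };
});

// A method taking `context` that only touches ephemera conditionally.
const maybeUse = (ctx, flag) => (flag ? ctx.ephemera.remaining : null);
maybeUse(context, false);
const before = created; // getter never ran
const { ephemera } = context; // destructuring triggers creation
console.log(before, created, ephemera.remaining); // 0 1 3
```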

I do think that we should find a way to make it easy for userland to implement provideEphemera, to get some experience before moving it into the VOM. One option would be to allow the state object to be used as a userland WeakMap key.

Hm, state can already be used as a WeakMap key (it's a regular Object, a bag of getters/setters, without value properties, without vref identity, and with some tricks to make sure it has the same lifetime as the facets it supports), so userspace could write the open-coded provideEphemera approach today. Are you thinking of something in between "just write it yourself" and options.createEphemera? Maybe options.provideEphemera = state => doStuffAroundYourOwnWeakmap ? (or maybe we pass the whole context object in). I'm not sure I see how that's more educational or flexible than having the VOM manage the WeakMap.

Regarding the makeBehavior idea, I am feeling somewhat uncomfortable with it, probably because it reverts back to a closure over state model, and enshrines using the state object outside of calls to a behavior method.

Yeah, needing the state or context to index the WeakMap necessarily means they appear outside a behavior method.

And the "closure over state" model is exactly what it's trying to salvage, for the use case where we're closing over shared (ephemeral) state, and only use state.propname for per-instance (durable) state. All of this is a struggle to retain as much of the "elegant" objects-as-closures model in the face of requirements for high-cardinality and/or durability/upgradability.

The defineExpensiveDurableKind approach would allow a singleton use case that needs a durable identity, but whose durable state is pretty immutable (and easier to manage with baggage and/or reconstructed entirely at upgrade time), to close over everything, and not even have a state. I still don't know if it's a good idea, but it might be the closest we could get to the original objects-as-closures while still enabling upgrade (and preventing the GC sensor).

mhofman commented 2 years ago

Yeah, if the method is defined as foo: ({ state, ephemera }) => stuff then it'll get created as soon as that method is called, but if the method does foo: context => stuff then it waits until stuff does context.ephemera or { ephemera } = context which might only happen inside a conditional.

Yes, and there is also the option to explicitly create the ephemera before calling the behavior method the first time (not during the getter), and to prevent calling any facet methods during the ephemera creation. This would be a more heavy-handed "guard" against potential footguns (conditional ephemera init), but might prevent some legitimate use cases (internal behavior facets used for ephemera init).
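A sketch of that heavier-handed guard, with a hypothetical makeGuardedFacet: ephemera is created eagerly before the first method body runs (not inside a getter), and any facet call made while createEphemera is running throws. The second usage shows the trade-off: a createEphemera that wants to call a facet method is rejected too.

```javascript
// Hypothetical guard: create ephemera before the first method body runs,
// and reject re-entrant facet calls while createEphemera is running.
function makeGuardedFacet(behavior, state, createEphemera) {
  let ephemera;
  let initialized = false;
  let initializing = false;
  const facet = {};
  for (const [name, method] of Object.entries(behavior)) {
    facet[name] = (...args) => {
      if (initializing) {
        throw Error(`${name}() called during ephemera creation`);
      }
      if (!initialized) {
        initializing = true;
        try {
          ephemera = createEphemera(state, facet);
        } finally {
          initializing = false;
        }
        initialized = true;
      }
      return method({ self: facet, state, ephemera }, ...args);
    };
  }
  return Object.freeze(facet);
}

const ok = makeGuardedFacet(
  { get: ({ ephemera }) => ephemera.value },
  { v: 7 },
  state => ({ value: state.v }),
);
console.log(ok.get()); // 7

const reentrant = makeGuardedFacet(
  { get: ({ ephemera }) => ephemera.value },
  { v: 7 },
  (state, facet) => ({ value: facet.get() }), // facet call during init
);
let message;
try {
  reentrant.get();
} catch (e) {
  message = e.message;
}
console.log(message); // get() called during ephemera creation
```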

Hm, state can already be used as a WeakMap key

Oh I thought we explicitly disallowed that.

I'm not sure I see how that's more educational or flexible than having the VOM manage the WeakMap.

It's not; I just wanted to get experience with ephemera before moving it into the platform, to see whether that approach actually solves problems. Basically, have userland implement const provideEphemera = (state, facets) => {} using the state object as the WM key, and optionally facets for its logic if necessary.

Yeah, needing the state or context to index the WeakMap necessarily means they appear outside a behavior method.

I still very much would like to "disable" the state object props outside of behavior methods invocation to prevent this pattern. At least in the WM case, using the identity would be allowed.