Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 206 forks source link

vref-aware GC design (was: WeakMap + garbage-collected Presences = non-determinism) #2724

Closed warner closed 2 years ago

warner commented 3 years ago

While working on #2660 GC tracking, I observed a Zoe test failure in which some Payment or SeatHandle or something was not found in a weak-store. I'm still investigating that, but it raised a disturbing possibility.

Imagine two (colluding) vats:

// vat A
function buildRootObject() {
  const target = Far('target');
  return Far('root', {
    getPresence() { return target; },
  });
}
// vat B
const known = new WeakSet();
async function one(a) {
  const presence1 = await a~.getPresence();
  known.add(presence1);
  Promise.resolve(a).then(two); // or trigger via timer, etc
}

// time passes.. and maybe GC

async function two(a) {
  const presence2 = await a~.getPresence();
  if (known.has(presence2)) {
    console.log('no GC happened');
  } else {
    console.log('yes GC happened');
  }
}

Now someone calls b~.one(a), and B asks A for the target. B sees the vref for the first time, and creates a new Presence for it, stuffing it in the WeakSet. Now some time passes before B calls its two() method (maybe one turn, maybe an hour passes, if B has a way to get someone else to wake them up later).

If GC has happened in B while no code held a strong reference to presence1, the Presence will get pruned, and the WeakSet will be empty. When two() happens and B re-fetches the target, the vat-B liveslots there will make a brand new Presence object, which won't be in the WeakSet, and two() reports that GC happened.

If it did not happen, the vat-B liveslots will still have the Presence, and two() gets back the same JS object it had before, which will be in the WeakSet, and two() reports that GC did not happen.

(in #2615 terminology, in the first case the vref is in the COLLECTED/FINALIZED/UNKNOWN state at the end of the crank, while In the second case it is in UNREACHABLE. Nothing outside of the JS engine can tell the difference between REACHABLE and UNREACHABLE, and the engine only learns the difference after a full GC sweep).

This will cause intermittently faulty behavior in code which relies upon the JS object identity of a Presence. The vat which exported the object (holding the Remotable) can use its object identity safely, but not a vat which imports a matching Presence.

Worse, this enables vats to behave non-deterministically. They're not supposed to be able to sense the passing of time, or any activity of the garbage collector.

We put a lot of energy into the virtual object code to make sure that adversarial userspace code could not use it to sense GC happening. Code that relies upon the JS object identity of a virtual-object Representative will observe a consistent pattern: each new crank sees a new Representative, which both discourages its use for e.g. WeakMap keys, and behaves deterministically. Code that wants to do the right thing will use our WeakStore (which colludes with liveslots to use the vref identity, 1:1 matched to the exported Remotable) rather than a native WeakMap. But code that wants to misbehave doesn't get the tools to sense GC.

This import-side problem doesn't lend itself to the same solution. We've always assumed Presences have a stable identity, and it's entirely reasonable for folks to want to use them as WeakMap keys. But that only worked because liveslots was maintaining a strong reference to the Presence it created. To allow liveslots to perform GC and drop the upstream exported Remotable when nothing else in the kernel is referencing it, we need liveslots to use a WeakRef to track the downstream import. But by weakening that import-side reference, we allow the Presence to go away, which loses the stable identity that we had previously assumed.

I haven't yet come up with a clever idea to work around this. Worst case, this means we can't do GC until the entire vat is terminated, or some code deliberately (and deterministically) disavow()s the Presence.

@erights @dtribble @FUDCo how did this work in E? Given multiple colluding vats, could one vat learn about its own GC activity?

erights commented 3 years ago

We should talk through how it worked in E and how it is different. Generally, for objects that had access to anything through CapTP, we assumed it had access to non-determinism, and were not trying to encapsulate all non-determinism within CapTP. Also, E's passByCopy objects had no identity. Like JS strings, there was no way to distinguish a passByCopy object from a copy of it.

Nevertheless, it will probably be illuminating to walk through these issues in E.

warner commented 3 years ago

Hm.. was the non-determinism they could sense (via CapTP) limited to arrival-order non-determinism? I was hoping to salvage that here, but I'm not seeing a way to do it, since the retain/drop difference causes messages to be sent from the vat, rather than being triggered by messages coming into it.

erights commented 3 years ago

Good question! Maybe yes. It'll be interesting to examine and compare in depth. Looking forward!

dtribble commented 3 years ago

We shim WeakMap/WeakSet so that either you can't put a virtualObject into one, or it makes the VO stay in memory.

Programmer: it hurts when I do this. Dr. Determinism: don't do that. Programmer: but... Doctored determinism: can't do that!

warner commented 3 years ago

(argh, @dtribble beat me to it by mere seconds. posting anways.)

For vats which wish to behave, we could enhance our virtual-object-aware WeakStore to collude with liveslots about Presences in the same way it handles Representatives: the WeakStore asks liveslots for the embedded vref (o-12) of the Presence, and uses that as a key instead of the JS Presence object itself. That would maintain the identity, although as a WeakStore the lifetime would still be a problem. What can the WeakStore use to know when the value can/should be released, if not JS object lifetime? For Representatives (which represent exports) we know the entry should exist until the virtual object is explicitly deleted, which lasts beyond any individual representative object. But Presences are imports, and as long as the original (remote) Remotable is still known to any vat, this vat might receive a copy, and thus expect its identity to be retained.

And of course that doesn't help protect against adversarial vats at all, unless we restrict access to WeakMap and WeakSet. Hmm...

Ok, we could do that, those are Compartment globals and we can shim them as much as we want. We could give vats a wrapped form which throws if you ever try to use a Presence as a key. Since we withhold WeakRef and FinalizationRegistry from userspace already, by withholding the raw WeakMap and WeakSet, we'd be removing the last tools vats could use to sense the ongoing identity of any objects the wrapper rejected.

We've considered making the virtual-object-aware WeakStore the primary tool userspace code should use. This would raise that one level further, replacing the built-in functions with ones that collude with the infrastructure.

warner commented 3 years ago

Right, so the real "raise you one more" is to modify the global WeakMap to use the same vref-based identity function as the makeWeakStore facility that we add to all vat globals. Then userspace code could go back to using new WeakMap() instead of makeWeakStore(). Except that maybe we prefer the "store" API (distinct init and set methods).

Well, plus we add makeWeakStore only to the vat's initial Compartment.. any vat that uses the Compartment API must be responsible for propagating that global to the new Compartment too. But if we wrap/modify/(or even just delete!) the pre-existing WeakMap/WeakSet objects, we must also apply an inescapableGlobal to make sure all child Compartments get the same wrapped version, not the determinism-threatening original.

michaelfig commented 3 years ago

Right, so the real "raise you one more" is to modify the global WeakMap to use the same vref-based identity function as the makeWeakStore facility that we add to all vat globals.

This is looking good. So liveSlots would patch the vat's WeakMap, WeakStore, and Compartment (with inescapableGlobals).

Then userspace code could go back to using new WeakMap() instead of makeWeakStore().

I'd prefer the global makeWeakStore to go away, and for userspace just to import { makeWeakStore } from '@agoric/store'; which internally uses new WeakMap directly. If there are good reasons not to do this, I'd gladly retreat from this position.

warner commented 3 years ago

That'd work.. both would have the same vref-aware identity function, by virtue of one being implemented in terms of the other, but the external API would be up to the caller's tastes.

It feels kind of invasive, modifying something as built-in as WeakMap, but I guess it fits with the nature of the platform we're trying to provide. It means Presences are special: normal userspace code cannot create something that is treated in the same way, but since Presences are by definition arriving from the "outside", they're going to be special no matter what.

makeKind is still a thing, there's no pre-existing global we could patch to build objects where their state and behavior are provided as separate things. At least I can't think of a pattern with e.g. Object.create that would make sense. I suppose this is where our hypothetical source-to-source transform would be applied. There's a relationship:

raw vref-aware
raw WeakMap our vref-aware WeakMap wrapper
XXX makeKind and invocation of the maker it returns

and the "XXX" box would be a constructor function that returns an object (with behavior) that closes over some per-instance state variables. Our "special vat environment" would be reached through a transform that converts that pattern into calls to makeKind/etc, rather than attempting to modify the environment's globals to induce that behavior in otherwise naieve code.

dtribble commented 3 years ago

It feels kind of invasive, modifying something as built-in as WeakMap, but I guess it fits with the nature of the platform we're trying to provide. It means Presences are special: normal userspace code cannot create something that is treated in the same way, but since Presences are by definition arriving from the "outside", they're going to be special no matter what.

Just to be clear, we don't modify it, we provide a different one in the nested scope. That way the infrastructure isn't paying any overhead for it. Few contracts require them.

warner commented 3 years ago

I'll agree that we're not modifying the WeakMap/WeakSet used by kernel code. But the kernel doesn't use them, partly because it must be deterministic, partly because the kernel is much more explicit about deallocating table entries. There's just not very much within the kernel that would be easier/simpler/more-efficient with weak tables indexed by objects.

I'd disagree that "few contracts require" weak tables. They seem to be used extensively for rights-amplification patterns.

The properties that make them useful for rights-amplification patterns are worth looking at. A WeakMap has four interesting benefits/differences over a regular Map:

The rights-amplification utility comes from the non-enumerability. The weak key and automatic value deallocation is a nice side-effect, and occasionally necessary to avoid accumulating junk, because you might never again get access to the key which you'd need to do a delete.

The special WeakMap we're talking about is non-enumerable, and does not keep a strong reference to the Presence used as a key. We might call it a NonEnumerableMap to be more explicit about what it offers. But cannot do automatic deallocation when the underlying vref goes away, because what even qualifies as the vref going away?

Users expect both Map and WeakMap to be "durable", in the sense that submitting a key now and submitting it again later will both get you the same value. Regular WeakMaps use the JS object identity as the key. We're talking about a tool that uses the "Swingset object identity" as the key. For a Presence that outlives its use in the weakmap, this is equivalent to the vref (e.g. o-11) that was imported into the vat, and also equivalent to the kref (ko4) on the other side of its c-list, and also equivalent to the vref used by the exporting vat's original Remotable (e.g. o+2). It's as if there were a swingset-spanning table, and for as long as that Presence stays alive, the contents of one row are fixed:

exported Remotable exporting vat vref kernel kref importing vat vref imported Presence weakmap value
Remotable-1 o+2 ko4 o-11 Presence-1 value-1

But when we introduce GC, we're saying that when the importing vat drops the last strong reference to Presence-1, the kernel is allowed to drop the c-list entry:

exported Remotable exporting vat vref kernel kref importing vat vref imported Presence weakmap value
Remotable-1 o+2 ko4 (none) (none) value-1

.. and then if someone (perhaps the exporting vat) re-sends that same object, resulting in a new Presence object, that Presence is supposed to be recognized as mapping to the same value as before:

exported Remotable exporting vat vref kernel kref importing vat vref imported Presence weakmap value
Remotable-1 o+2 ko4 o-12 Presence-2 value-1

Moreover, if all importers of the object stopped referencing it, and nothing else in the kernel referenced it (e.g. saved promise resolutions, outstanding run-queue message arguments), then the kernel ought to be able to release the exporter's c-list entry as well, and notify the exporting vat (with dispatch.dropExports()) that liveslots no longer needs a strong reference to the exported Remotable:

exported Remotable exporting vat vref kernel kref importing vat vref imported Presence weakmap value
Remotable-1 (none) (none) (none) (none) value-1

If the exporting vat were to re-export the same Remotable, it would get a new export-side vref, which of course results in a new kernel-side kref, and a new importer-side vref, and a new Presence, and yet somehow we're hoping to retain the same WeakMap relationship as before:

exported Remotable exporting vat vref kernel kref importing vat vref imported Presence weakmap value
Remotable-1 o+3 ko5 o-13 Presence-3 value-1

I don't see how to maintain that kind of durability and also be able to delete kernel table entries.

The virtual-object manager's makeWeakStore knows how to use the vref identity for its Representatives, but that's only safe because those are exports, and virtual objects explicitly live until their creator deletes them (and we haven't provided an API for that yet, as it's tantamount to a unilateral revocation, with downstream consequences that we haven't decided how to deal with yet). The exporting vat's vref is stable for as long as the virtual object remains alive, which is specifically more than the lifetime of any ephemeral Representative.

So if the upstream export is a virtual object, the vref is stable (if the kernel does dropExports, and later the vat re-exports it, it will use the same vref as before). But all of the other columns of that nominal table are not. And if the upstream export is a non-virtual Remotable, even the exported vref is unstable.

Our idea from last night was to use the imported vref as the object key in our special NonEnumerableMap WeakMap replacement. But in the face of re-imported Presences getting a new vref, I don't see how that can work. And knowing that the kref might change (and hopefully will change, because if GC is at all useful, we want to dropExports all the way back up to the exporting vat and release its Remotable), we have even less stability to work with.

two-phase GC

We might regain some stability if our dispatch.dropExports did not actually forget the vref until the vat had forgotten the Remotable. We'd remove the inbound clist entry (o+2 -> Remotable-1), but retain the outbound clist WeakMap (Remotable-1 -> o+2) in case they ever re-exported it.

It might be useful to then use a FinalizationRegistry on that outbound entry and do something in the case that the Remotable went away. Like, telling the kernel that it could drop the kref because it's no longer needed for stability.

We could imagine all four c-list entries (importer/exporter vs inbound/outbound) having two states. One would mean "the object is still accessible" and vaguely corresponds to a strong reference. The other means "I might still need the identity" and corresponds to a weak reference.

If the importer drops their Presence, but its vref is still a key in a NonEnumerableMap, then the importing vat does a "soft drop" of the c-list entry, moving it into the second (weaker but still identity-maintaining) state. The kernel counts the strong references, and when they all go away, it does a "soft dropExport" to the exporting vat. This tells liveslots in the exporting vat to drop its strong reference to the Remotable, but retain the WeakMap from Remotable to vref.

The c-list entries, and the kernel object table entry, would remain until:

Alternatively, if all vats (exporter and importers) have a weak clist entry, but the exporting vat notices that the Remotable has been collected, then now we know that the vref can never again show up in a message: there's no longer any Remotable that could possibly be translated into that identity. The downstream vats' NonEnumerableMap entries are now unreachable. At this point, we'd like to send some sort of revocation message to them, telling them to delete their c-list entries, and also delete the NonEnumerableMap entries for the late vref (to save space by releasing the value).

This reminds me a lot of our analysis on non-deterministic ("formal" vs "local") GC behavior. Instead of formal-vs-local, we've got strong-vs-weak.

I suppose this is really just WeakRefs extended across multiple vats, making both the strong-to-weak and weak-to-none transitions be end-to-end visible.

warner commented 3 years ago

Here's a design that seems sound to me.

Generalities

Each layer needs one more degree of freedom than the layer it supports. To perform GC for a vat that can only hold strong references (or none at all), the next layer down must have access to WeakRefs. To perform GC for a vat that can also hold WeakRefs (in the form of a WeakMap), the next layer down must be able to sense the presence or absence of a WeakRef. To do that, we must forbid access to WeakRef, and modify/instrument the WeakMap/WeakSet they get (forbidding access to the native form).

A "Swingset Object" is the abstract notion of an object, which can cross vat boundaries. Each one has a specific exporting vat (the "owner"; when someone sends a message to the object, the kernel sends this message to the owner). Each Swingset Object may contain some auxiliary data (which can include other objects). Swingset Objects within the kernel are represented by a kref (ko12).

A "JavaScript Object" is the real Object that appears in vats: it is correlated with a specific Swingset Object. The "Presences" that appear when a vat imports a Swingset Object (caused by an incoming message or promise resolution containing a reference to the object), and the "Representatives" that stand in for virtual objects on the exporting side, are both expressed as JavaScript Objects.

Importing Vat States

Each importing vat sees the Swingset Object in one of three states:

The object is in the REACHABLE state if there is a Presence for that object in the vat, or if appears in the state data of any virtual object, or in the value of any virtual table (our WeakStore, or one of the upcoming "virtual collections"). We don't currently have a virtual equivalent to the (strong) Map object (with enumerable keys), but if we add one, objects that appear in the keys would also be REACHABLE.

The RECOGNIZABLE state occurs when none of those conditions are true, but the object does appear as a key in any WeakMap/WeakSet/WeakStore. (If we permitted vats to hold WeakRefs, this would be more complicated). In this state, there is no Presence now, but one might appear later (because of an inbound message from somewhere that is in the REACHABLE state), and the vat might attempt to use that Presence in a table lookup, and the table must return the same data stored earlier.

Liveslots uses a FinalizationRegistry to sense the disappearance of a Presence, and a WeakRef in the slotToVal table to avoid blinding itself to the transition by holding a strong reference. Our virtual objects must be instrumented to track when the object's vref is used in the state data, and our virtual tables track when the vref is used in a value (or enumerable Map-style key). When all three go away, the vat moves the object out of the REACHABLE state. It will move to the RECOGNIZABLE state if a virtual table is still using the object as a key, otherwise it moves to the UNKNOWN state.

When it moves from REACHABLE to RECOGNIZABLE, the vat emits a syscall.dropImport. The liveslots valToSlot and slotToVal tables no longer include the object, however the kernel-side c-list mappings are retained.

If the vat receives a dispatch.deliver or dispatch.notify which references the vref, it will create a new Presence object, and it moves back to REACHABLE. The Presence will encapsulate the same vref as before, so the virtual tables will treat it as the same key as before.

When the object is RECOGNIZABLE, it can transition to UNKNOWN in two different ways. The first is provoked locally, when the last recognition predicate goes away. This can happen if a WeakMap/WeakSet/WeakStore is deleted, and will require us to track it in a FinalizationRegistry, which means instrumenting the constructor functions of all three to sense their creation.

The second is provoked externally, when the kernel sends a dispatch.retireImport. It will do this when the kernel's state for the object moves from RECOGNIZABLE to UNKNOWN, which means nobody could reference the object any more. This won't happen until (at least) the exporting vat can no longer emit the object, which might mean it has been terminated, or it was a virtual object that can no longer be reached.

When the vat performs a locally-provoked transition from RECOGNIZABLE to UNKNOWN, it will emit a syscall.retireImport. This tells the kernel that this vat can no longer recognize the vref. The kernel will delete the c-list entry, and the change may propagate further (perhaps causing a dispatch.retireExport into the exporting vat).

When the vat performss a kernel-provoked transition, it does not need to emit syscall.retireImport, as the kernel already knows what it needs to know (and has already deleted the c-list entry). The vat should react to this by finding every weak table entry with this vref as a key and deleting it, just as if userspace had performed a store.delete(presence) or set.delete(presence). This will cause state transitions for any objects in the corresponding values of those tables. To support this efficiently, we'll need an internal table that maps the vref to the list of weak tables that use it as a key.

If the vat moves an object from REACHABLE directly to UNKNOWN (because the Presence went away and the vref was not used in any weak tables, whether as a key or as a value, nor in the data of any virtual objects), the vat emits syscall.dropImport followed by syscall.retireImport (or we might collapse the two into a single syscall).

importing-vat-states

digraph ImportingVatStates {
 REACHABLE
 RECOGNIZABLE
 UNKNOWN

 UNKNOWN -> importing [label="dispatch.deliver"]
 importing [shape="box" label="add to table"]
 importing -> REACHABLE

 REACHABLE -> unreaching [label="(finalized,\nnot a value)\nyes a key"]
 unreaching [shape="box" label="delete tables\nsyscall.dropImport"]
 unreaching -> RECOGNIZABLE

 REACHABLE -> fullydropping [label="(finalized,\nnot a value)\nnot a key"]
 fullydropping [shape="box" label="delete tables\nsyscall.dropImport\nsyscall.retireImport"]
 fullydropping -> UNKNOWN

 RECOGNIZABLE -> unrecognizing [label="no longer a key"]
 unrecognizing [shape="box" label="syscall.retireImport"]
 unrecognizing -> UNKNOWN

 RECOGNIZABLE -> obituarying [label="dispatch.retireImport"]
 obituarying [shape="box" label="delete from maps\nfree values"]
 obituarying -> UNKNOWN

 RECOGNIZABLE -> importing [label="dispatch.deliver"]
}

Kernel States

Within the kernel, there are five data structures we must pay attention to. All use a "kref' (ko34), although the c-lists are additionally indexed by a vref (o+12/o-13):

The kernel tracks both strong and weak references to each Swingset Object. These references can come from:

The kernel understands the same three states as the importing vat (UNKNOWN, RECOGNIZABLE, REACHABLE), although not necessarily at the same time. An object might be UNKNOWN to one vat but remain RECOGNIZABLE or REACHABLE by the kernel. It might be RECOGNIZABLE to a vat while being REACHABLE to the kernel.

If there are any strong references to the kref, the kernel considers the Swingset Object to be REACHABLE. If there are only weak references, it is RECOGNIZABLE. If there are no references, the kref is deleted from the kernel object table (and appears nowhere else) and is considered UNKNOWN.

Once a kref is created, any vrefs and c-list entries that use it must be retained until it becomes UNKNOWN. Transitioning from REACHABLE to RECOGNIZABLE is not sufficient to delete the c-list entry, because the object's identity must be preserved, just in case the original exporting vat re-exports the object in the future, and some previously-importing vat is prepared to recognize it.

kernel-states


digraph KernelStates {
 UNKNOWN
 REACHABLE
 RECOGNIZABLE

 UNKNOWN -> creating [label="exporter syscall mention"]
 creating [shape="box" label="create kernel state"]
 creating -> REACHABLE

 REACHABLE -> check1 [label="importer syscall.dropImport"]
 REACHABLE -> check1 [label="delete from kernel state"]
 check1 [shape="box" label="remaining strongrefs?"]
 check1 -> REACHABLE [label="yes"]
 check1 -> dropping [label="no"]
 dropping [shape="box" label="exporter dispatch.dropImport"]
 dropping -> check2
 check2 [shape="box" label="remaining weakrefs?"]
 check2 -> RECOGNIZABLE [label="yes"]
 check2 -> retiring [label="no"] 

 RECOGNIZABLE -> check3 [label="importer syscall.retireImport"]
 check3 [shape="box" label="remaining weakrefs?"]
 check3 -> RECOGNIZABLE [label="yes"]
 check3 -> retiring [label="no"]
 retiring [shape="box" label="exporter dispatch.retireExport"]
 retiring -> deleting

 RECOGNIZABLE -> forgetting [label="exporter syscall.retireExport"]
 forgetting [shape="box" label="importers: dispatch.retireImport"]
 forgetting -> deleting

 deleting [shape="box" label="delete kernel state"]
 deleting -> UNKNOWN

 {rank=same; creating deleting}
}

Export-side C-List

A vat exports a new vref by including it in the arguments of a syscall.send, or the resolution data of a syscall.resolve (and maybe in a c-list-aware vatstore operation some day). This immediately kicks the kernel state to REACHABLE, because the run-queue message (for send) or saved promise resolution data (for resolve) will hold a strong reference to the new object. Most such messages will soon cause a delivery into some vat, which then holds its own strong reference as an import.

Over time, importing vats will release their strong reference by calling syscall.dropImport. Likewise, the kernel tables may drop strong references as run-queue entries are processed and retired, resolved promises are retired, and auxdata is deleted. After each such decref, the kernel checks the remaining references: if all the strong references are gone, the kernel moves to RECOGNIZABLE, and sends dispatch.dropExport to the exporting vat to tell it to stop holding onto the Remotable for the kernel's benefit. This is a promise to not send this vref into the exporting vat again (unless/until the vat re-exports it). At this point we remove the kernel->vat c-list entry. But we retain the vat->kernel entry, because the vat is still allowed to mention/re-export the object, and its identity must be retained if that happens. The object is considered RECOGNIZABLE in this state.

From RECOGNIZABLE, three things might happen. First, the object might get re-exported by the original vat, moving it back to REACHABLE. Second, that vat might emit a syscall.retireExport, which means that not even the vat can produce the object anymore (their last internal strong reference was deleted). This moves the object state to UNKNOWN, and any importers are told about the loss with dispatch.retireImport. Finally, the third possibility is that all importing vats lose interest (perhaps they are terminated, or they deleted the last WeakMap using this object as a key). When the kernel senses this, it moves the object state to UNKNOWN, and uses dispatch.retireExport to inform the exporting vat that the object identity is no longer needed.

The kernel interacts with the exporting vat with the following calls:

exporting-clist-states

digraph ExportingVatCListState {
        UNKNOWN
        REACHABLE
        RECOGNIZABLE

        UNKNOWN -> exporting [label="syscall from vat"]
        exporting [shape="box" label="add to clist"]
        exporting -> REACHABLE

        REACHABLE -> dropping [label="kernel moves to RECOGNIZABLE"]
        dropping [shape="box" label="dispatch.dropExport\nremove kernel->vat side"]
        dropping -> RECOGNIZABLE

        RECOGNIZABLE -> losting [label="syscall.retireExport"]
        losting [shape="box" label="remove kernel->vat side\nimporters: dispatch.retireImport\ndelete kernel state"]
        losting -> UNKNOWN

        RECOGNIZABLE -> reexporting [label="syscall from vat"]
        reexporting [shape="box" label="replace vat->kernel side"]
        reexporting -> REACHABLE

        RECOGNIZABLE -> forgetting [label="kernel moves to UNKNOWN"]
        forgetting [shape="box" label="delete kernel state\ndispatch.retireExport\nremove kernel->vat side"]
        forgetting -> UNKNOWN

        {rank=same; dropping reexporting}
}

Import-side C-List

importing-clist-states

digraph ImportingVatCListState {
        UNKNOWN
        REACHABLE
        RECOGNIZABLE

        UNKNOWN -> importing [label="delivery to vat"]
        importing [shape="box" label="add to clist"]
        importing -> REACHABLE

        REACHABLE -> dropping [label="syscall.dropImport"]
        dropping [shape="box" label="remove vat->kernel side"]
        dropping -> RECOGNIZABLE
        RECOGNIZABLE -> ignoring [label="syscall.retireImport"]
        ignoring [shape="box" label="remove kernel->vat side\ncheck kernel state"]
        ignoring -> UNKNOWN

        RECOGNIZABLE -> losting [label="kernel moves to UNKNOWN"]
        losting [shape="box" label="dispatch.retireImport\nremove kernel->vat side"]
        losting -> UNKNOWN
}

Exporting Vat States

The exporting vat is where the Remotable JS Object or virtual object was created and exported in a message (or resolution) to the kernel. This vat associates an exporting vref (o+NN, rather than o-NN which signals a vat import) with the Swingset Object. The vref is allocated when the Remotable is seen leaving the vat (by liveslots, during outbound message/resolution serialization), or when the virtual object is created (whether or not it ever crosses the vat boundary).

Remotable

When the exported object is a real JS Object (marked as Far or Remotable), that "Remotable" might be used as the key in a WeakMap, so it is important that this one Object is kept alive until the kernel and all other vats have given up their ability to reach it. Liveslots maintains a strong reference to it (in the exports Set) until the kernel uses dispatch.dropExport to inform the vat that all other vats (and the kernel itself) have released all strong references to the Swingset object. At that point, liveslots deletes the exports entry (dropping liveslots' only strong refernece), leaving the object in the (externally) RECOGNIZABLE state.

While in (externally) RECOGNIZABLE, the exporting vat might retain other strong references to the object, so it is locally still reachable. If the vat ever re-exports ths Remotable, it returns to the (externally) REACHABLE state, and liveslots must put it back in the exports Set to retain a strong reference on behalf of the kernel and any potential importing vats.

Another transition out of (externally) RECOGNIZABLE is when the kernel abandons its ability to recognize the object, because all other vats have done the same (with syscall.retireImport). The kernel uses dispatch.retireExport to announce this transition. When this happens, liveslots can delete the slotToVat and vatToSlot tables and retire the vref. The Remotable might still be alive, and might still be re-exported some day, but it will use a different vref, and no other vats will be able to tell.

The final transition out of (externally) RECOGNIZABLE is when the exporting vat loses the ability to re-export it. We sense this with the FinalizationRegistry on the Remotable. When this fires, we know that, despite some importing vats still being prepared to recognize a re-export, the exporting vat can no longer generate one. So, just like how a WeakMap entry should be deleted when the key becomes unreachable, we want to inform those importing vats about the Swingset Object becoming un-generatable. After deleting the valToSlot and slotToVal tables, Liveslots uses syscall.retireExport to inform the kernel about the transition.

Both the retireExport transitions leave the object in the UNKNOWN state. In this state, valToSlot, slotToVal, and exports do not have any entries for the vref nor the Remotable, and there is no longer a vref associated with the Remotable. Userspace code might retain a strong reference to it, and it might appear as the key or in a value of a map, but liveslots is unaware that it exists.

exporting-vat-states-remotable

digraph ExportingVatRemotableStates {
 REACHABLE [label="(externally)\nREACHABLE"]
 RECOGNIZABLE [label="(externally)\nRECOGNIZABLE"]
 UNKNOWN

 REACHABLE -> unreaching [label="dispatch.dropExport"]
 unreaching [shape="box" label="exports.delete"]
 unreaching -> RECOGNIZABLE

 RECOGNIZABLE -> revoking [label="finalized"]
 revoking [shape="box" label="delete tables\nsyscall.retireExport"]
 revoking -> UNKNOWN

 RECOGNIZABLE -> retaining [label="syscall.send(Remotable)"]
 retaining [shape="box" label="exports.add"]
 retaining -> REACHABLE

 RECOGNIZABLE -> forgotten [label="dispatch.retireExport"]
 forgotten [shape="box" label="delete tables"]
 forgotten -> UNKNOWN

 UNKNOWN -> exporting [label="syscall.send(Remotable)"]
 exporting [shape="box" label="add to table"]
 exporting -> retaining
}

If a Remotable appears in the state data of a virtual object, that is effectively an export: the Remotable must remain alive an strongly referenced until the virtual object itself becomes unreachable.

Virtual Objects

Swingset's "virtual objects" are designed to allow a vat to avoid retaining one JS Object for every exported Swingset Object. Instead, the vat only has a JS Object for Swingset Objects that are currently involved in a message delivery. All others are kept "on ice", stored only as data on disk, until they are needed.

Instead of exactly one Remotable, each virtual object has zero or more "Representatives", which are JS Objects that hold all the state and behavior that the Remotable would have held. Virtual objects must be defined with a special makeKind function, and new ones are created with the factory function it returns. This factory function returns the "initial representative". Representatives can be sent in message arguments and promise resolutions, they can be used as the key or value of a special WeakStore, and they can be held by the state data of other virtual objects.

Issue #1968 argues that each virtual object should have zero or one Representative, where liveslots uses a WeakRef to ensure there are never multiple Representatives. This is a fine idea, and I think will be necessary to make GC work well. The biggest constraint on this whole design is that we do not expose non-determinism to userspace code.

The virtual object does not go away when there are no Representatives for it. It is kept alive by remote references, the state data of other virtual objects, the keys of enumerable Maps and Sets (which have been augmented to understand their identity properties), and the values of Map/Set/WeakMap/WeakSet/WeakStore tables.

Each virtual object has a single vref associated with it. This vref is allocated the moment the virtual object constructor function is called, and persists until the virtual object is thoroughly gone.

There are four sources of strong references to a virtual object:

The sources of weak references are:

Since we index virtual objects by their vref (rather than some other local namespace and a kernel-facing c-list within liveslots), we ignore a dispatch.retireExport from the kernel. For single-Object Remotables, this would be a signal to retire the vref, and allow a new one to be allocated if/when the Remotable were ever re-exported. For virtual objects, we keep using the same vref forever, and re-exportation means the kernel is re-introduced to the old vref (for which it assigns a new kref).

(In the comms vat, we use a separate local namespace, and a kernel-facing c-list, because the object can also be known to other remotes. Regular vats have only one peer: the kernel.)

When all four sources of strongrefs are gone, the exporting vat can no longer create a Representative for the object, so it can be deleted. Other vats might still have been able to recognize a re-export, but that doesn't matter, since the one such source of re-exports is no longer capable of making one). This occurs when the following conditions are met:

Upon deletion, the vat deletes all the state data for the virtual object (which may trigger more GC activity). Then it sends a syscall.retireExport to inform the kernel (and all downstream vats in the RECOGNIZABLE state) to delete their weak table entries. Finally the vat deletes any weak table entries of its own (which may trigger more GC).

So the components of the exporting vat state are:

which gives us 3*2*2 = 12 states, all but two of which keep the object alive. Note that exportation and add-to-table requires a Representative, so some state transitions can never happen (if there is currently no Representative, we must revive one before we can re-export it or use it in a table).

In the diagram below, the green nodes are the ideal "resting state", in which we're consuming database space but not RAM.

exporting-vat-states-virtual

digraph ExportingVatVirtualStates {
 // Sabc:
 // a: kernel state: S=REACHABLE, W=RECOGNIZABLE, 0=UNKNOWN
 // b: local representatives: R=1, 0=0
 // c: local state has reference: S=1, 0=0

 S000 [label="UNKNOWN\nk: 0, rep: 0, state: 0"]
 S000 -> creating [label="create"]
 creating [shape="box" label="allocate vref\nsave satte\ninitial Representative\nadd to valToSlot"]
 creating -> S010
 S010 [label="LOCAL\nk:0, rep: 1, state:0"]

 S010 -> exporting1 [label="syscall.send(Rep)"]
 exporting1 [shape="box" label="add to slotToVal"]
 exporting1 -> SS10
 SS10 [label="EXPORTED\nk:S, rep:1, state:0"]

 S010 -> S011 [label="store as\nvalue"]
 S011 [label="STORED\nk:0, rep:1, state:1"]

 S010 -> deallocating [label="finalizer"]
 deallocating [shape="box" label="delete state"]
 deallocating -> S000

 SS10 -> SS11 [label="store as\nvalue"]
 SS11 [label="EXPORTED+\nSTORED\nk:S, rep:1, state:1"]

 S011 -> exporting2 [label="syscall.send(Rep)"]
 exporting2 [shape="box" label="add to slotToVal"]
 exporting2 -> SS11
 S011 -> S001 [label="finalizer"]
 S001 [label="OFFLINE\nk:0, rep:0, state:1" style="filled" fillcolor="#f0fff0"]
 S001 -> S011 [label="unserialize"]
 S011 -> S010 [label="value stops"]

 SS11 -> SS10 [label="value stops"]

 SS11 -> SS01 [label="finalizer"]
 SS01 [label="EXPORTED+\nSTORED+\nOFFLINE\nk:S, rep:0, state:1" style="filled" fillcolor="#c0ffc0"]
 SS01 -> SS11 [label="unserialize"]
 SS01 -> SS00 [label="value stops"]
 SS00 [label="EXPORTED+\nOFFLINE\nk:S, rep:0, state:0" style="filled" fillcolor="#c0ffc0"]
 SS00 -> retiring1 [label="dispatch.dropExport"]
 retiring1 [shape="box" label="syscall.retireExport"]
 retiring1 -> deallocating
 SS01 -> SW01 [label="dispatch.dropExport"]
 SS00 -> SS10 [label="unserialize"]
 // SS00 -> storing2 [label="store as\nvalue"] // cannot: need Representative to store
 // storing2 [shape="box" label="set flag"]
 // storing2 -> SS

 SS11 -> SW11 [label="dispatch.dropExport"]
 SW11 [label="RECOGNIZABLE+\nSTORED+\nONLINE\nk:weak, rep:1, state:1"]
 SW11 -> exporting2 [label="syscall.send(Rep)"]
 SW11 -> SW10 [label="value stops"]
 SW10 [label="RECOGNIZABLE+\nONLINE\nk:weak, rep:1, state:0"]
 SW11 -> SW01 [label="finalizer"]
 SW01 [label="RECOGNIZABLE+\nSTORED\nk:weak, rep:0, state:1" style="filled" fillcolor="#f0fff0"]
 SW01 -> SW11 [label="unserialize"]
 SW01 -> retiring1 [label="value stops"]
 SW01 -> SW01 [label="dispatch.retireExport"]

 SS10 -> SW10 [label="dispatch.dropExport"]
 SS10 -> SS00 [label="finalizer"]
 SW10 -> retiring1 [label="finalizer"]
 SW10 -> exporting3 [label="syscall.send(Rep)"]
 exporting3 [shape="box" label="add to slotToVal"]
 exporting3 -> SS10
 SW10 -> SW11 [label="store as value"]

 {rank=same; retiring1 deallocating S000}
 {rank=same; S010 S011 S001}
/* {rank=same; SS11 exporting2}*/
 {rank=same; SS10 SS11 SS01}
 {rank=same; SW10 SW11 SW01}
 {rank=same; SS00 exporting1}

}

Overriding WeakMap/WeakSet and Map/Set

To maintain determinism, we must prevent userspace code from sensing the comings and goings of ephemeral Presences and Representatives. Each time a Presence is GCed and then re-introduced, or a Representative is abandoned and then revived, the JS Object available to userspace code will be different, even though it still represents the same Swingset Object . If this JS Object is used as the key of a WeakMap or WeakSet, then userspace can compare newer versions against older versions without retaining a strong reference (which, since liveslots gets to have WeakRef, could otherwise be used to solve the problem).

So we must withhold the real WeakMap and WeakSet from userspace (with the "inescapable globals" feature of importBundle). We should replace them with forms/wrappers that are keyed by the Swingset Object identity, instead of the JS Object identity.

For WeakMap, I think the algorithm looks like this:

We need something more complicated for Map, because the keys are enumerable in insertion order. We may need to create some sort of proxy object for each key, with a pair of mapping tables between them and the real objects, and then store only the proxy in a real Map.

Both our Map and our WeakMap must survive subclassing to the extent that the originals do.

Our Map/WeakMap should sense when a virtual object is being used as a value, and merely store the vref. Then any calls to .get() (or .values()/.entries() for Map) would need to revive the value (if there isn't already a Representative available).

Stages of Implementation

Modulo the nightmare of upgrade, we can implement this in stages. Each stage must be safe against nondeterminism, but we can progressively enable more garbage collection (and achieve more potential space savings) as we go.

We might be able to defer the dispatch.retireImport message for a while, and not automatically free WeakMap entries keyed by these ephemeral objects. Related to that, our replacement WeakMap could temporarily just store the Presence/Representative in the private Map instead of computing a vref for them, which would happen to keep the Presence/Representative alive until/unless userspace explicitly deletes it. This might let us deploy the replacements before finishing the underlying tracking tables.

One of the last steps is to track the existence/release of weak tables. When a WeakStore is released, all of the keys and values are released too. Initially we don't need to recover space when this happens, because we expect most of them to be very static: created when the vat is first launched, and held forever. To avoid an explosion of complexity, we need to make sure these tables themselves are either REACHABLE or UNKNOWN, never merely RECOGNIZABLE. No virtual virtual tables.

dckc commented 3 years ago

I'm struggling to see how the RECOGNIZABLE state comes about. Is a brief example easy to provide?

Chris-Hibbert commented 3 years ago

The third paragraph in Generalities ends in a dangling clause.

warner commented 3 years ago

The third paragraph in Generalities ends in a dangling clause.

fixed, thanks

warner commented 3 years ago

I'm struggling to see how the RECOGNIZABLE state comes about. Is a brief example easy to provide?

In an importing vat, it'd look like:

const wm = new WeakMap();
export function buildRootObject() {
  return Far('root', {
    rememberBob(bob) { wm.set(bob, 'recognized'); }, // 'bob' Presence is not retained
    isThisBob(query) { return wm.get(query); }, // 'query' might be a new Presence
  };
}

If a caller invokes rememberBob(bob), waits a while, then invokes isThisBob(bob), the vat will be in the RECOGNIZEABLE state during the interval.

If GC happens promptly, the bob Presence will be collected right after rememberBob is called, because there is no way for this vat to produce a reference to Bob any longer (you can't enumerate the keys of a WeakMap, and bob is not stored in any other global or closed-over variable).

However, isThisBob is expected to keep working. In the simple non-swingset single-universe case that developers are used to, then either someone else still has the ability to send Bob in the isThisBob() message (meaning they have a strong reference to the original Bob, which keeps the WeakMap entry alive), or nobody does (so Bob was GCed, which deletes the WeakMap entry because it can never again be retrieved). In our distributed swingset-enabled case, we must distinguish between us having the ability to emit Bob, and somebody else (but not us) having that ability. In this latter case, our vat can still recognize an incoming object as Bob even though we can no longer emit it ourselves (and won't be able to, unless/until someone sends it back in: the "re-import").

Our vat is interested in two space-saving transitions from that point. If it stops being able to recognize Bob (say, the WeakMap was deleted), then we can tell the kernel that we no longer need the c-list mapping. If all importing vats stop being able to recognize Bob, and the kernel doesn't reference Bob anywhere itself, then the exporting vat's c-list entry can be removed. The other transition is if the exporting vat (and the kernel, and all other strongly-importing vats) all stop being able to produce Bob. In that case, our importing vat should delete its WeakMap entry, just as if the non-swingset world dropped the last strong reference to Bob. These two transitions correspond to the importing vat sending syscall.retireImport, and the importing vat being sent dispatch.retireImport.

From the kernel's point of view, an object is in the RECOGNIZABLE state if:

In this state, the kernel is waiting for the exporting vat to re-export it (i.e. isThisBob(bob)), which makes it REACHABLE again for a while. Or, the exporting vat might discover that it can no longer export it again, and it notifies the kernel of this transition with syscall.retireExport), and the kernel state moves to UNKNOWN as it informs all downstream importing vats with dispatch.retireImport.

dtribble commented 3 years ago

First, this is a great write-up!

In the section `"Overriding WeakMap/WeakSet and Map/Set", why do we need anything special for non-weak collections. The strong collections are enumerable, but the weak ones are not, so I would not expect "ee need something more complicated for Map, because the keys are enumerable in insertion order" "

erights commented 3 years ago

It is indeed a great write-up! We need to distinguish between kinds of non-determinism.

The adversarial issue is that we cannot let an adversary cause validators to diverge. Under our current assumptions, we can assume that all validators see exactly the same gc behavior, so sensitivity to differences in gc behavior is not an opportunity for that adversary.

Then there's the non-determinism for the sake of robust programming against accidents. Here, sensitivity to differences in gc behavior is a threat to the predictability and robustness of the non-malicious programmer's code. The threat here is that the same abstraction, executed at different times and in different states of the world, will behave subtly differently depending of the happenstance of gc behavior. For this threat, we need to be extremely insensitive to gc behavior but we do not need to be totally insensitive to gc behavior. There may be some cases where a difference is observable if you really look for them, but which benign code is highly unlikely to be sensitive to.

The reason I bring this up is the issue of representatives --- whether there and any number for a given object, with observably different JS identities. We may not need full non-observability of the consequences of GC for these. I think this affects both ways of doing representatives, but I'm not sure.

warner commented 3 years ago

@dtribble is right, non-weak Map/Set can hold its keys strongly as usual, and if we can make #1968 work and guarantee a single stable Representative for virtual objects, then Map/Set can use the actual JS Object as the key, without any special treatment to make it use the vref / "Swingset Object". There's a RAM savings we could pull off if the value could be regenerated when necessary (if it were a Presence, Data, or virtual object), by storing the whole entry in the DB instead of the Map, but we could just as easily declare that Map is not for high-cardinality datasets and keep everything in RAM.

We must still change WeakMap to make it safe against the GCed Presence case. The laziest thing would be to actually hold Presence or Representative keys strongly (which would let us avoid doing any fancy mapping to vrefs), but that would block all GC: it would prevent us from sensing when the Presence became unreferenced by anything other than the WeakMap. The next-least-lazy thing would be to use the design mentioned earlier (an overlay of a vref-keyed Map and a non-(Presence/Representative)-keyed WeakMap for all the ordinary objects), but hold the values strongly. That would have the RAM usage expected from a regular WeakMap: even if the value were a virtual object, its storage would be kept in RAM until explicitly deleted or the key goes away. This would preclude using a plain WeakMap for high-cardinality Presence-keyed data, but it would be simpler than paging in the values on demand, and less fragile in the face of values that can't be serialized trivially.

The rules would be:

We might still be able to have @agoric/weakstore use some magic to page out the values and save some RAM, but it might require some more detailed rules (maybe a schema, maybe limiting the values to only hold Data so we know it's safe to serialize).

warner commented 3 years ago

Ok so I think the full set of APIs for this will be:

vat to kernel

When a Presence (which hasn't been used as a WeakMap/Set key) is finalized, vat emits both dropImport and retireImport. If the vref is still used as a WeakMap/Set key, the vat only emits dropImport, and we defer retireImport until the WeakMap/Set is finalized (the other way it could be retired is for the entry to be deleted from the WeakMap/Set, but that requires a Presence, which means we'll be back in the first scenario).

When a Remotable is finalized, vat emits retireExport.

When a Representative is finalized and the vref is not used as a WeakMap/Set key (nor as a virtual collection value, once we implement those), the vat emits retireExport.

I'm going to start with three separate syscalls, each of which takes an array of vrefs as its only argument. It will be pretty common for the vat to tell the kernel about a vref in dropImport and then immediately use the same vref in a retireImport (this happens every time an otherwise-unrecognized Presence goes out of scope). Another possibility is a single syscall.gc(dropImportVrefs, retireImportVrefs, retireExportVrefs), but 1: the syntax is awkward, 2: since vats have the ability to make multiple syscalls before they finish, even if we give them the ability to do everything in a single syscall, the kernel should continue to amortize the sweep by deferring any refcount-tracking until the crank is entirely done, so providing a single do-everything syscall doesn't relieve the kernel of that obligation. There are situations that will cause a single vref to appear in just dropImport, just retireImport, or both, which means it'd also be awkward to use a syntax that e.g. maps vrefs to one of three "drop modes".

kernel to vat

In this direction, we have more pressure to merge information together, because the vat cannot wait for the kernel to perform multiple operations before performing a sweep (or whatever). I'm still going to start with multiple dispatch operations, because it's simpler, and I can't yet imagine what the optimal pattern will be. We'll probably be doing more vat-side sweeps than necessary for a while until we figure out something better. The most likely second approach will be dispatch.something(dropExportVrefs, retireExportVrefs, retireImportVrefs). Once we get this all working, I'll look carefully at the things that could provoke kernel GC, what the outcomes of those provocations would be, and then what sets of dispatches are likely to occur as a group.

What tools to use for what

A Virtual Object provides RAM savings when you have high cardinality (lots of them) but only a few are in active use at any given time, so most of the time there are no Representatives. The RAM usage is num(active) times the size of the data needed for any given one.

Our replacement WeakMap will hold the values strongly. So storing a Representative as a value will pin it in RAM, and you won't have any memory.

Our replacement WeakSet (assuming we don't just simulate one with a WeakMap that has 1 as the value) should be suitable for simple high-cardinality rights-amplification patterns where all you care about is whether a given object is in the set or not, e.g. issuer.validatePurse(allegedPurse). The membership check will just be looking up the vref in an offline vatStore table.

More complicated rights-amplification patterns, where you care about a value associated with each virtual-object key, would not be suitable for our basic replacement WeakMap, because we'd be keeping high-cardinality values in RAM. We could conceivably augment our WeakMap to serialize the value and keep it offline, but I think instead we'll build a specialized "WeakStore" that requires its values to be pass-by-copy, and its keys to be virtual (or some sort of self-ish pass-by-reference value), so that we can push all of it offline, and aren't required to keep any O(count) RAM around.

warner commented 2 years ago

This is complete now:

We don't yet collect cycles between virtualized data, but that doesn't threaten determinism.