Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

revoke a Remotable #2070

Open warner opened 3 years ago

warner commented 3 years ago

What is the Problem Being Solved?

We've tossed around the idea that the creator of an object could be allowed to "revoke" it, meaning that (at least) all future messages sent to this object would throw an exception (the "revocation error") rather than triggering any behavior.

The notional benefit is that the revocation error could be pushed all the way out to the clients which hold a Presence for the object. Those vats could then handle message sends to the now-revoked Presence locally, without even talking to the kernel. Their liveslots layer would remember that the Presence had been revoked, and it wouldn't need to make a syscall to process a message send.

The hope would be that this allows the original Remotable to be dropped, and that somehow this would reduce memory use in some vats.

Description of the Design

The kernel object table would be enhanced: the "owner" field (which holds a VatID) would alternately hold the revocation error object (capdata). This would probably subsume the error we create when sending a message to a dead vat: instead of checking whether the owner vat-id is still alive, we'd just replace the owner field with a "vat is dead" error object in all remaining exports of the terminated vat.

We cannot delete the kernel object table entry merely because an object is revoked: these objects still have identity, and one vat might send an otherwise-dead object to some other, who might compare it against something, and this comparison must continue to work as it did when the objects were alive. This may limit the savings we might achieve.

On the Presence side, we'd probably replace the "handler" of the associated HandledPromise with one that always throws.

On the Remotable side, we currently only anticipate revoking virtual objects, because the explicit creation point is also the correct place for us to return the revocation facet. Normal (non-virtual) objects are usually created with a plain JavaScript object literal (usually in the form harden({ props... })), which means there's nothing special to distinguish the creator's access from any other client's access, and we certainly don't want to enable arbitrary clients to revoke an object they didn't create, even if that client is in the same vat as the creator (they hold a Remotable rather than a Presence).

But one direction worth exploring might be to require Remotable(obj) on everything that is allowed to cross the wire. If we did that, then we could maybe pass a second argument in, which could be given the revocation facet (Remotable(obj, f => revocationFacet = f), just like the executor in new Promise(executor)), or maybe we could pass a closely-held handle object in. A less palatable option might be to have Remotable() return something different than its argument: { r, revocationFacet } = Remotable(obj), where r is what you send over the wire, and obj would throw an error if sent over the wire.

Alternately, we could define an explicit Revocable creator function, and declare that normal Remotables (whether created by Remotable() or just plain hardened objects) are irrevocable, and the only way to make a revocable Remotable is with Revocable. In this case, the most straightforward signature would be const [ obj, revoke ] = Revocable({ properties.. }), to make it clear that the object literal containing the properties of the new object is not itself the callable object. Also, it should be clear that in:

const props = {
  foo(x) { stuff },
  bar(y) { stuff },
};
const [obj1, revoke1] = Revocable(props);
const [obj2, revoke2] = Revocable(props);

that having access to either props or obj1 or revoke1 or even obj2 gives you no ability to access revoke2.

Security Considerations

As mentioned above, we must be careful with the revocation authority: only the original creator of the object should get access (of course they can delegate it to whomever they want). Specifically, merely having access to the object must not automatically grant revocation authority.

Open Questions

Is this useful? Would it actually save any space?

Would the #2069 auxilliary data be dropped when the object is revoked? That would probably be surprising, but we might consider the potential data savings to be worth it.

warner commented 3 years ago

@erights and I were able to refine this a bit more:

To delete the c-list entries, we define a "Taboo" object as one which may not be spoken of. When a Remotable is revoked by its creator, eventually all other vats which held Presences for it will learn about the revocation, and that vref will become taboo for them: it will be an error for any outbound message to cite the vref. Such messages will be rejected by the liveslots layer before any syscall is emitted. Eventually, the kernel will forget about the kref as well.

We want to retain the one-to-one relationship between kref and Presence, without relying upon GC or WeakRefs within a vat. In particular, if vat A does bob~.foo(carol), but vat B already knows about carol, and vat C then revokes carol, vat B might receive foo first or it might receive notice of the revocation first. We need to make sure that B's original Presence for carol is === the argument of foo() in either case. If we used a WeakRef in slotToVal to manage this, the behavior would depend upon whether GC had happened or not, which cannot be part of the formal consensus (#1872).

I think we can accomplish this safely by splitting the c-list into its two directions: inbound (kref to vref) and outbound (vref to kref). The exporting vat, which revokes its Remotable, will do a syscall.revoke(vref). To accomodate other messages already in that vat's promise queue which might cite the same object, the kernel will hold off on acting upon the revoke until the end of the crank. At that point, the kernel removes the vref->kref mapping from the outbound c-list of the exporting vat, meaning that vat cannot speak about the object until/unless someone else re-introduces it. "Taboo" is really an "outgoing taboo": you can't speak about it, but other people might still speak about it to you.

Now, the kref might still be referenced by kernel data structures (run-queue, settled promise data) or by other c-lists. The kernel identifies all other vats which have the kref in their c-lists (the "subscribers") and queues a dispatch.notifyRevoked(kref) to them (the actual name is up for discussion). We want to queue this behind any messages sent before the revocation which might mention the kref. When this item reaches the top of the run-queue, the kernel uses the subscribing vat's inbound c-list to map it into notifyRevoked(vref), deletes the outbound (vref->kref) c-list entry, and delivers it into the vat as a new crank. At this point, the vref is taboo: it is a vat-fatal error to emit a syscall which cites the vref.

The subscribing vat's liveslots reacts to notifyRevoked by deleting the valToSlot table entry. Liveslots continues to hold the slotToVal reference, so the Presence remains alive, although vat code may or may not still hold a reference. If it does, and it uses it in an outbound message, liveslots will throw an error when m.serialize(args) cannot find the Presence in valToSlot. This will reject the one message that mentioned the taboo Presence, but won't kill the whole vat (user-level code must not be able to provoke liveslots into perfoming a vat-fatal action). In this state, the subscribing vat can no longer speak about the object until/unless someone else re-introduces it.

Once all notifyRevoked are delivered, now there are no more vats which can speak of the object (the kref will not appear in any outbound c-lists), however the kref will still be present in their inbound c-lists, and it might still be referenced by central kernel data structures: run-queue items, promise resolutions, and auxilliary data. So a vat might be re-introduced to the object in a later delivery. If this happens, the remaining slotToVal entry will map it to the same Presence as before, retaining the one-to-one property we want. The kernel should not re-add the outbound c-list entry.

Over time, these kernel data structures will hopefully drain and those references will go away. If/when the only remaining kref citations are in inbound c-lists, we now know that no one can ever hear of the object again: vats can't emit it, no pending messages reference it, no resolved promises reference it, therefore nobody will ever be able to receive a message that cites it. In this state, the object is fully unreferenceable, so the kernel object table entry can be safely removed, along with the inbound c-list entries. Those inbound entries are removed with another delivery, perhaps dispatch.forget(vref), which causes liveslots to drop the slotToVal table entry (and make the kernel delete the inbound c-list entry). This should remove the last liveslots reference to the Presence, allowing the Presence to be GCed (assuming the vat-level code has forgotten about it). The kernel will call dispatch.forget(vref) in the exporting vat as well, and when the export-side slotToVal is deleted, the original Remotable can be GCed.

If we implement the #1872 non-deterministic GC scheme, then dispatch.forget(vref) should remove the vref from the exports Set, which would be the sole liveslots-side source of a strong reference in the exporting vat.

The refcounting scheme needs to track the inbound and outbound c-lists separately. An outbound entry holds a reference on the kref, but not the inbound.

When a vat is terminated, all of its exports become taboo, just as if the vat had explicitly revoked them first.

Other ideas that came up:

We don't think we can drop auxilliary data when the object is revoked, however the fact that no vat can speak of the object again makes it more likely that the kernel object table entry can be deleted entirely sooner or later, and we can drop the auxilliary data at that point.

Programmers are responsible for revoking objects they create, to avoid accumulating garbage. But we may be able to tolerate some rate of unrevoked objects if it is low enough, so we might not impose this as a hard requirement. We can imagine some sort of testing harness that runs a contract in a loop and counts how many uncollected objects remain after each cycle (an automated version of @FUDCo 's benchmarking tool), and use the results as entry criteria for contracts on the chain, or at least as feedback to the authors (just like we'd use lint checks).

Object revocation introduces a slight DoS vector, but I'm not currently too worried about it. Imagine a situation where vat C sends a record to vat A, and vat A uses some pieces of it, then sends the whole record on to vat B. If C includes some unexpected properties in the record (which A doesn't look for), then C could revoke those properties later, and A spontaneously loses the ability to send that full record to B.

A is already relying upon C to give it the pieces that it needs: if C revokes one of those objects (or just omits them from the record), A cannot get their job done. So the only incremental vulnerability is that A might be able to do some sort of verification on the pieces that it does care about, and think they're safe, but then see their later message fail when it includes the whole record. A can mitigate this by only sending the pieces that it cares about (i.e. unpack C's record immediately, then drop the original).

warner commented 3 years ago

Hrm, one realization I had was that this gives vats a new way to sense whether a reference they hold is a Presence or a local object: they send it a message with a known-Taboo object (perhaps something they've revoked themselves). Local objects will accept it without complaint, but sending it off-vat would trigger an error during serialization. Of course .toString() already returns something like [Presence o-NN], so this is not a new ability, but if we wanted to close that off, we'd need to address both pathways.

To block this for the message-send case, we'd probably need to serialize the argument graph and examine its slots for Taboo presences, which sounds annoyingly expensive.

erights commented 3 years ago

This ability to sense is within our distributed object semantics and is not something to avoid. Doing an eventual send to a local object and doing an eventual send to a remote object are similar enough that for programs that stay away from edge cases programmers get away with treating them as mostly the same. However, when a message crosses a vat boundary it is marshalled. A proxy with proper pass-by-copy behavior is accepted and treated as pass-by-copy. But it can sense the things marshal is doing and infer that it is being marshalled.

The vat boundary is semantically significant. We are not trying to hide it from code that wants to see it.

Chris-Hibbert commented 3 years ago

Which variants of these designs require that objects use the mechanism in order to be shared, and which variants don't? Our platform already has enough learning costs that I think it would be important to choose one where developers didn't have to add these declarations until they have made enough progress to be concerned about performance. Relative to that consideration, I think the cost of adding these declarations and making use of them within Zoe itself isn't a big deal.

warner commented 3 years ago

Let's see, we've got one main distinction: revocable or irrevocable. To be revocable, we need to provide a revocation facet at construction time (at least I haven't been able to think of any other way to represent the revocation authority, since we don't have a lexical way to demonstrate that some particular piece of code is "inside" the object). Which means we need a function that returns both the remotely-sharable object and the revocation facet, something like the example above (for non-virtual objects):

const props = {
  foo(x) { stuff },
  bar(y) { stuff },
};
const [obj1, revoke1] = Revocable(props);
const [obj2, revoke2] = Revocable(props);

For virtual objects, we can make all of them revocable, since we need a construction function anyways (the return value from makeKind()); we just change its signature to return [obj1, revoke1] or { obj, revoke } or something.

For irrevocable objects, we can continue to use object literals.

But it kind of raises the question: when should we not use revocation? @erights pointed out that we don't have to require everything to be tightly revoked; we might be able to accomodate some overhead (uncollected objects) as long as it isn't too much. But I'm inclined to think that marking objects as revocable, and actually revoking them when they're no longer useful, should be part of the best-practices.

warner commented 3 years ago

Dean had a counter-proposal, which I think is roughly: stop making the guarantee that Presences are unique, at least not once their upstream Remotable has been revoked. I'm still trying to figure out the details.

warner commented 3 years ago

Ok, so @dtribble 's counter-proposal is:

Then, if/when the Remotable is revoked:

This requires us to define a pass-by-copy "dead object record", which contains merely the error capdata with which any messages should be rejected. The idea is that any message that references the now-revoked Presence will be delivered successfully (they are not "taboo"), however the Presence arrives as a dead object instead of a new Presence.

I'm not yet sure how to represent these. I really don't want the kernel to parse or modify the capdata.body string: at present the kernel merely translates the capdata.slots krefs into vrefs and vice versa. For this proposal, the kref-to-vref translation might need to emit a dead object record instead of a vref, which means changing our vref type into something which can contain composite objects, which I'm also loathe to do, however I don't yet see a way around it. The .body would remain unchanged, but .slots[index] = { error: { body, slots } } instead of .slots[index] = 'o-12'.

One additional feature Dean suggested was that many objects have contagious failures, and the order of successful operations is important (think of a writable file object: if the third write() message fails, you don't want to apply the fourth and fifth, because then you'd just wind up with a corrupted file). So you might want to mark a method as being critical: if it throws (or rejects), then the object should automatically be marked as revoked, preventing any subsequent messages from being delivered. We might do this with a special child property on the method Function object, or with a list of strings provided as an option when the Remotable is built.

We're also thinking that a short-term priority should be to change the way Remotables are declared:

A useful experiment to run would be to instrument our tree to sense when/where we depend upon Presence identity: patch Map/WeakMap to notice when a Presence is used as a key. Ideally we'd also sense when someone does === on a Presence. JS doesn't make that possible, but we could consider appying a source-to-source transform of all x === y expressions to add an additional isPresence(x); isPresence(y); check or something.

FUDCo commented 8 months ago

From Slack: @warner: On Friday Mark and I were working on trying to get his exo revocation logic to work with virtual objects and bumped into an issue with how to record the revocation status in persistent storage. I’ve thought of a number of different solutions, all of which are somewhat annoyingly bad, but each one is bad in its own way, so we wanted to bounce the question off you to get your take on it.

The short summary of the problem is that we need to record boolean flags; one flag in the case of non-faceted objects and an array of flags in the case of a faceted object. This felt very analogous to what we do with the export status, except that it pertains to the state of the object itself rather than to its relationship with the rest of the world. Currently the state of the object is kept in the vatstore under the key vom.${baseRef}; what’s stored there is a JSON-encoded object with one property per state variable where the value is represented by a string containing the serialized, marshaled state of the corresponding state variable. Options for storing revocation status (note, also, that we don’t need to actually store that status until at least one facet is revoked):

Option 1: store under a new key in the vatstore, say vom.revoke.${baseRef} that’s, say, a string containing, say, an array of, say ‘l’ or ‘r’ (for live or revoked; the actual representation there is not important to this question). Pro: very straightforward, Con: an extra vatstore read each time we load the state of the VO.

Option 2: change the spec of the vom.${baseref} value to be something like { revocation: STR, state: { AS BEFORE ONLY NESTED ONE LAYER DEEPER }}. Pro: only need one read. Con: backwards compatibility issues.

Option 3: store the revocation status in the existing state object under a pseudo-property name, and censor this from the visible state that’s presented to user code after reading. Pro: still only one read, and finesses the backwards compatibility issue. Con: embedding metadata in the data is icky and fraught with peril, though possibly we could use a property name containing a character that would not be allowed in an actual state variable, to preclude the possibility of collision, or alternatively do some kind of Hilbert hotel trick.

Option 4: give up on the idea of this revocation thing for virtual exos. Pro: super easy. Con: it would make Mark sad.

Personally, I’d go with option 3, but we’re curious as to your take on the matter.

warner commented 8 months ago

(also from slack, in a thread that's talking about adding revocability into virtual/durable objects, rather than ephemeral Remotables, which it what this ticket is about, but it's sufficiently related to warrant capturing some ideas)

Yeah, I don't like the extra storage/key costs of 1, or the extra RAM/storage costs of 2, or the compatibility/security problems of 3. My inclination is kind of like option 2, but combining the format change with the changes needed to handle virtual-object schema migration. Take a look at https://github.com/Agoric/agoric-sdk/issues/7407#issuecomment-1664852489 and the discussion around it. We'd start the vom.${baseref} value to start with [ or a digit or something other than a { curly, and use the extra space thus carved out to both record revocation flags and current schema version.