Strawman ideas for vat secondary storage

Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform

Apache License 2.0

326 stars 206 forks source link

Strawman ideas for vat secondary storage #1846

Closed FUDCo closed 3 years ago

FUDCo commented 3 years ago

Thoughts on externalizable objects and secondary storage for vats

`LiveSlotsStore`

LiveSlotsStore is the low level store, used by liveslots but not exposed directly to vat code. In principle it could be implemented on top of the same store that the kernel db uses now, merely adding some conventions about prefixing keys (with the vat ID and some constant) to avoid collisions. However, it might also be helpful to put each vat's data in a separate store to allow vats to be moved around, e.g., if a worker is run on a different machine from the kernel.

interface LiveSlotsStore {
  get(key: EOSlot) => EOCapData;
  set(key: EOSlot, value: EOCapData) => void;
  delete(key: EOSlot) => void;
}

`EOSlot`

EOSlot is just a pattern for identifying an externalizable object, in the same manner as we do for kernel or vat objects/promises.

type EOSlot = 'eo$NN'; // yes, I know this isn't a real TypeScript type

`EOCapData`

EOCapData represents an externalizable object in its serialized form. It's like CapData, but adds a type field so that data can be reunited with its behavior when deserialized from the database.

interface EOCapData {
  body: string;    // JSON encoded object
  slots: string[]; // vo$NN, vp$NN, eo$NN
  type: string;    // et$NN
}

`ExternalizableObject`

ExternalizableObject is an object that can be stored in the per-vat store. It consists of two parts: a hunk of mutable data (which must be serializeable using our marshaling framework) and a deeply immutable collection of methods that realize its behavior. The latter may be shared by multiple externalizable objects, making it a kind of type or class. We require the code to be immutable both so that it can be shared in this way and so that its meaning cannot change over time (e.g., it can't close over any mutable state, which could allow it to sense if it's been reloaded from secondary storage). The idea here is that the methods are the only things (aside from the runtime) that have direct access to the state. [Q: though there would seem to be a signifcant role for pure data objects with no methods, which would require them to have public state if they're to be useful; perhaps this could be realized by some conventional type marker, like null or undefined?]

What I'd like to do is something like this:

function makeExternalizableObject(methods: object) {
  let state: Data;

  function init(data: Data) { state = data; }
  function get() { return state; }

  return harden([init, get, { ...methods }]);
}

This won't work, of course, since the passed-in methods carry their lexical scope with them. I don't believe there's any graceful way to inject a bundle of methods into a different lexical scope. Possibly there's some kind of games we can play with Function.bind or prototypal inheritence (e.g., set the prototype of the state object to be the methods object) which would be a little weird for us but might work -- this is a question to call in the real JS gurus on.

We also need some way to validate that the implementation doesn't close over any other variables, which requires some kind of parsing and analysis, but I'm assuming we have tools in our toolbox for that which could hardly be more involved that what we do for metering. He said.

The type Data denotes arbitrary serializable data. [Q: Do we have a name for that abstraction? It's not CapData, which is already serialized... Also, can it be expressed as a TypeScript type?]

It would be nice to have a clean way to detect if any values reachable from state have been modified, so we can know if the state is dirty when we're deciding about writing it to disk. Proxy?

`VatStore`

VatStore is the high level store, the thing that's made available to vat code. It's implemented on top of LiveSlotsStore but hides the mechanics of serialization and deserialization. The vat is given a single VatStore as one of its vat powers; this store can in turn be used to create object containers for storing externalizable objects in.

interface VatStore {
  makeExternalizableObject: (methods: object) => ExternalizableObject;
  makeContainer: () => VatStoreContainer;
}

VatStoreContainer allows objects to actually be stored and retrieved.

interface VatStoreContainer {
  enroll: (obj: ExternalizableObject) => EOPresence;
  makeIndex: (mapper: KeyMapper, comparator: KeyComparator) => EOIndex;
}

The idea here is that once an object has been associated with a container (via the enroll method), it can be automatically saved and fetched by a virtual-memory like mechanism to be described shortly.

The VatStore maintains table indexing external object type identifier strings (et$NN) to the method objects passed to makeExternalizableObject. Each unique method object gets its own type ID, but objects with the same method object get the same type ID. This is used to reassociate an object's behavior with its state when the latter is read from the database and instantiated in memory.

`EOPresence`

EOPresence enables you to interact with the object.

An EOPresence encapsulates an EOSlot (an eo$NN slot string) and possibly a pointer to an instantiated ExternalizableObject in memory. We define an operator that I'm calling O here (we can bikeshed this later), analagous to our existing E and D operators. You invoke a method on an externalizable object with something like:

O(eopresence).method(args...)

which looks at what's encapsulated: if there's an instantiated ExternalizableObject in memory, it synchronously invokes the method on the object and returns the result it produced or throws any exception that it throws. If the object is not in memory, it uses the encapsulate EOSlot to fetches the correspnoding EOCapData from the LIveSlotsStore and uses this to instantiate an ExternalizableObject, which it memoizes in the EOPresence, and then sends invokes the method. It is up to the liveslots layer to manage a cache of instantiated objects, and to automatically commit any modified objects to the container at the end of each crank.

`EOIndex`

EOIndex is an indexed view onto the contents of a VatStoreContainer, generated by call the container's makeIndex method. The caller provides two functions that define the operation of the index, a key mapper and a key comparator:

type KeyMapper = (obj: EOPresence) => Key; type Key = string;

A KeyMapper provides an application-domain translation between the object's internal state and a key to be used for locating it in the index being created. The key mapper will be called whenever an object is inserted into the index and whenever a modified version of that object is written to the container. The produced key values are strings, but no further semantics is imposed by the indexing mechanism itself.

type KeyComparator = (k1: Key, k2: Key) => number;

A KeyComparator is used by the indexing mechanism to determine the ordering of keys according to application-domain-specific logic. It returns the usual less than zero, zero, greater than zero indicator used by comparator functions the world over.

interface EOIndex {
  index: (obj: EOPresence) => void;
  lookup(query: Query) => EOPresence[];
}

The index method instructs the index to keep an index for the indicated object.

The lookup method performs a query and returns an array of the objects that match.

A strawman query scheme:

type Query = AndQuery | OrQuery | NotQuery | ValueQuery;

type AndQuery = ['and', Query[]];
type OrQuery = ['or', Query[]];
type NotQuery = ['not', Query];

type ValueQuery = [Comparison, Key, Key];
type Comparison = 'le' | 'lt' | 'eq' | 'ne' | 'ge' | 'gt';

FUDCo commented 3 years ago

@warner

warner commented 3 years ago

@FUDCo and I just got off a long phonecall to discuss this, unfortunately before I had a chance to read the text above, so I received a mix of those ideas and ones that we came up with during the call. I'm going to write down my thoughts first, before I forget them, and then go back and read that text and see what I got wrong.

The form of his approach that we settled on uses a mutable state object rather than the explicit get() and update() methods from @dtribble 's approach (in https://github.com/Agoric/agoric-sdk/issues/455#issuecomment-665973949). This state object is actually a Proxy so we can sense when it is being used and page in the state from disk on-demand. We keep enough intermediate objects around so that we can reliably write the state back to disk before liveslots forgets about it, even though userspace code might hang on to some other object beyond that point.

The syscalls would be the same as described in #1831, although there was some discussion about whether these should be non-syscall functions that maintain the same atomicity/transaction boundaries as the rest of the vat's state. Using syscalls is an easy way to start out, for sure: everything goes through the crank buffer so it automatically gets committed/reverted in the same way as all other vat state. But as we move vats out into child processes, syscalls that must traverse the pipe back to the kernel might feel too expensive, and we might want to route them to a local DB (encouraged by the fact that this is per-vat secondary storage, and never needs to be accessed by more than one worker process at a time; whichever worker is hosting the vat at that moment). The requirement, though, is that one crank's changes to this local DB are committed atomically along with changes to the kernelDB, which is more difficult if they are in two separate databases. There are tricks to accomplish that (recording a generation number along with each change in the vat DB, committing the generation number into the kernelDB when the crank is committed, then on reload you discard any vatDB changes whose generation number is higher than what you find in the kernelDB), but I don't really want to engineer that stuff yet.

In this system, liveslots would maintain a fixed-size list of (strong references to) LiveState objects, indexed by the virtual object ID. This will stick around until the creation of a new one knocks one off the list. There is strictly at most one LiveState object for any given virtual object ID.

On top of that, we have some number of ephemeral state objects (I don't know what to call them, let's use EphState for now), which hold a reference to the code that manages this list, as well as the virtual object ID. There may be an arbitrary number of EphState objects for a given virtual object ID: we can tolerate mutliple aliases. Then on top of that we have the same number of ephemeral behavior objects, which have methods as defined by the user's constructor function. We can tolerate multiple behavior objects per ID too. The user's constructor function creates a singleton object (one per Container type) with methods like getBalance(state) and deposit(state, balance, otherPurse). This one object will be invoked with a different state argument for each user-facing Representative. The state argument will be one of the EphState objects.

Liveslots creates a new pair of (ephemeral behavior object, ephemeral state object) each time deserialization encounters a virtual object ID. We (liveslots) have no way to tell when these go away, but we must maintain a WeakMap from the behavior object to the object ID so they can be serialized properly.

The ephemeral behavior object has methods like getBalance() and deposit(balance, otherPurse), and closes over an EphState object. When invoked, those methods delegate to the singleton user-constructor-defined object, inserting EphState as the state argument, followed by the rest of the user-provided arguments (like otherPurse).

The user-constructor-defined object uses return state.balance and state.balance += amount to modify the virtual object's state. EphState is a Proxy, and when it senses gets and sets, the first thing it does is to call into the code that manages the table of live State objects and asks it to return-or-create a LiveState object for the given virtual object ID. Then it calls livestate.get(name) or livestate.set(name, value). The LiveState object might read all properties from disk (with syscalls) immediately upon creation, or it might lazy-load single properties when get() is called. It might write properties to disk immediately when set() is called, or it might wait until the LiveState object is about to be evicted from the table (to make room for a new one). Our choice will be guided by performance measurements and usage patterns (high churn within a single crank would benefit from writing late, large rarely-used values would benefit from reading late).

The RAM usage is driven by the number of behavior/EphState objects held by user code (hopefully for short-lived operations), and the size of the LiveState table. The number of syscalls is driven by the size of the table (therefore the lifetime of a LiveState object) and the churn on its data.

In this model, user code (the constructor function) is written to use a simple state object. a getBalance for @dtribble's example from #455 would probably look like:

const purseConstructor =  {
  getBalance(state) {
    return state.balance;
  },
  burn(state) {
    state.balance = 0;
  },
};

const c = liveslots.createContainer(purseConstructor);

function mint(initialBalance) {
  return c.create(initialBalance);
}

(I haven't thought about how deposit() should correctly/safely get authority to mutate otherPurse, which might sink this approach, nor have I figured out the initial state)

The benefit of using state.balance = is familiarity: developers can pretend that state is a real and simple object. They can even write functions which close over it and work correctly after the original method has returned. The drawback is concealed magic: state cannot be serialized correctly (it isn't hardened, and as a Proxy whose target is unfrozen it probably cannot be hardened, and has no methods, so it cannot be pass-by-reference; pass-by-copy would become stale quickly). Plus the rights-amplification/synthesis question.

The LiveState get/set methods must perform the same kind of serialization that liveslots does for message sends. Liveslots must recognize all behavior objects and serialize them into the right virtual object ID.

If user code hangs on to the behavior object for a while, that will keep the EphState object alive, but the LiveState object might fall out of the table. We rely upon theEphState's accessor to sense when it gets used again and page the data back into memory. There are other approaches which might be cheaper but probably aren't reliable:

wrap the behavior object and refresh the LiveState object when each method is invoked: fails if the behavior method returns a new object with methods that close over state, would require a Membrane to be safe. The membrane would have to watch out for state being returned directly.
maybe flush the state at the end of crank, but 1: vats (including liveslots) don't get to learn when that happens, and 2: I think aliases could cause corruption

We might also use the LiveState table to remember a "usual" EphState/behavior object for each currently-live virtual object ID, which would increase the chances that user code would get the same behavior object in multiple calls, but I think that might be a bad idea, and it might be best (for discoverability/education-through-failure) to try as hard as possible to do the opposite. If Object.is(oldBehaviorObject, newBehaviorObject) "usually" works, you'll be surprised when it fails. If it usually fails, you'll probably avoid it.

I think we have a range of potential approaches (this state approach, Dean's #455 approaches, probably some in between), and they'll vary along axes like:

how far do they deviate from @erights 's WeakMap-based ideal (readability)
how many hoops must developers go through to use (return state.balance seems easier than return c.get(me, 'balance'))
how many safe-looking operations would in fact cause problems? Using the behavior object (the Representative) as a key in a table is one, since liveslots doesn't (generally) provide the same object twice. In some approaches, adding properties to an ephemeral object might be possible, but highly buggy.
performance in the face of churn: do we amortize DB writes, or defer DB reads until absolutely necessary?
do the extra layers of Proxies cause performance or complexity/reliability problems?

erights commented 3 years ago

Jessie doesn't have mutable properties. In our style we express all mutation be either

assigning to lexical variables
mutating builtin mutable data types like maps.

With only fixed properties, there's no need for proxies. Whatever you need to do to mark dirtyness, it wouldn't be by mutating an unknown property, and thus you'd need that same logic whether there's a proxy in the way or not.

For ERTP, all mutation is only in the shared ledger stores --- shared among instances which jointly encapsulate it. For Zoe, most mutations are in shared stores. Some are in lexical variables, and these would need to be transformed to track dirtyness. (See @michaelfig 's https://github.com/Agoric/agoric-sdk/pull/1843/files/989030259c768410e9bb662058986e799fb4ec72..d0dc6ee6fc022673decb7b64095de458834d3378 ). None are even in accessor properties that give the appearance of assignable properties. We don't do that. That's a good thing. API surface is about requests. State representation is internal to the abstraction.

FUDCo commented 3 years ago

I agree that state representation is internal to the abstraction, but that internal state is the state we need to store. I don't see how Jessie can be made to work in this context without modifying the JS-engine or doing some fairly complex form of code transformation, neither of which seem plausible to me in the near or moderately near term. Mutable state has to live somewhere, and while my preferred answer would also be lexical variables, I'm not aware of anything in stock JS that would enable an automatable mechanism to capture their values for transfer to secondary storage.

FUDCo commented 3 years ago

@erights @michaelfig I was recapping our discussion this morning to Brian and he harkened back to the July discussion with Dean that's summarized in https://github.com/Agoric/agoric-sdk/issues/455#issuecomment-665973949 linked above, which, in particular yielded the following snippet which I'll just copy here for convenience:

const purse = (me, c) => ({
    deposit(other) {
      const myBalance = c.get(me);
      const otherBalance = c.get(other);
      c.update(me, myBalance + otherBalance);
      c.update(other, 0);
    }
});

in this case, c is the "container" (or whatever we ultimately end up labelling it) and me and other are references to purses. Aside from the ergonomic concern that this proposed API requires one to manually do the fetches and assignments of state via the container (which Michael's code generation pattern tries to make transparent), there's a rights amplification thing going on here: One purse is reaching into the other, which is allowed because we hold the c capability. I don't understand how the closed-over-variables approach can realize this rights amplification pattern, since no matter which purse you are the other purse's state is not in scope for you. So now I'm confused.

michaelfig commented 3 years ago

That's not a problem:

let purseToBalance;
const { make: makePurse, makeWeakStore: makePurseWeakStore } = makeCollection(() => {
  const me = {
    deposit(other) {
      const myBalance = purseToBalance.get(me);
      const otherBalance = purseToBalance.get(other);
      purseToBalance.set(me, myBalance + otherBalance);
      purseToBalance.set(other, 0);
    },
  };
  purseToBalance.init(me, 0);
  return me;
}));
purseToBalance = makePurseWeakStore();

The key insight from the discussion with @erights is that the "collection" is not the same as the "weakStore" that uses objects from the collection as keys. You have to make such stores explicitly in this model.

michaelfig commented 3 years ago

I haven't update my example code to implement that, but I'm working on @agoric/collection.

FWIW, here is the proposed rewrite of the above code.

let purseToBalance;
const { make: makePurse, makeWeakStore: makePurseWeakStore } = makeVatCollection(() => ({}), $hinit => $hdata => {
  const me = {
    deposit(other) {
      const myBalance = purseToBalance.get(me);
      const otherBalance = purseToBalance.get(other);
      purseToBalance.set(me, myBalance + otherBalance);
      purseToBalance.set(other, 0);
    },
  };
  $hinit && purseToBalance.init(me, 0);
  return me;
}));
purseToBalance = makePurseWeakStore();

FUDCo commented 3 years ago

In this example the balance is not kept as part of the purse's closed over state, it's stored external to the purse in the weakStore. This solves the rights amplification problem at the cost of separating an object's state into two parts: one part that is private to the object (which the above example has none of and thus doesn't benefit from the code transformation) and a second part that is potentially shareable amongst others of its ilk that needs some kind of additional, separate storage and get/store mechanism that I don't think we've discussed. I can't tell if I'm still confused or if I just don't care for this design.

michaelfig commented 3 years ago

Internal state is only one reason for the code transform. The other reason is to separate initialization from reconstitution, which is a critical separation to make the system work.

We already have the separation of private vs. public data in our source code. Developers are familiar with it.

erights commented 3 years ago

In this example the balance is not kept as part of the purse's closed over state, it's stored external to the purse in the weakStore.

The current Jessie implementation of the purse ledger system does exactly the same thing for the same reason. One purse needs to be able to amplify another purse into access to the other purse's balance. It's what we've come to call "class private per-instance state". Closure-captured variables and Smalltalk instance variables are both "instance private per-instance state" because only the instance can see its own state. Java private instance fields and JavaScript private fields are "class private per-instance state" because it is private to code in the class, but one purse can see into another's class-private state. The purse ledger system uses weakStore to get the same effect for object-as-closure objects, where the weakStore as a whole is held in a shared closure-captured variable.

This solves the rights amplification problem at the cost of separating an object's state into two parts: one part that is private to the object (which the above example has none of and thus doesn't benefit from the code transformation) and a second part that is potentially shareable amongst others of its ilk that needs some kind of additional, separate storage and get/store mechanism that I don't think we've discussed.

As @michaelfig said, we did effectively discuss it in making the distinction between the "collection" as (large)hidden backing store vs (large)weakStore as explicit ocap-based collection abstraction exposed to the application programmer. Both of these are encapsulated by the same objects. But they're encapsulated at different levels of abstraction.

I can't tell if I'm still confused or if I just don't care for this design.

"don't care for" sounds like a judgement. Even with your manual code, how do you solve it? You can't use c for both purposes. You need a distinction.

warner commented 3 years ago

As @FUDCo and I were noodling about this today, I remembered that @dtribble 's example is incomplete: it uses a simplified Purse model that we stopped using maybe a year ago (it lacks Payment objects). What we really need is two tables, and a method like purse.deposit() needs the authority to manipulate both the Purse's balance and also the Payment's balance.

So we need a way to reify this "allowed to manipulate state" authority into something that can be closed over by both the Purse methods and the Payment methods. In the WeakMap expression, this is exactly the WeakMap object, from which the deposit method can convert a single Payment marker object into the state bundle ready to be decremented. I think this is a particular application of the Sealer/Unsealer pattern.

I think @michaelfig 's example uses the return value from makeVatCollection().makeWeakStore() for this purpose, but I'm not sure how much of that code is something the author would write, vs something generated by a transform function.

The approach that I sketched out with Chip might be equivalent to Michael's, but I'll write it here just so I don't forget:

function makeMint() {
  const { open: openPurse, create: createPurse } = liveslots.createContainer();
  const { open: openPayment, create: createPayment } = liveslots.createContainer();
  function mint(initialBalance) {
    const purseBehavior = {
      deposit(me, other) {
        openPurse(me).balance += openPayment(other).balance;
        openPayment(other).balance = 0;
      },
    };
    return createPurse(purseBehavior);
  }
  return mint;
}

(this is incomplete, of course).

The idea was that the open function is what lets the behavior code amplify access to the public facet of a Purse or Payment into access to the private (mutable) state. The liveslots code would implement these open functions by making syscalls to retrieve the state from secondary storage. The actual invocation might be openPayment(other).balance = 0 or openPayment(other).set('balance', 0) or openPayment.set(other, 0) or whatever, some forms requiring a Proxy and others being more explicit+verbose.

In @dtribble 's original writeup, the behavior was defined early, and passed into the createContainer() call, necessitating the "amplifier" (i.e. c, which is acting as an Unsealer) to be delivered as an argument to the behavior function. This makes it awkward to get multiple Unsealers, which is necessary to build the Purse+Payment version.

In mine, open is the Unsealer, and you get it back from createContainer, which means you can't easily create the behavior before that point, which means you have to pass the behavior object back in, during a second step. I do that in the create() (instantiate) call, and my thought was that it should retain an internal map which assigns an integer to each new behavior object, which now serves as the type indicator. The common case would be to only ever have one type, and we need the map to make sure we aren't retaining a zillion behavior objects (which would defeat the purpose of pushing the state out to disk). We could enforce one-type-per-container by instead having some sort of register() function that must be called after createContainer() but before anything can create a new instance.

In @michaelfig 's code, I think the return value is the Unsealer, but it uses a mutable let binding to make it available to behavior code that was defined earlier and passed into createContainer. That's a tiny bit contorted, but I think I prefer it because it enforces the one-type-per-container property.

erights commented 3 years ago

I think I prefer it because it enforces the one-type-per-container property.

Good!

It also makes the distinction between "collection" backing storage for implementing a huge number of instances vs the weakStore for giving them access to each other's class-private state, and only that state.

michaelfig commented 3 years ago

I'm not sure how much of that code is something the author would write, vs something generated by a transform function.

My "not a problem" example is intended to be all hand-written. The "FWIW" example is what the hand-written example would be automatically rewritten into (preserving line numbers), by a transform in SwingSet automatically detecting the use of import { makeCollection } from '@agoric/collection'; and replacing it with a makeVatCollection endowment present in the vat's globals.

Names will be bikeshodden.

erights commented 3 years ago

Here, most of the code is exactly the same. This will often be the case, but will often not be the case. It is the case here because none of the per-instance state was in captured lexical variables. If it were, the rewrite would still be pleasant, readably related to the original, and line number preserving. But variable accesses would turn into similar looking property access.

FUDCo commented 3 years ago

Brian's example is illuminating because it makes clear that access comes not from being of a type but from having the capability for access -- in this case an object of type A is reaching "into" an object of type B. In particular, it goes beyond the distinction that MarkM usefully labelled above as "class private per-instance state" vs. "instance private per-instance state", into something else more general where you can see the need for the capability to be explicitly wielded (actually this seems like a nice reification of another of MarkM's long standing notions, the systems-of-status vs. systems-of-property idea).

warner commented 3 years ago

Notes from today's meeting, where we walked through @michaelfig 's example in #1857 :

our current proposal is to put three functions into each vat's global environment:
- 1: something like makeExternalStore, aka makeContainer, which accepts a description of the behavior of a Representative, and returns a makeInstance function to create new virtual objects (returning a Representative for it)
- makeInstance is the only way to create a new virtual object (allocating a new hierarchical identifier). However liveslots will build a new Representative (which knows that identifier internally) any time the ID is cited in the target or arguments of an inbound message, or in the unmarshalled contents of a weak-store get operation
- 2: a predicate that returns true if the two arguments are both Representatives which represent the same virtual object, else false
- 3: something like makeWeakStore which returns an object that behaves a lot like a WeakMap (non-enumerable, supports rights-amplification synergy patterns), but 1: the keys must be Representatives (of any type/behavior), 2: keys are equivalence classes defined by the predicate above, 3: the values can be any marshallable data (including presences and other representatives), 4: everything is stored on disk, so having lots of keys doesn't consume a lot of RAM
- so any two Representatives for the same virtual object will provide access to the same data
- the get and set methods on this store are the synchronization point for multiple clients of the (store + virtual object) pair: the value returned by store.get(representative) may change at any time when other code (with access to the same store and an equivalent Representative) gets a chance to run. Code that wants to do a read-modify-write must do so without allowing other code to get control.
- if get and set are the only way to store per-virtual-object state, we do not need Proxies or code which identifies which property names need to be provided with accessors
- note that accepting any kind of Representative is a change from the #1857 API, which returned makeWeakStore as an API of the makeExternalStore (next to makeInstance), implying it was specialized for that one store (and Representative type). Making it a global and non-type-specific means you can create purseLedger and paymentLedger at the top of the file, just like a regular WeakMap, which makes it easier to write the Purse behavior functions before the invocation of makeExternalStore, rather than needing to nest it afterwards, which should clean things up a bit.
- this table is "weak" in the sense that is not enumerable (so more like WeakMap than Map). However the table entry is specifically not dropped merely when all local Representatives are dropped: it can be kept alive by other vats holding a Presence to the virtual object, despite the local vat's RAM having forgotten about it. In this dormant state, the only local reference to the virtual object lies in the secondary storage. Note that we do not currently have a decref operation to let us sense when those other vats have dropped their references, so this table cannot shrink until we add that GC facility.
we deliver these to vat code as globals because:
- we don't consider them to be powerful: secondary storage is like RAM, available to everybody
- it would be too painful to thread them through from vatPowers down into whatever ERTP library needs them
liveslots and the vatManager must collude in nontrivial ways to get them delivered as globals: liveslots must know how to marshal Representatives (and marshal the data stored in the makeWeakStore tables), therefore liveslots must be involved in the creation of these three functions. But only the vatManager is in a position to provide globals to all vat code, and the normal construction sequence is: build globals, evaluate vat code, extract buildRootObject from the exports, call makeLiveslots(). There is a cycle here, and we have to find some clever way around it.

I'd like to see how it feels to retain instance state using only the makeWeakStore table, I think it'd be simpler to have just one mechanism (instead of also having the non-rights-amplifying $hdata approach).

Once we define these three functions, the next step is to rewrite @michaelfig 's ERTP example with it. We'd also discussed how Remoteable and the interface definition should be applied/glue to the resulting Representatives: we'd guessing that makeExternalStore should accept an interface name at the same time it accepts the behavior, so that every Representive it creates is automatically registered to pass-by-presence with the given Interface.

We're thinking that @agoric/store, when run in a vat environment, should automatically provide the Representative style (although using it outside a vat environment should continue to work, using the normal Map object, because @agoric/store is used outside of vat code). It would be error-prone to provide both a Representative-aware and a non-aware store facility in the same program.

We're not overly concerned about people mistakingly taking a Representative and using it as the key in a WeakMap (which doesn't know about the equivalence classes, nor keeps its data on disk), when they ought to have used a WeakStore (whatever we end up naming it). They could make the same mistake with pass-by-copy records. Our plan is to aggressively use different Representative objects (of the same equivalence class) to help surface these errors early, although @dtribble has concerns about the performance consequences. So we may need to experiment with the API we provide to see if these sorts of mistakes are easy/hard to make, whether we might catch them some other way (linting?), and whether being defensive is worth the potential performance cost (mostly memory churn).

All of this is predicated on the scheme described up in the earlier comment, in which we keep a finite number of "data holder objects", they're dirtied by WeakStore .set() calls, and we flush them to disk just before evicting them from the list. If we aren't using the $hdata -style Proxy, then maybe this dirty/writeback scheme won't be so important, and we just write the data to disk as soon as someone calls .set. This might be a simplicity-vs-performance tradeoff, but I think it will depend upon how many small+frequent writes we make. For example, a contract which uses a bunch of short-lived Purses but retires all of them within a single crank might benefit from batching the writes somehow.

warner commented 3 years ago

Here's one approach to the Issuer I wrote based upon the notes above, using weak-stores exclusively for the state management, rather than an additional non-rights-amplifying instance store object. I think we could make this code work with a secondary-storage -based makeExternalStore and makeWeakStore. This code does not need translation, although conceivably we could write something more ergonomic and translate it into this:

/* global makeExternalStore, sameRepresentative, makeWeakStore */
// these three come from (and share private tables with) liveslots

import { Remotable } from '@agoric/marshal';
import { makeInterface, ERTPKind } from './interfaces';

export function makeIssuerKit(allegedName) {
  const brand = Remotable(makeInterface(allegedName, ERTPKind.BRAND));

  const purseBalanceLedger = makeWeakStore(); // value: balance
  const purseDepositLedger = makeWeakStore(); // value: deposit
  const paymentLedger = makeWeakStore(); // value: balance
  const depositLedger = makeWeakStore(); // value: Purse

  const purseInterface = makeInterface(allegedName, ERTPKind.PURSE);
  const paymentInterface = makeInterface(allegedName, ERTPKind.PAYMENT);
  const depositInterface = makeInterface(allegedName, ERTPKind.DEPOSIT_FACET);

  function buildPurse() {
    const purse = harden({
      deposit(srcPayment) {
        const purseBalance = purse.getCurrentAmount();
        const srcPaymentBalance = paymentLedger.get(srcPayment);
        const newPurseBalance = purseBalance + srcPaymentBalance;
        paymentLedger.delete(srcPayment);
        purseBalanceLedger.set(purse, newPurseBalance);
        return srcPaymentBalance;
      },
      withdraw(amount) {
        const purseBalance = purse.getCurrentAmount();
        const newPurseBalance = purseBalance - amount;
        const payment = makePayment();
        purseBalanceLedger.set(purse, newPurseBalance);
        paymentLedger.init(payment, amount);
        return payment;
      },
      getCurrentAmount() {
        return purseBalanceLedger.get(purse);
      },
      getAllegedBrand() {
        return brand;
      },
      getDepositFacet() {
        return purseDepositLedger.get(purse);
      },
    });
    // note: we cannot use 'purse' in a weak store yet, because it is not yet
    // associated with a virtual object
    return purse;
  }

  function initializePurse(purse, initialBalance = 0) {
    purseBalanceLedger.init(purse, initialBalance);
    const depositFacet = makeDeposit(purse);
    purseDepositLedger.init(depositFacet);
  }
  const makePurse = makeExternalStore('purse', purseInterface, buildPurse, initializePurse);

  function buildPayment() { // simpler because it doesn't reference self
    return harden({
      getAllegedBrand: () => brand,
    });
  }

  function initializePayment(payment, amount) {
    paymentLedger.init(payment, amount);
  }
  const makePayment = makeExternalStore('payment', paymentInterface, buildPayment, initializePayment);

  function buildDeposit() {
    const deposit = harden({
      receive(payment) {
        const purse = depositLedger.get(deposit);
        return purse.deposit(payment);
      },
    });
    return deposit;
  }

  function initializeDeposit(deposit, purse) {
    depositLedger.init(deposit, purse);
  }
  const makeDeposit = makeExternalStore('deposit', depositInterface, buildDeposit, initializeDeposit);

  return harden({ makePurse });
}
harden(makeIssuerKit);

The reason I had to split up buildPurse from initializePurse is that the latter wants to use purse as a Representative, specifically it wants to use purse as the key in purseBalanceLedger and purseDepositLedger. But to do that, we need to know which virtual object ID this purse is a representative of. User-level code like buildPurse doesn't get to learn that (or control it), so some lower-level code which does know the ID must add the purse object to a weakmap first. To break the cycle, we need two separate user-level functions: one that creates an object (a Representative) with some user-defined behavior, then a second one which can use that object as a key and set up the private state. In between, some lower-level code will assign the right ID to enable the weak-store uses.

So internally, the external store does something like:

const representatives = new WeakMap();

function makePurse(...args) {
  const vobjid = allocateID();
  const purse = buildPurse(); // 'purse' is the initial Representative
  representatives.set(purse, vobjid);
  initializePurse(purse, ...args);
  return purse;
}

function hydratePurse(vobjid) {
  const purse = buildPurse();
  representatives.set(purse, vobjid);
  return purse;
}

The buildPurse code wants to reference its own data in several places. In the example above, it uses the fact that const bindings are visible within the scope of the definition, with a pattern like:

const foo = {
 deposit(srcPayment) {
  // stuff that references foo
 },
};
return foo;

Another pattern would be to define the methods with an extra initial "me" argument (us Python programmers spell this self), replaces the slightly awkward doubly-referenced placeholder name with a different invocation convention (callers would not provide me, of course):

return {
  deposit(me, srcPayment) { ... },
};

I can't help but wonder if we should resurrect this for this purpose, or if there's a way to benefit from prototypical inheritance (the "behavior" object would be the prototype of each Representative, and its methods would be invoked with this set to the representative).

Another option is to make the initialize be a specially-marked method of the "behavior" object, maybe with some Symbol so it couldn't be externally-invoked.

michaelfig commented 3 years ago

Please change the variadic initialize(representative, ...makerArgs) signature into initialize(representative, makerArgs), where makerArgs is the arguments of the maker function.

This gives us extensibility if we ever want to pass different arguments to the initialize call.

So, we'd destructure the arguments explicitly in the initialize function, like:

function initializePurse(purse, [initialBalance = 0]) {
    purseBalanceLedger.init(purse, initialBalance);
    const depositFacet = makeDeposit(purse);
    purseDepositLedger.init(depositFacet);
}

and its caller would be variadic, but pass the arguments without expansion:

function makePurse(...makerArgs) {
  const vobjid = allocateID();
  const purse = buildPurse(); // 'purse' is the initial Representative
  representatives.set(purse, vobjid);
  initializePurse(purse, makerArgs);
  return purse;
}

warner commented 3 years ago

Ok here's an updated version, which also changes buildPurse to representPurse, because that function is creating a Representative of the (virtual) Purse, which happens both the first time around and every subsequent time. Whereas initializePurse only ever happens once.

/* global makeExternalStore, sameRepresentative, makeWeakStore */
// these three come from (and share private tables with) liveslots

import { Remotable } from '@agoric/marshal';
import { makeInterface, ERTPKind } from './interfaces';

export function makeIssuerKit(allegedName) {
  const brand = Remotable(makeInterface(allegedName, ERTPKind.BRAND));

  const purseBalanceLedger = makeWeakStore(); // value: balance
  const purseDepositLedger = makeWeakStore(); // value: deposit
  const paymentLedger = makeWeakStore(); // value: balance
  const depositLedger = makeWeakStore(); // value: Purse

  const purseInterface = makeInterface(allegedName, ERTPKind.PURSE);
  const paymentInterface = makeInterface(allegedName, ERTPKind.PAYMENT);
  const depositInterface = makeInterface(allegedName, ERTPKind.DEPOSIT_FACET);

  function representPurse() {
    const purse = harden({
      deposit(srcPayment) {
        const purseBalance = purse.getCurrentAmount();
        const srcPaymentBalance = paymentLedger.get(srcPayment);
        const newPurseBalance = purseBalance + srcPaymentBalance;
        paymentLedger.delete(srcPayment);
        purseBalanceLedger.set(purse, newPurseBalance);
        return srcPaymentBalance;
      },
      withdraw(amount) {
        const purseBalance = purse.getCurrentAmount();
        const newPurseBalance = purseBalance - amount;
        const payment = makePayment();
        purseBalanceLedger.set(purse, newPurseBalance);
        paymentLedger.init(payment, amount);
        return payment;
      },
      getCurrentAmount() {
        return purseBalanceLedger.get(purse);
      },
      getAllegedBrand() {
        return brand;
      },
      getDepositFacet() {
        return purseDepositLedger.get(purse);
      },
    });
    // note: we cannot use 'purse' in a weak store yet, because it is not yet
    // associated with a virtual object
    return purse;
  }

  function initializePurse(purse, [initialBalance = 0]) {
    purseBalanceLedger.init(purse, initialBalance);
    const depositFacet = makeDeposit(purse);
    purseDepositLedger.init(depositFacet);
  }
  const makePurse = makeExternalStore('purse', purseInterface, representPurse, initializePurse);

  function representPayment() { // simpler because it doesn't reference 'self'
    return harden({
      getAllegedBrand: () => brand,
    });
  }

  function initializePayment(payment, [amount]) {
    paymentLedger.init(payment, amount);
  }
  const makePayment = makeExternalStore('payment', paymentInterface, representPayment, initializePayment);

  function representDeposit() {
    const deposit = harden({
      receive(payment) {
        const purse = depositLedger.get(deposit);
        return purse.deposit(payment);
      },
    });
    return deposit;
  }

  function initializeDeposit(deposit, [purse]) {
    depositLedger.init(deposit, purse);
  }
  const makeDeposit = makeExternalStore('deposit', depositInterface, representDeposit, initializeDeposit);

  return harden({ makePurse });
}
harden(makeIssuerKit);

Is this good enough for contract developers to use? And/or is it likely we could automatically translate into this format from something that is good enough for developers? And/or should we just get started with this and experiment our way into a more ergonomic API as we go?

michaelfig commented 3 years ago

Is this good enough for contract developers to use? And/or is it likely we could automatically translate into this format from something that is good enough for developers? And/or should we just get started with this and experiment our way into a more ergonomic API as we go?

The following is just my opinion:

Good enough for us to use to get something working with @FUDCo's backend, but not other contract developers. It is likely that we can translate our way from something good enough for developers. And we should just get started with it, use it for our own issuer.js, replace its uses with the translation mechanism when we have it (but leave your API intact for tests, etc), and then explain how to use the ergonomic API to a wider audience.

Other people's opinions are solicited.

michaelfig commented 3 years ago

Naming bikeshed: I suggest makeExternalStore could be renamed to make<ADJECTIVE>Constructor, where I'm not sure of the best ADJECTIVE, but I currently like ADJECTIVE = '':

makeConstructor - short and sweet, implies that consumers are using an API as opposed to its implementation. Then, the @agoric/store implementation can do its best with the environment it finds itself in.
makeExternalConstructor - focuses on the "external to us" aspect
makePagedConstructor - focuses on the "may be paged out" aspect
???

FUDCo commented 3 years ago

My take on externalizable storage for vats:

I've now implemented the API I had in mind. The implementation is not yet integrated with the kernel but it does do crude serialization to a store, manages a limited size cache of in-memory object state with handling of faulting state in when needed, and has support for serializing and deserializing representatives by their internal IDs (though this isn't yet actually integrated with marshal).

There's basically one function:

maker = makeKind(representInstance) produces a maker for the kind of object described by representInstance. The maker is invoked with whatever the initialization parameters are for the kind of object being defined.

representInstance(state) is a function of one parameter that returns an object containing the methods of the kind of object it represents. The one parameter is an object representing the object's state. All access to the persistent state is via properties of this parameter (I think this parameter would correspond to the $data parameter in Michael's transform).

If there is a method named initialize it will be treated as the initializer.

So you'd write something like this:

function representThing(state) {
  return {
    initialize(label = 'thing', counter = 0) {
      state.counter = counter;
      state.label = label;
      state.resetCounter = 0;
    },
    inc() {
      state.counter += 1;
    },
    reset(newStart) {
      state.counter = newStart;
      state.resetCounter += 1;
    },
    relabel(newLabel) {
      state.label = newLabel;
    },
    get() {
      return state.counter;
    },
    describe() {
      return `${state.label} counter has been reset ${state.resetCounter} times and is now ${state.counter}`;
    },
  };
}

const thingMaker = makeKind(representThing);

const thing1 = thingMaker('thing-1');
const thing2 = thingMaker('thing-2', 100);

then treat thing1, thing2, etc. like regular objects, but they are actually representatives for persistent objects that are keyed to the internally generated identifiers underneath in the manner that we've discussed.

Also, before you ask, via some sleight of hand the initialize method doesn't actually appear in the interface exposed on the resulting object, since it would enable people to Mess With Things.

The existing weak store would get extended to be indexable by these representative objects using the sameKey equivalence operator that, under the hood, compares representives by comparing their opaque identifiers (which are also used for marshaling serialized references for comms and storage).

Via another bit of sleight of hand, the value you return from the representInstance function can be treated as a representative of the resulting object, so you can store yourself or lookup yourself up in a weak store inside the methods (including the initialize method) if you want.

I rewrote Brian's rewriting of purse based on this API. It's a little simpler because he was keeping all the state in weak stores, but we only need the weak stores for rights amplification, so this reduces the number of weak stores from 4 to 1 and lets objects that just want to keep their own state do so naturally.

It looks like this:

/* global makeExternalStore, sameRepresentative, makeWeakStore */
// these three come from (and share private tables with) liveslots

import { Remotable } from '@agoric/marshal';
import { makeInterface, ERTPKind } from './interfaces';

export function makeIssuerKit(allegedName) {
  const brand = Remotable(makeInterface(allegedName, ERTPKind.BRAND));

  const paymentLedger = makeWeakStore(); // value: balance

  function representPurse(state) {
    const purse = {
      initialize(initialBalance = 0) {
        state.balance = initialBalance;
        state.deposit = makeDeposit(purse);
      },
      deposit(srcPayment) {
        const srcPaymentBalance = paymentLedger.get(srcPayment);
        paymentLedger.delete(srcPayment);
        state.balance += srcPaymentBalance;
        return srcPaymentBalance;
      },
      withdraw(amount) {
        state.balance -= amount;
        const payment = makePayment();
        paymentLedger.init(payment, amount);
        return payment;
      },
      getCurrentAmount() {
        return state.balance;
      },
      getAllegedBrand() {
        return brand;
      },
      getDepositFacet() {
        return state.deposit;
      },
    };
    return purse;
  }

  const makePurse = makeKind(representPurse);

  function representPayment() {
    const payment = {
      initialize(amount) {
        paymentLedger.init(payment, amount);
      },
      getAllegedBrand: () => brand,
    };
    return payment;
  }

  const makePayment = makeKind(representPayment);

  function representDeposit(state) {
    return {
      initialize(purse) {
        state.purse = purse;
      },
      receive(payment) {
        return state.purse.deposit(payment);
      },
    };
  }

  const makeDeposit = makeKind(representDeposit);

  return harden({ makePurse });
}
harden(makeIssuerKit);

michaelfig commented 3 years ago

I've now implemented the API I had in mind.

Looks good. I can easily imagine the rewrite allowing closed-over local variables in representThing, and targeting this explicit API. And it may turn out not to be necessary.

Interested in seeing @erights and @warner 's responses.

erights commented 3 years ago

The initialize thing seems dirty. How about something like

const makeThingState = (label = 'thing', counter = 0) => {
  counter,
  label,
  resetCounter: 0,
};

const representThing = state => harden({
    inc() {
      state.counter += 1;
    },
    reset(newStart) {
      state.counter = newStart;
      state.resetCounter += 1;
    },
    relabel(newLabel) {
      state.label = newLabel;
    },
    get() {
      return state.counter;
    },
    describe() {
      return `${state.label} counter has been reset ${state.resetCounter} times and is now ${state.counter}`;
    },
  });

const thingMaker = makeKind(makeThingState, representThing);

This avoids doing surgery based on treating a property name like initialize as magic. initialize wasn't really a method. It became a distinct function by magic surgery. Instead, just code it as a distinct function. This version is less imperative. It is closer in spirit to @michaelfig 's original, but I have not thought about it as a target of translation. It may need further adjustment to accommodate that well.

erights commented 3 years ago

Your code did lead me to a new insight about ERTP: Purses need to rights-amplify payments. But nothing needs to rights-amplify purses. Hence the existing ERTP should just use a lexical balance for purses and we should get rid of the purseLedger. The purseLedger we've got is vestigial from when deposit was purse-to-purse.

Attn @katelynsills @Chris-Hibbert

katelynsills commented 3 years ago

Your code did lead me to a new insight about ERTP: Purses need to rights-amplify payments. But nothing needs to rights-amplify purses. Hence the existing ERTP should just use a lexical balance for purses and we should get rid of the purseLedger. The purseLedger we've got is vestigial from when deposit was purse-to-purse.

Attn @katelynsills @Chris-Hibbert

Hmm, can you explain further? I don't think the purseLedger is vestigal - the latest ERTP was designed from scratch. Of course, it could no longer be needed, but I'm having trouble seeing how

erights commented 3 years ago

Hi @katelynsills see https://github.com/Agoric/agoric-sdk/pull/1889

FUDCo commented 3 years ago

The initialize thing seems dirty. How about something like...

Actually, what you propose was the way I originally did it, based on a similar misgiving about magic (which concern does linger in my sensibilities still). I tried out the initialize method as an experiment and found I really liked the ergonomics; it was more convenient with less ceremony and less cognitive burden. So I figured I'd leave it in there to see how folks reacted to it. I particularly like that having the initializer and the other methods in a common scope means you can readily put any supporting code they share in there. Plus it ensures that some tightly coupled things are actually together in the code. Having special magic does remain a real concern and so it's not something I'm prepared to fight terribly hard for, but I do like presenting the user with fewer parts they have to assemble themselves.

One way to frame it is that with the initialize pattern you have to explain that the user needs to write this special "method" that will only be called once by the framework and then become inaccessible, whereas with the two-parts pattern you have to explain to the user that they need to create these two coordinated pieces and that the thing returned by the state initializer is the same thing that a moment later will be passed to the representative maker. Either way you have an annoying explanatory burden. What would be ideal is something that unifies everything into a single construct that the user writes but without initialize's attendant magic.

erights commented 3 years ago

I agree with all of that analysis. Between

special "method" that will only be called once by the framework and then become inaccessible

and

two coordinated pieces and that the thing returned by the state initializer is the same thing that a moment later will be passed to the representative maker

I strongly prefer the latter.

FUDCo commented 3 years ago

I think you might be able to achieve some kind of unification effect by passing the state initializer and the representative maker inline to makeKind, like:

const thingMaker = makeKind(
  (label = 'thing', counter = 0) => {
    counter,
    label,
    resetCounter: 0,
  },
  state => {
    inc() {
      state.counter += 1;
    },
    reset(newStart) {
      state.counter = newStart;
      state.resetCounter += 1;
    },
    relabel(newLabel) {
      state.label = newLabel;
    },
    get() {
      return state.counter;
    },
    describe() {
      return `${state.label} counter has been reset ${state.resetCounter} times and is now ${state.counter}`;
    },
  },
);

It's mainly just a code formatting trick and the added burden of getting all the nested punctuation right might not pay its way. And of course it's also nice for things to have names. But it's something to consider.

(BTW, note that you don't need the harden as makeKind will do it -- one less thing for the user to get wrong)

FUDCo commented 3 years ago

An unrelated issue that came up as I was working on implementing the above stuff was how sameKey will work in order to realize a weak store. The problem is that sameKey requires some special privileged access to underlying details of the object being used as a key. This is reasonably easy to arrange when the externalizable object implementation and the sameKey implementation are in active cahoots, but raises the question of how to do this in support of a generalized weak store implementation that can use arbitrary objects as its keys. The sameKey implementation I wrote falls back on object identity comparison when either or both of the objects being compared aren't externalizable object representatives, but if there were some other kind of object that had its own version of sameKey comparison I don't know how that would be incorporated. I certainly don't like having one special kind of object that gets privileged treatment. On the other hand, I don't see a way for an object to implement a generalized "are you the same key as this other object?" method without having comparison be coordinated by some kind of weird grant-matcher-like rendezvous machinery, which seems excessive.

FUDCo commented 3 years ago

Further thought: this is yet another thing that would be aided by having some kind of reliable brand-check mechanism.

warner commented 3 years ago

@FUDCo I like it.

Question 1

in the Purse withdraw method:

      withdraw(amount) {
        state.balance -= amount;
        const payment = makePayment();
        paymentLedger.init(payment, amount);
        return payment;
      },

it calls makePayment() with no argument, and then manually initializes the paymentLedger entry (which I think would fail because we're calling init twice, and Stores don't accept that). Would it be correct to instead do?:

      withdraw(amount) {
        state.balance -= amount;
        const payment = makePayment(amount);
        return payment;
      },

It raises a minor ergonomics issue, summarized as "There's More Than One Way To Do It (pejorative form)", but I don't think it'd be a big deal in practice: if you have the right weakstore amplifier, you can mess around with somebody else's state directly, but it's still nicer to ask them politely when they have an API for the particular thing you're doing.

Point 2

makeWeakStore keeps its data in secondary storage, of course

Point 3

makeKind should maybe take a second argument to set up the Remotable and interface name properties of all representatives, so that initialize doesn't have to.

Question 4

@erights could you try writing the Purse example using your alternative two-function API? The problem I think you'll run into is that the Payment initializer needs to keep its amount in the rights-amplified weakstore (so a Purse method can get to it), which means it needs a key to use in that store, which is of course the Payment object/representative itself. But if the initializer is creating that object itself, then the object is not yet known to the makeKind/makeWeakStore infrastructure, so there's no object ID associated with it, so it cannot yet be used as the key for a weakstore. The initializer's return value will be exposed as the first representative to the caller of makePayment(), but it doesn't qualify as a representative until something inside makePayment has added it to a WeakMap which maps representative objects to their (virtual) object ID, and that can't happen until after the initializer returns.

In @FUDCo 's one-function API, the object is created first, then its initialize() method is called, which means the internals of makePayment have a chance to add it to that WeakMap before initialize() sees it, so initialize() is free to use the captured payment object as a representative. In my version, initializePurse() was a separate function which accepts a payment object, rather than creating one, so it can be given a fully-representative object that's already in the rights-amplifying WeakMap.

In your proposed two-function version, makeThingState returns an object. This is certainly more ergonomic, but it means we can't do something like paymentLedger.init(payment, amount) from within the initializer, because the payment object is not yet associated with an object ID, and does not yet qualify as a representative.

Concern 5

@FUDCo and I were talking about the implementation of the state object (which is probably a Proxy, so it can accept arbitrary properties, rather than something with setters/getters for a specific list of pre-known property names, because we don't have an easy way to glean a schema from initialize). And I known @michaelfig mentioned maybe needing a membrane here, to sense deep changes in the state object. My concern is what happens if users do const piece = state.piece; and then capture that in a method they call later (maybe after an await, maybe in a function they return to the caller, I dunno), and that method does piece.foo = blah; or even just reads piece.foo. If the Proxy is shallow, this wouldn't trigger a Proxy trap, and the user's write would be ignored (i.e. it would go unnoticed), or their read would get stale data.

There will be multiple representatives for any given virtual object: we can't prevent that because we don't have WeakRefs at this level. We also don't get to know which representatives are still around, for the same reason. Those representatives must all interact with the same state. One way to achieve that is for them all to share the same state object, but I don't think we can do that and still only hold a limited number of them. So I think the implementation needs state to be something magic, which will access the real state on demand, which means it must be able to sense when it's being used, which gets us back to the deep-Proxy question.

In my all-state-is-rights-amplification approach, the user-provided code is doing obvious (and painful, I'll admit) ledger.get() and ledger.set() calls each and every time it wants to touch that state. The benefit of this explicitness is that it's glaringly obvious that you can't expect to hold on to a child object and read non-stale data from it later, or modify it and expect your changes to take effect. Code must do a get() at the last possible moment, and it must do a set() at the first possible moment.

The Proxy approach doesn't reveal anything to the user that would suggest state is not a normal object, shared across all the representatives, so it conceals the hazards, and there's no no built-in educational mechanism to guide authors away from them. If their code shares pieces of state outside the representative's methods, they'll get intermittent stale data or forgotten writes, which will be really hard to debug (and I expect they'll only happen when the system is busy enough to create lots of representatives for the same virtual object, which means it will happen infrequently enough that they'll never track it down). "Mysterious behavior" plus "happens infrequently" plus "the code looks correct" is a perfect storm for hard-to-diagnose bugs.

So that's my concern with the magic state approach. If there's a way to make all representatives share the same state object, and if that object were plain data (instead of an access-sensing Proxy), then I think it wouldn't be a concern, but I don't know how to achieve that without WeakRefs or keeping all state objects forever (which eliminates the secondary-storage utility that we need).

warner commented 3 years ago

(my attempt to capture pieces of our 21-Oct-2020 kernel meeting, in no particular order)

@dtribble wants to avoid "chained transitive reads", where loading one virtual object whose state points to a second one doesn't need to load the second data right away. To imagine a worst-case, think about a linked list made of virtual objects: loading one element should not cause the entire rest of the list to be loaded into RAM.

If userspace code compares a stashed Representatives for object equality against a new representative, and the answer reveals information about how many representatives have been created during the interval, that opens up a communication channel between otherwise independent object graphs. We would consider this "EQ representative" channel to be a fatal flaw.

An equally fatal "EQ oldvalue" communication channel could result if userspace code uses a Representative to read data from secondary storage and stashes the object it gets back, and compares that for object equality against a new object read the same way but later.

A system might have exactly one Representative per virtual object, or many. To achieve exactly one representative would require using WeakRefs, and would almost certainly cause the DB read/write calls to not be a deterministic function of vat operations.

The "EQ Representatives" channel would be closed by having exactly one Representative per virtual object, or by having each message delivery create a new Representative.

The state API made available to the userspace-provided behavior function (e.g. representPurse) might provide one of the following depths of access:

1: shallow: oldvalue = table.get(representative) or table.set(representative, newvalue), like in my all-rights-amplification example
2: one-level deep: oldvalue = state.propname; state.propname = newvalue;, as in @FUDCo 's Proxy-based example . Modifying a deep property (state.propname.foo = newvalue) would be disallowed somehow.
3: arbitrarily-deep, state.propname.foo = newvalue; This would require a membrane of Proxies to sense, and we think wouldn't be coherent overall (too many opportunities for stale reads or forgotten writes).

The shallow API is the least magical. The deeper APIs are more ergonomic. I think we agreed that one-level deep is a reasonable compromise.

The data we put into secondary storage will be marshalled using the same functions we use for E() message arguments. This means the data must be hardened, either before the set() / state.foo=, or inside it. Ergonomics and user feedback on our message sends suggest making the harden internal, @erights winces at this side-effect.

After some thought, I have a set of three proposals. The simplest is:

Brian's Proposal A: no cache

Each unserialize produces a new Representative, with a new state object. The state object is a Proxy which has access to the DB syscalls and the object ID. There is no LRU cache. Each oldvalue = state.foo causes a DB read. Each state.foo = newvalue causes a DB write. We use a WeakMap from Representative to object ID for serialization and the sameKey predicate.

The new Representative for each message delivery closes the "EQ representative" channel. Each DB read is followed by a unserialize of the data, producing a new oldvalue object graph, closing the "EQ oldvalue" communication channel.

The oldvalue object graph is hardened, so state.foo.bar = newvalue (which is read-modify-write) is an error (state.foo is hardened). Each DB write either hardens newvalue or requires it to already be hardened, so foo = {}; state.foo = foo; foo.bar = newvalue is an error. This error happens immediately upon the assignment statement, so it can educate users (rather than happening inscrutably a long time later when the data is lazily written).

Downsides: if the behavior code reads the data a lot, we do lots of DB reads. Same for writes.

Proposal B: read cache, write-through cache

We still use new Representatives and new state object for each delivery, but we introduce a third "real state" object (I think @FUDCo had a better name for this?) which is shared among (but not directly referenced by) all representatives for a given object ID. The LRU cache is indexed by object ID and holds the "real state" object, which retains the serialized capdata of the object (as returned from a DB read syscall, not its deserialized object graph). The state object has access to this LRU cache and the object ID. When state wants to read, it calls LRU.provide(objectID) which either returns an existing object, or pulls a new one in (performing a DB read). Pulling in a new object will knock something else off the end of the cache, which doesn't trigger any special behavior.

Each oldvalue = state.foo causes a new deserialization (closing the "EQ oldvalue" channel), but not necessarily a DB read. Each state.foo = newvalue causes an immediate DB write, hence not needing to do anything special for the knocked-off-the-end object.

Upside: each read costs a deserialization, but not a new DB read. Lots of writes cause lots of DB writes.

Proposal C: write-back cache

state.foo = newvalue cause an immediate serialization of the object graph, which is stashed in the "real state" object along with setting a dirty flag. If/when a dirty one is pushed off the end of the LRU cache, the previously serialized data is written to the DB. We serialize immediately to report errors to the user right away, and to avoid ever returning the user's objects back to subsequent readers (to close the "EQ oldvalue" channel).

FUDCo commented 3 years ago

Closed by #1967