Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 208 forks source link

Vatstore syscall API needs some kind of "enumerate keys" operation #3734

Closed FUDCo closed 2 years ago

FUDCo commented 3 years ago

What is the Problem Being Solved?

The underlying stores used by the vatstore (LMDB or JS Map, depending) support a key iterator operation. This operation is used in various places by the kernel (for example, to iterate over a vat's c-list entries), but it was never made part of the vatstore API because (a) we didn't think we needed it and (b) why export potentially expensive, potentially abusable functionality if we don't need to?

However, to properly implement garbage collection of the VO weakStore collection type (actually, what MarkM has now renamed the virtualScalarWeakMap), we need to iterate through the keys of the store when a store is GC'd, in order to decref anything the store points to. However, since this store uses vatstore keys directly for entry lookup, such iteration requires iterating over the vatstore keys themselves, so it turns out that the operation we didn't think we needed we need after all.

Description of the Design

The underlying kvStore interface provides a getKeys function that returns an iterator over a range of key strings. However, passing an iterator through the syscall API seems problematic, so a naive direct wrapping of the getKeys call would be awkward.

Most of the time (not always, but nearly always), kvStore clients use these keys in a loop to iterate over the actual entries they index. Fortunately, this does include the anticipated vatstore use case, so I propose that rather than providing a call to iterate over keys each of which will immediately be used in another call to fetch the data, we instead provide a call:

vatstoreGetAfter(keyPrefix, priorKey) => [key, value]

keyPrefix is used to limit the range of keys looked up; priorKey must either be the empty string or have this prefix. priorKey indicates the key immediately before the key that will be retrieved.

This call fetches the value with the key most immediately following priorKey in the data store and returns a pair consisting of key, the key that was actually fetched, and value, the string value that key indexes. If there are no keys with prefix keyPrefix following priorKey, then it returns undefined.

This would be used in code something like:

const prefix = `vom.ws${tableID}.`;
let key = '';
let value;
while (const fetched = syscall.vatstoreGetAfter(prefix, key)) {
  [key, value] = fetched;
  doSomethingWith(key, value);
}

If tableID was 47, this would call doSomethingWith with each key that starts with vom.ws.47. along with its associated value.

Implementation note: Internally, the kernel-side implementation can retain the key string that it returns so that in the very common case of the very same string being used as the immediately next priorKey parameter it can avoid regenerating the database cursor or other underlying iterator when it doesn't need to.

Security Considerations

As with everything to do with the vatstore API, care must be taken to ensure that this call does not allow the caller any access to database keys outside the range allotted to the calling vat's vatstore.

Test Plan

FUDCo commented 3 years ago

@warner

warner commented 3 years ago

I was thinking about this some more after chatting with @FUDCo . I think our sketch now looks like:

This costs us constant RAM for each iterator, and constant RAM for each collection, but only disk for the entries in the collections. The collections aren't serializable, because of the in-RAM generation counter (and possibly other reasons unrelated to iterators). To remove this constraint by keeping the generation counter on disk, we'd have to do a lot more DB writes. (My hunch is that we don't need entire collections to be serializable).