Open warner opened 2 years ago
cc @FUDCo any ideas?
I don't think you need a data structure. The O(N) iteration you're afraid of is exactly what you need, because you have to iterate over all those keys to delete them anyway. You don't have to probe for the gaps in the collection ID space. If you start iterating from `vc.1.` but the first actual collection that exists is collectionID 427, then the immediate return from your first call to `vatstoreGetAfter` will have the key `vc.427.${whatever}`. You can then find the next collection by resuming iteration from `vc.428.`.

With respect to finding out the kind and durability status of a collection, it would be very easy to add a `vc.${collectionID}.|metadata` key to each, and arguably we should have that anyway.
Actually, the iteration is slightly trickier because you will have to take into account that the DB iterates in lexicographic order rather than numeric order, but we don't actually care about the specific order the collections get cleared out in as long as we don't miss any.
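A minimal sketch of this skip-ahead enumeration, using a sorted `Map` as a stand-in for the vatstore. The `getKeyAfter` helper and the trick of resuming from `vc.${id}/` (in ASCII, `/` sorts immediately after `.`, so it skips past every remaining `vc.${id}.…` key) are illustrative assumptions, not the real `vatstoreGetAfter` API:

```javascript
// Toy key/value store standing in for the vatstore; a real sorted DB
// would not need the sort() below.
const store = new Map([
  ['vc.2.|label', 'contacts'],
  ['vc.2.someKey', 'someValue'],
  ['vc.10.|label', 'scores'],
  ['vc.427.|label', 'misc'],
]);

// First key lexicographically >= `start` that begins with `prefix`.
function getKeyAfter(start, prefix) {
  for (const k of [...store.keys()].sort()) {
    if (k >= start && k.startsWith(prefix)) return k;
  }
  return undefined;
}

function* enumerateCollectionIDs() {
  let start = 'vc.';
  for (;;) {
    const key = getKeyAfter(start, 'vc.');
    if (key === undefined) return;
    const collectionID = key.slice('vc.'.length).split('.')[0];
    yield collectionID;
    // '/' sorts just after '.', so `vc.${id}/` skips past every
    // remaining `vc.${id}.…` key without visiting it.
    start = `vc.${collectionID}/`;
  }
}

// Lexicographic, not numeric, order: 10 comes before 2.
console.log([...enumerateCollectionIDs()]); // ['10', '2', '427']
```

Note the output order demonstrates the lexicographic-vs-numeric caveat, which is harmless here because we only need to visit every collection once, in any order.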
> You can then find the next collection by resuming iteration from `vc.428.`.
That's what I meant by the "horrible hack". The lexicographic ordering is weird but doable. The hackish part is that we aren't using `vatstoreGetAfter` to iterate through a dense cluster of keys (which is what it tries to optimize for); rather, we have to interrupt the iteration after getting just a single result, and then start again from some different prefix so we skip over all the other collection entry keys in the middle. The loop that's finding collections is not the same loop that's deleting their keys, since we delegate that inner loop to a `clearInternal` sort of function. Our loop wants to get exactly one key from each collection (it doesn't matter which), just to learn the collectionID. The key it read might or might not get deleted during `clearInternal` (it might be a metadata key).

But yeah, I concur that we could figure out the list of collectionIDs by walking the existing keys. And also that we need a new key to get from the collectionID to the KindID/durability info. As we discussed offline, since we need a new key anyway, we could add it to a different section of the keyspace and enable the less hackish enumeration I'm thinking of.
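To make the shape of that two-loop structure concrete, here is a hedged sketch: the `Map`-backed store, the `getKeyAfter` helper, and the `|metadata` value format are all assumptions for illustration, and `clearInternal` here is a stand-in for the real function of that name:

```javascript
// Illustrative vatstore stand-in with one virtual and one durable collection.
const store = new Map([
  ['vc.1.|metadata', JSON.stringify({ kindID: 5, durable: false })],
  ['vc.1.keyA', 'a'],
  ['vc.3.|metadata', JSON.stringify({ kindID: 7, durable: true })],
  ['vc.3.keyB', 'b'],
]);

// First key lexicographically >= `start` that begins with `prefix`.
function getKeyAfter(start, prefix) {
  let best;
  for (const k of store.keys()) {
    if (k >= start && k.startsWith(prefix) && (best === undefined || k < best)) best = k;
  }
  return best;
}

// Inner loop: delete every key belonging to one collection.
function clearInternal(collectionID) {
  const prefix = `vc.${collectionID}.`;
  for (const k of [...store.keys()]) {
    if (k.startsWith(prefix)) store.delete(k);
  }
}

// Outer loop: read exactly one key per collection (any key will do) just
// to learn the collectionID, check durability, then skip past the rest.
function deleteAllVirtualCollections() {
  let start = 'vc.';
  for (;;) {
    const key = getKeyAfter(start, 'vc.');
    if (key === undefined) return;
    const collectionID = key.slice('vc.'.length).split('.')[0];
    const meta = JSON.parse(store.get(`vc.${collectionID}.|metadata`));
    if (!meta.durable) clearInternal(collectionID);
    start = `vc.${collectionID}/`; // '/' sorts just after '.', skipping this collection
  }
}

deleteAllVirtualCollections(); // collection 1 removed, collection 3 retained
```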
## What is the Problem Being Solved?
As part of #1848 (in particular #5056) I need `stopVat()` to enumerate and delete all virtual collections. I couldn't find a good way to locate them all from just the DB, so I introduced an in-RAM `Set` with one vref per collection. `stopVat()` walks this set, checks each to see whether it's virtual or durable, and calls the existing `deleteCollection()` for the virtual ones. This has the nice property of decrementing refcounts and thus dropping imports and durables that were only referenced by the late collections. Also, the use of a `Set` makes it easy for `deleteCollection()` to ignore duplicate deletes, such as what happens when the following `scanForDeadObjects` sees zero refcounts for the collections and attempts to delete them a second time.

That set was a decent short-term fix, but a zillion virtual/durable collections causes O(N) RAM usage, which violates our RAM budget goals.
To fix this, we need a way to enumerate all the virtual collections with an efficient DB query. Each virtual collection has a `collectionID` (an integer, shared among collections of all kinds, some of which are virtual, others durable). Each collection has a batch of keys like `vc.${collectionID}.|${metadataKey}`, which holds the collection size, a label, a `nextOrdinal` counter for the numbered-because-they're-references keys, and the vref-to-ordinal mapping for those keys. Those are joined by an O(collection size) batch of keys like `vc.${collectionID}.${encodedKey}` for the actual entries. In addition, each virtual object (of which a tiny subset are collections) has a refcount key in `vom.${baseref}`.

One way to enumerate all collections would be a `vatstoreGetAfter()` starting at `vc.`: extract the `collectionID` from the first key it finds, then skip ahead to `vc.${collectionID+1}`. That's O(N) in the number of live collections, but feels like a horrible hack. Another would be to count up from `collectionID = 1` to the current next-allocation value and probe each for a well-known metadata key like `vc.${collectionID}.|label`; however, that's O(N) in the number of historically allocated collections (i.e. if we've created and deleted a lot of collections, we have to walk through all of the deleted ones too).

Neither approach helps us figure out the kind of the collection, which is what we need to determine whether it's virtual or durable. The collection DB keys don't record the kind or the durability status. Somewhere out there is a vref for the collection, in the form `o+${kindIndex}/${collectionID}`, and we can look up the Kind index in a table to find out about its status (there are only a small number of Kinds, and we keep information about all of them in RAM). But we don't track a list of all instances of a given kind.

Given the current DB structure, the best approach I can think of is to start with a phase that filters out all the virtual Kinds from the table in RAM, then searches the `vom.${baseref}` section for refcount records that start with the `o+${kindIndex}/` prefix for each one. That would give us a list of collections (of that Kind) which are referenced by any form of virtualized data. However, the collection object might be held only by RAM, in which case there won't be a refcount record for it, so we'd have to somehow merge this information with the export-status records (`es.${baseref}`) as well as walking `slotToVal` to find the in-RAM pillars.

## Description of the Design
So what I'm looking for is a DB structure that makes it cheap to find (and delete) all the virtual-object and virtual-collection records, without assistance from an in-RAM table whose cardinality equals the number of collections.
One simple starting point would be a section of valueless DB keys which only exist to help us find the collections, maybe a `vcc.${collectionID} -> kindID` for each one. We'd add the key upon creation of the collection, and delete it if/when the collection is deleted. To delete all the virtual collections, we'd walk that prefix (getting both virtual and durable collections), look up the `kindID` on each to see which type it is, and then delete just the virtuals.

We could avoid the cost of looking at durables by splitting the table into two pieces: `vc-virtual.${collectionID}` and `vc-durable.${collectionID}`. The `kindID` value is less interesting then, but probably useful for something else later.

But in the long run we want the kernel to be able to delete this data (efficiently) without help from the vat. To support that, I think we want to split the entire vatstore into virtual and durable subsets. Either we double the `vatstore` API to have a virtual get/set/has/delete/getAfter, or we add a `durable` flag to the existing calls. The kernel would use a different prefix to separate out the two categories, maybe `${vatID}.vs.d.${key}` for the durables and `${vatID}.vs.v.${key}` for the virtuals. The kernel could then delete all non-durable data with a single prefix. And if/when we finally manage to move to SQLite for the kernelDB, these could go into entirely separate tables, and a single super-efficient SQL operation would wipe out the whole thing. (Of course, if the kernel deletes the virtuals like this, it must do a mark-and-sweep on the durables to figure out which ones were fatally dereferenced by the virtuals, but we need that anyway to clear out everything properly.)

## Security Considerations
none I can think of
## Test Plan
unit tests
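A hedged sketch of the kind of unit test this might call for, assuming the hypothetical `vcc.${collectionID} -> kindID` index keys proposed above (all helper names here are illustrative):

```javascript
// Mock vatstore: index keys under `vcc.`, collection data under `vc.`.
const store = new Map([
  ['vcc.1', 'kind9'],
  ['vcc.2', 'kind10'],
  ['vc.1.|label', 'virtual-one'],
  ['vc.1.entryA', 'x'],
  ['vc.2.|label', 'durable-two'],
]);
const virtualKinds = new Set(['kind9']); // kind10 is durable

// Walk the `vcc.` index, delete the keys of every virtual collection.
function deleteVirtualCollections() {
  for (const [key, kindID] of [...store.entries()]) {
    if (!key.startsWith('vcc.')) continue;
    if (!virtualKinds.has(kindID)) continue;
    const collectionID = key.slice('vcc.'.length);
    for (const k of [...store.keys()]) {
      if (k.startsWith(`vc.${collectionID}.`)) store.delete(k);
    }
    store.delete(key); // remove the index entry itself
  }
}

deleteVirtualCollections();
// Expected: collection 1 (virtual) fully gone, collection 2 (durable) intact.
```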