Open warner opened 10 months ago
If we're gonna do this, could we not also do a full trace to detect orphans and unreferenced cycles?
My immediate need is to tell whether #8756 needs remediation, or if we can simply deploy a fix and not worry about cleaning up old data, and that just needs a check for refcount discrepancies.
This tool would certainly be the basis for a full mark-and-sweep trace, but I'd consider it extra credit, and I'd like whoever implements this to answer the remediation question before doing the extra work.
Talking with @siarhei-agoric, we decided that many modes of this tool do not really need a separate intermediate DB. For example, auditing for refcount consistency within a single vat can do:
- `Map`
- `Map`
- and check the refcount for each
- `map.has()` for each, to look for stranded refcount rows

That requires RAM in proportion to the number of vrefs in use by the vat, perhaps 1-10M for the larger ones, but does not require storing full details about each edge.
If the tool detects a mismatch, we could re-run it in a verbose mode, with a specific set of vrefs to look for (repeating the full walk to gather more data), but with luck it will report "all refcounts match" and we don't need to spend the extra effort.
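A minimal sketch of that in-RAM audit, to make the idea concrete. The input shapes here are hypothetical (not a real swing-store API): `edges` is any iterable of `[sourceVref, targetVref]` pairs produced while walking virtual-object state and collection entries, and `rcRows` is an iterable of `[vref, count]` pairs from the stored refcount rows. Only a per-vref tally is kept, not the edges themselves. Whether every referenced vref is expected to have a refcount row is also an assumption, so the `missing` list may need filtering.

```javascript
// Sketch only: auditVatRefcounts and its input shapes are illustrative names.
function auditVatRefcounts(edges, rcRows) {
  // Pass 1: tally inbound strong references per target vref.
  const observed = new Map();
  for (const [, target] of edges) {
    observed.set(target, (observed.get(target) || 0) + 1);
  }

  // Pass 2: compare each stored refcount row against the tally.
  const mismatches = [];
  const stranded = []; // refcount rows with no inbound reference at all
  const seen = new Set();
  for (const [vref, stored] of rcRows) {
    seen.add(vref);
    if (!observed.has(vref)) {
      stranded.push({ vref, stored });
    } else if (observed.get(vref) !== stored) {
      mismatches.push({ vref, stored, observed: observed.get(vref) });
    }
  }

  // Pass 3: referenced vrefs that have no refcount row.
  const missing = [];
  for (const [vref, count] of observed) {
    if (!seen.has(vref)) missing.push({ vref, observed: count });
  }
  return { mismatches, stranded, missing };
}

// Example with made-up vrefs.
const report = auditVatRefcounts(
  [
    ['o+d10/12', 'o-3'],
    ['o+d10/13', 'o-3'],
    ['o+d10/12', 'o+d11/1'],
  ],
  [
    ['o-3', 2],
    ['o+d11/1', 3],
    ['o+d11/9', 1],
  ],
);
console.log(JSON.stringify(report, null, 2));
```

Because `edges` can be a generator, the walk never has to hold more than the tally in memory. A verbose re-run would repeat the walk and record the source vrefs only for the targets listed in the report.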
As mentioned in #8756:
The task for this ticket is to build that tool, and run it against the current mainnet database, to see if we have any problems. I suspect everything is fine, but it might show evidence of #8756, or of some other as-yet-undiscovered GC problem.
The tool should check both kernel and vat refcounts for consistency. The starting point should be a copy of the kvStore (just a new DB with `CREATE TABLE kv (key STRING, value STRING)` and a copy of the whole `kvStore`), which will be a lot smaller and easier to work with than the full swingstore.

Then I'd parse that data into a better-organized third DB. It should be parsed into a kernel-object table and a c-list table. Then the keys for each vatstore should be parsed and copied into tables for:
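- virtual-object state (`vom.${vref}`)
- collection schemata (`vc.5.|schemata`)
- `encodedKey` -> `marshalledValue` "forward" entries (`vc.5.${encodedKey}`)
- vref-to-ordinal entries (`vc.5.|${vref}`)
- refcounts (`rc` entries)
- export status (`es` entries)
- recognizability records (`ir` entries)

Kernel Refcounts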
The kvStore should be parsed into a kernel-object table, with columns for:
- `koid` (the object id kref)
- `owner`: a vatID, or NULL for abandoned objects
- `reachable` and `recognizable` counts from the `${kref}.refCount` entries

and also a c-list table, with columns including `exported` (1 for `o+`, 0 for `o-`) and `reachable`.
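As a rough sketch of how these two tables could be compared once they are populated: for each kref, count the c-list entries held by importing vats and check the totals against the stored counts. The row shapes (`{ koid, owner, reachable, recognizable }` and `{ vatID, kref, vref, exported, reachable }`), the `pinned` set, and the function name are all assumptions for illustration. It also assumes that any importing c-list entry counts as recognizable, and it ignores any other source of kernel references.

```javascript
// Sketch only: row shapes and names are hypothetical, not the real schema.
function auditKernelRefcounts(kernelObjects, clist, pinned = new Set()) {
  // Tally references from importing vats, keyed by kref.
  const counts = new Map();
  for (const row of clist) {
    if (row.exported) continue; // the exporter's own entry is not a reference
    const c = counts.get(row.kref) || { reachable: 0, recognizable: 0 };
    c.recognizable += 1; // assumption: every importing entry can recognize
    if (row.reachable) c.reachable += 1;
    counts.set(row.kref, c);
  }

  // Compare the tally with the stored refcounts.
  const discrepancies = [];
  for (const ko of kernelObjects) {
    const c = counts.get(ko.koid) || { reachable: 0, recognizable: 0 };
    if (c.reachable !== ko.reachable || c.recognizable !== ko.recognizable) {
      discrepancies.push({
        koid: ko.koid,
        stored: { reachable: ko.reachable, recognizable: ko.recognizable },
        observed: c,
        pinned: pinned.has(ko.koid), // flagged so known pins can be excused
      });
    }
  }
  return discrepancies;
}

// Example with made-up rows: ko21 carries one extra (pinned) count.
const kernelObjects = [
  { koid: 'ko20', owner: 'v1', reachable: 2, recognizable: 2 },
  { koid: 'ko21', owner: 'v1', reachable: 2, recognizable: 2 },
  { koid: 'ko22', owner: null, reachable: 1, recognizable: 2 },
];
const clist = [
  { vatID: 'v1', kref: 'ko20', vref: 'o+0', exported: 1, reachable: 1 },
  { vatID: 'v2', kref: 'ko20', vref: 'o-50', exported: 0, reachable: 1 },
  { vatID: 'v3', kref: 'ko20', vref: 'o-51', exported: 0, reachable: 1 },
  { vatID: 'v1', kref: 'ko21', vref: 'o+1', exported: 1, reachable: 1 },
  { vatID: 'v2', kref: 'ko21', vref: 'o-52', exported: 0, reachable: 1 },
  { vatID: 'v2', kref: 'ko22', vref: 'o-53', exported: 0, reachable: 1 },
  { vatID: 'v3', kref: 'ko22', vref: 'o-54', exported: 0, reachable: 0 },
];
console.log(auditKernelRefcounts(kernelObjects, clist, new Set(['ko21'])));
```

Reporting pinned objects with a flag, rather than silently skipping them, keeps the output honest if the set of pinned krefs turns out to be wrong.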
In general, the "reachable" refcount should equal the number of `exported=0 AND reachable=1` entries in the c-list table for that kref. The "recognizable" refcount is computed similarly. We ignore the `exported=1` entries, because their `reachable` flag indicates whether the kernel can reach the exporting vat's object, so it doesn't count towards the refcount.

Some objects might have a pinned refcount (via `controller.pinVatRoot()`). We might have elected to do this to the bootstrap vat's root object, but maybe not. Any other discrepancies might indicate a kernel GC problem.

Vat Refcounts
In addition to parsing the kvStore into vatstore entries for each vat, and then parsing the vatstore entries into virtual-thing-specific tables, the tool should also scan the `vom` virtual-object state records for vrefs, and populate a table that says e.g. "virtual object o+d10/12 has a strong reference to vref o-3". This table would also have rows for collections which point to other vrefs, maybe also through a third table that captures both the collection's vref and the vref of the key (the implication being that both source vrefs must be alive to keep the target vref alive). And some variant of this for weak collections would be good.

Then, the tool should add up the inbound strong references from both virtual-object state and collections, and compare that count against the `rc` entry. If they don't match, we've got a bug (perhaps #8756, perhaps a new one).

It should also compare the `es` export status with the matching c-list entry for that vat. If `es` claims the exported vref is reachable, the c-list should say `R`. If `es` says merely-recognizable, it should say `_`.

Finally, it should check the recognizability records for consistency. Vrefs that are used as keys in weak collections will appear in the collection entry rows (as an ordinal), and there will also be an `ir` record mapping from the vref to the collection ID. If we see one without the other, we've got some other kind of bug.
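The last two checks (export status against the c-list, and `ir` records against weak-collection keys) are both set comparisons. A minimal sketch follows, assuming the rows have already been parsed into plain objects. The `status` strings `'reachable'` and `'recognizable'`, the row shapes, and both function names are hypothetical stand-ins for whatever encoding the stored records actually use.

```javascript
// Sketch only: input shapes are illustrative, not the stored record format.

// esRows: [{ vref, status }], clistFlags: Map of vref -> 'R' or '_'
function checkExportStatus(esRows, clistFlags) {
  const expected = { reachable: 'R', recognizable: '_' };
  const problems = [];
  for (const { vref, status } of esRows) {
    const flag = clistFlags.has(vref) ? clistFlags.get(vref) : null;
    if (flag !== expected[status]) {
      problems.push({ vref, status, flag }); // flag is null if no c-list entry
    }
  }
  return problems;
}

// weakKeyRows and irRows: [{ vref, collectionID }]
function checkRecognizers(weakKeyRows, irRows) {
  const toKey = ({ vref, collectionID }) => `${vref}|${collectionID}`;
  const weak = new Set(weakKeyRows.map(toKey));
  const ir = new Set(irRows.map(toKey));
  return {
    // weak-collection key with no matching ir record
    missingIr: [...weak].filter(k => !ir.has(k)),
    // ir record with no matching weak-collection key
    missingEntry: [...ir].filter(k => !weak.has(k)),
  };
}

// Example with made-up rows.
const esProblems = checkExportStatus(
  [
    { vref: 'o+d10/1', status: 'reachable' },
    { vref: 'o+d10/2', status: 'recognizable' },
    { vref: 'o+d10/3', status: 'reachable' },
  ],
  new Map([['o+d10/1', 'R'], ['o+d10/2', 'R']]),
);
const irProblems = checkRecognizers(
  [{ vref: 'o-3', collectionID: 5 }, { vref: 'o-4', collectionID: 5 }],
  [{ vref: 'o-3', collectionID: 5 }, { vref: 'o-7', collectionID: 6 }],
);
console.log(esProblems, irProblems);
```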