Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
327 stars 206 forks source link

maybe have kernel retain vatParameters, to enable kernel-unilateral vat upgrade #8947

Open warner opened 7 months ago

warner commented 7 months ago

What is the Problem Being Solved?

One task identified as a requirement for a kernel-unilateral restart-all-vats feature is to have the kernel retain each vat's vatParameters.

vatParameters is a chunk of data (passable object graph, except Promises are forbidden) which is provided as the second argument to the vat's buildRootObject() function at vat startup time. It is meant to parameterize the vat bundle for some specific purpose (making it distinct from all other vats using the same bundle). By providing these parameters early, before any messages have arrived, the vat bundle can use them when (re-)defining its Kinds.

Currently, the kernel treats vatParameters more like a message. In particular, the kernel does not remember the data after vat startup. vatParameters are held in the run-queue startVat event record, which is dropped after the delivery is begun. Any any object krefs in the vP are held with a refcount until they leave the queue, after which (if nothing else is holding onto them) they may get garbage collected.

(The same is true of vat root objects: once the bootstrap message is delivered, the kernel does not retain a reference to them, and the objects may get collected if the bootstrap vat did not choose to hold onto them. The policy I chose was to have the kernel defer to userspace decisions about what is important to retain and what is important to collect).

However, we can't correctly restart a vat without its vatParameters, and if the kernel is going to unilaterally restart vats, we need to keep them around.

Description of the Design

We'll put a copy of the vatParameterCapData (or "vpcd" for short) into the kvStore vNN.options record, or perhaps in a new vNN.vatParameterCapData record.

When userspace creates a vat (via vat-admin), the kernel will store a copy of the vpcd, and increment the refcounts on any objects. When userspace upgrades a vat (supplying new vatParameters), the kernel will retrieve the old data, store the new data, increment the refcounts on the new data, then decrement the refcounts on the old data. When the vat is terminated, the refcounts will be decremented.

We'll continue to put a copy of the vatParameterCapData in the startVat message, and increment the refcount when doing so, and decrement the refcount when delivering startVat, to avoid upgrade concerns about refcount skew. However the startVat-time decref will never allow an object to be freed, because all such objects will also have a refcount for the kvStore copy.

Security Considerations

None I can think of. Userspace cannot use this feature to retain objects any longer than it could have without it.

Scaling Considerations

This causes all vatParameter objects to be held uncollectable until userspace upgrades the vat with different objects, or the vat is terminated. This is unlikely to represent a significant memory "leak", both because of the low rate of upgrade, and because (at least traditionally) most objects passed to vatParameters are used and held by the userspace code for the entire incarnation.

Test Plan

Unit tests on the kernel side, in the code that checks refcounts.

Upgrade Considerations

Current vats do not have any retained vpcd, and cannot retrieve it from the kvStore contents. So we need code to recognize that a vat does not currently have this data, and make it ineligible for unilateral upgrade.

warner commented 4 months ago

Another decision to be made: when E(adminNode).upgrade() is called without a vatParameters option, should that mean "use the old vatParameters", or should (as it does now) mean "vatParameters = {}" ?

The #8405 "upgrade all vats" command will always use each vat's old vatParameters, as there's no place to put replacements (even if we were willing to take an option with one property for each of our 100+ vats, the upgrade-all-vats call will happen from outside the kernel, so it cannot accept slots / object references).

I'm kind of inclined to have it mean "use the old vatParameters". I think we'd need some vat-vat-admin changes to accomplish that, to tell the kernel whether the option was undefined or {}.