framework to unit-test vat upgrade and durable data

What is the Problem Being Solved?

Currently it is a hassle to write tests of vat/library upgradability. The one scheme we have is more of an integration test, where you use a real swingset kernel and build a pair of vats (one pre- and one post- upgrade), import the library you're testing into those, have a controller/bootstrap vat drive an upgrade and a bunch of queries, log the output, and then the only real t.deepEquals() -style AVA assertion is to compare the output log against a "golden master".

These tend to be slow, coarse, hard to run under a debugger, and do not provide useful coverage data.

It would be great if we could write more "unit-ish" unit tests: smaller, faster, exercising finer details.

Description of the Design

This kind of code is importing from @agoric/vat-data to get the primary virtual/durable tools:

defineKind (times Durable and Multi variations)
makeKindHandle
providePromiseWatcher/watchPromise
makeScalarBigMapStore (times Set and Weak variations)
canBeDurable

Those all come from globalThis.VatData, which is provided to real vats by the swingset liveslots layer (since they must all integrate tightly with the Presences and Representatives that liveslots manages).

(@agoric/vat-data also exports secondary provide/vivify tools, like provideDurableMapStore, defineVirtualFarClass, vivifyKind, and vivifyFarClass, but these are all wrappers around the core functions that come from VatData)

We have a "fake virtual stuff" utility already, to unit-test libraries that import from @agoric/vat-data (or code that uses such libraries). Your unit test starts with import '@agoric/swingset-vat/tools/prepare-test-env.js' (or prepare-test-env-ava.js, since we always use AVA), and that use a function named makeFakeVirtualStuff to populate globalThis.VatData with enough code to support defineKind/etc.

The backing store for makeFakeVirtualStuff() is a plain Map, which takes the place of the syscall.vatstoreSet(key,value) that liveslots would use to keep data in the DB. makeFakeVirtualStuff() then creates a VirtualObjectManager and CollectionManager around this Map, asks them for defineKind and friends, and populates globalThis.VatData with those utilities. As long as this happens before any user-level code is imported (and thus imports @agoric/vat-data), the utilities will be in place on the global in time to be imported by user code.

My plan is:

change fakeVirtualSupport.js:
- add an option to supply the Map instead of always creating a local/private one
change prepare-test-env-ava.js:
- create a single Map at import time, and pass it to makeFakeVirtualStuff, so the Map can be reused later
- add a simulateUpgrade function, which calls makeFakeVirtualStuff() again but with the same Map as before, to represent the DB that is retained across an upgrade
- this will return a baggage object (a BigMapStore), which uses a known/stable ID within the DB/Map
- each time simulateUpgrade is called, it creates a new baggage, backed by the new VOM/etc, but because it uses the same ID, it will return equivalent contents
- build wrappers around everything we add to VatData so they can be replaced by simulateUpgrade()
- e.g. VatData.defineKind = (...args) => currentVOM.defineKind(...args), etc
- because @agoric/vat-data re-exports the pieces of globalThis.VatData, and downstream code imports those pieces from vat-data, so their identities are immutable

A unit test that wants to exercise an upgrade would import the code-under-test and prepare-test-env-ava.js as usual, then in the test case:

baggage1 = simulateUpgrade()
call the library's normal initialization function (passing in baggage1) which will do defineKind/provide/etc, and return some apiV1 object to interact with
interact with apiV1 to build up some durable state
baggage2 = simulateUpgrade()
call the library's init function again (this time with baggage2), which will re-do the define/provides/etc, and return a new apiV2 object
- this time, the library's provide() calls will observe the pre-existing state, and refrain from repeating defineKind and the other "once per vat, not once per incarnation" actions
interact with apiV2 to confirm that the durable state is still present

I think this approach will work, but it has some significant limitations, most of which derive from our (justified) choice to expose these stateful utilities (defineKind/makeScalarBigMapStore/etc) as imports, rather than passing them into library code as arguments:

we cannot tolerate parallel execution of multiple unit tests that want to simulateUpgrade, since they all reference the same global state
- test files which use this tool will probably need test.serial on every test
- forgetting test.serial will cause hard-to-debug random failures, or worse
the test function must be careful to never interact with apiV1 again after the simulateUpgrade(), because that version will be accessing stale data
- we might make this safer by building the prepare-test-env.js wrappers to stop working after simulateUpgrade(), but I think that would be tricky: instead of only wrapping defineKind/etc, we'd need to wrap the values they return, i.e. a full membrane instead of a shallow single-level wrapper

Security Considerations

None, this is only to support unit tests

Test Plan

Write an example test in packages/SwingSet/, advertise it to authors of other libraries that use virtual/durable data.

This sounds like the right framework. Questions I don't see addressed above (I might have missed something)

inside simulateUpgrade can we re-call vivifyFarClass and all its cousins in a way that draws on the old baggage, but could attach new behavior to existing state? [I think the description above says yes, but I got lost in

the library's provide() calls will observe the pre-existing state, and refrain from repeating defineKind and the other "once per vat, not once per incarnation" actions

I think the tests want to re-use kind identifiers, and associate new behavior with old baggage. I don't know if they do this by copying state into new baggage, or continuing to interact with the baggage that was previously there.

Can test code probe the contents of various generations of baggage? I don't yet know what data structures I'd be looking for, but I'm sure I'd learn once I had the context.

I'm okay with the choice to require test.serial every time. When we first started writing swingsetTests, the best way was one test per directory. Since upgrade tests are the granularity of integration tests, that seems fine.

I'm hopeful that read-only interaction with the previous generation of baggage after simulateUpgrade will be accepted. I think it's fine to rely on the tester to refrain from other interactions with the previous generation.

This sounds like the right framework. Questions I don't see addressed above (I might have missed something)
* inside `simulateUpgrade` can we re-call `vivifyFarClass` and all its cousins in a way that draws on the old baggage, but could attach new behavior to existing state? [I think the description above says yes, but I got lost in

Yes, when you call your initV2() function, it can either be new code with new behavior ("upgrade") or the same old code ("cross-grade"). Either one will draw from the existing state.

the library's provide() calls will observe the pre-existing state, and refrain from repeating defineKind and the other "once per vat, not once per incarnation" actions

I think the tests want to re-use kind identifiers, and associate new behavior with old baggage. I don't know if they do this by copying state into new baggage, or continuing to interact with the baggage that was previously there.

The V2 side uses a new baggage object, but that object draws from the preserved state of the V1 side. V2 code that looks up the same name from the new baggage will get new incarnations of the V1 objects (new Representatives that remember the same state), just like a real upgrade would.

Nothing must touch the V1 Representatives or baggage after the simulateUpgrade() point. In a real upgrade, those old objects would disappear, along with the entire vat, when the kernel shuts down the V1 worker. In this unit test environment, we can't prevent mis-written tests from retaining them, but we can instruct developers to not do that. Maybe there's some pattern which would make it more awkward to accidentally retain them, like:

test('name', async t => {
  await simulateUpgrade(async baggage1 => {
    const apiV1 = initV1(baggage1);
    // interact with apiV1, t.is(), etc
  });
  await simulateUpgrade(async baggage2 => {
    const apiV2 = initV2(baggage2); // note: not the same object as above
    // interact with apiV2, etc
  });
});

(so the only way to retain a V1-world object long enough to use in the V2 clause would be to stash it inside a higher-level let binding, and that might look weird enough to let test reviewers catch it)

We could also maybe wrap baggage in a non-membrane (single-level) revocable forwarder, so if you did baggage1.set() after baggage2 existed, the .set() would throw. But it would be a lot of work to build a full revocable membrane around it, so if the V1 side did e.g. sub1 = provide(baggage1, 'subname, () => makeScalarBigMapStore()), and something in the V2 side mistakenly held on to sub1 and used it, you'd get serious confusion and no friendly "hey I revoked that don't use it" error messages.

* Can test code probe the contents of various generations of baggage? I don't yet know what data structures I'd be looking for, but I'm sure I'd learn once I had the context.

Within the window/scope of the V1 baggage, sure. Once the next simulateUpgrade() call happens, you should never touch the old baggage again.

I'm okay with the choice to require test.serial every time. When we first started writing swingsetTests, the best way was one test per directory. Since upgrade tests are the granularity of integration tests, that seems fine.

Yeah, I think it's unfortunate, because these sorts of unit-ish tests will run a lot faster, so you should feel free to use a lot more of them, and that makes it more ergnomic/readable to put a lot of them in a single file. And it's not obvious why they need to be run serially (at least if they're sharing a process, and I believe AVA won't share a process between test cases unless they appear in the same test-*.js file), so the limitation feels awkward. But yeah, it's less of a restriction than the more integration-ish tests that were the only previous option.

I'm hopeful that read-only interaction with the previous generation of baggage after simulateUpgrade will be accepted. I think it's fine to rely on the tester to refrain from other interactions with the previous generation.

The problem is that the old baggage is still backed by a DB (well, really, the retained Map instance), and that "DB" is still mutable, and the new V2 code/baggage is mutating it.

I know a bunch of the feature requirements won't be obvious until we have some code to play with, but can you think of a case where you'd want to read from the old baggage after starting up the new stuff? I can imagine code in the trailing end of the V1 clause where you'd compare stuff in baggage against expectations (lots of t.is(baggage1.get('thing'), 'expectedThing')). But doing t.is(baggage2.get('thing'), baggage1.get('thing')) feels slightly weirder: you're checking that the V2 stuff hasn't modified baggage since V1 finished doing it's work. Hm, maybe that's useful.

Oh, huh.. ok, what we might consider is cloning that Map instead of re-using it, and initializing the V2 world with a pre-populated copy. In that scheme, the V1 world (including the old baggage, and in fact all the Presences/Kinds/etc created from it) would remain completely functional, but any changes you made to them would not be reflected in the V2 world. It runs a slightly greater risk of developer confusion (vs having baggage1 just throw outright, at least for shallow access), but read-only access would remain safe. It would be pretty important to not let Representatives from the two worlds come into contact (baggage2.set('key', baggage1.get('otherkey'))), that would cause deep confusion.

If we did that, and it seemed usable/not-confusing enough, then we might be comfortable removing the thunk-based isolation, and go back to:

test('name', async t => {
  const baggage1 = simulateUpgrade();
  doV1Stuff(baggage1);
  const baggage2 = simulateUpgrade();
  doV2Stuff(baggage2); // but `t.is` assertions could read from baggage1 safely
});

can you think of a case where you'd want to read from the old baggage after starting up the new stuff?

I don't have a case in mind. Give me a few days to think about how I'd use it. My instincts say I want to look, but it's quite possible that being able to t.assert() with abandon will be all I need.

I agree that t.is(baggage2.get('thing'), baggage1.get('thing')) would be weird and I don't ask for support for that. I'm more thinking that I may want to compare keys from a Map stored in baggage1 with the corresponding keys in baggage2.

I think at this point, I'm convinced enough that the version that doesn't want tests to look back at baggage1 after simulateUpgrade would be useful enough that I'd prefer that one sooner over something else later.

This is still useful and should be landing in the near future, but the kernel team have been wondering about the severity of its limitation to simulating a single vat. It's also worth noting that much of the burden involved with full-kernel tests appears to derive from file separation and bundling rather than just having a kernel (although there is some friction from value translation as well, e.g. kser/messageVatObject/etc. rather than E(vatExport).doSomething(…)), which means that an evaluation of current testing and potential followup improvements may still be in order.

Agoric / agoric-sdk