Open warner opened 2 years ago
This sounds like the right framework. Questions I don't see addressed above (I might have missed something)
simulateUpgrade
can we re-call vivifyFarClass
and all its cousins in a way that draws on the old baggage, but could attach new behavior to existing state? [I think the description above says yes, but I got lost in
the library's provide() calls will observe the pre-existing state, and refrain from repeating defineKind and the other "once per vat, not once per incarnation" actions
I think the tests want to re-use kind identifiers, and associate new behavior with old baggage. I don't know if they do this by copying state into new baggage, or continuing to interact with the baggage that was previously there.
I'm okay with the choice to require test.serial
every time. When we first started writing swingsetTests, the best way was one test per directory. Since upgrade tests are the granularity of integration tests, that seems fine.
I'm hopeful that read-only interaction with the previous generation of baggage after simulateUpgrade
will be accepted. I think it's fine to rely on the tester to refrain from other interactions with the previous generation.
This sounds like the right framework. Questions I don't see addressed above (I might have missed something)
* inside `simulateUpgrade` can we re-call `vivifyFarClass` and all its cousins in a way that draws on the old baggage, but could attach new behavior to existing state? [I think the description above says yes, but I got lost in
Yes, when you call your initV2()
function, it can either be new code with new behavior ("upgrade") or the same old code ("cross-grade"). Either one will draw from the existing state.
the library's provide() calls will observe the pre-existing state, and refrain from repeating defineKind and the other "once per vat, not once per incarnation" actions
I think the tests want to re-use kind identifiers, and associate new behavior with old baggage. I don't know if they do this by copying state into new baggage, or continuing to interact with the baggage that was previously there.
The V2 side uses a new baggage
object, but that object draws from the preserved state of the V1 side. V2 code that looks up the same name from the new baggage will get new incarnations of the V1 objects (new Representatives that remember the same state), just like a real upgrade would.
Nothing must touch the V1 Representatives or baggage
after the simulateUpgrade()
point. In a real upgrade, those old objects would disappear, along with the entire vat, when the kernel shuts down the V1 worker. In this unit test environment, we can't prevent mis-written tests from retaining them, but we can instruct developers to not do that. Maybe there's some pattern which would make it more awkward to accidentally retain them, like:
test('name', async t => {
await simulateUpgrade(async baggage1 => {
const apiV1 = initV1(baggage1);
// interact with apiV1, t.is(), etc
});
await simulateUpgrade(async baggage2 => {
const apiV2 = initV2(baggage2); // note: not the same object as above
// interact with apiV2, etc
});
});
(so the only way to retain a V1-world object long enough to use in the V2 clause would be to stash it inside a higher-level let
binding, and that might look weird enough to let test reviewers catch it)
We could also maybe wrap baggage
in a non-membrane (single-level) revocable forwarder, so if you did baggage1.set()
after baggage2
existed, the .set()
would throw. But it would be a lot of work to build a full revocable membrane around it, so if the V1 side did e.g. sub1 = provide(baggage1, 'subname, () => makeScalarBigMapStore())
, and something in the V2 side mistakenly held on to sub1
and used it, you'd get serious confusion and no friendly "hey I revoked that don't use it" error messages.
* Can test code probe the contents of various generations of baggage? I don't yet know what data structures I'd be looking for, but I'm sure I'd learn once I had the context.
Within the window/scope of the V1 baggage, sure. Once the next simulateUpgrade()
call happens, you should never touch the old baggage again.
I'm okay with the choice to require
test.serial
every time. When we first started writing swingsetTests, the best way was one test per directory. Since upgrade tests are the granularity of integration tests, that seems fine.
Yeah, I think it's unfortunate, because these sorts of unit-ish tests will run a lot faster, so you should feel free to use a lot more of them, and that makes it more ergnomic/readable to put a lot of them in a single file. And it's not obvious why they need to be run serially (at least if they're sharing a process, and I believe AVA won't share a process between test cases unless they appear in the same test-*.js
file), so the limitation feels awkward. But yeah, it's less of a restriction than the more integration-ish tests that were the only previous option.
I'm hopeful that read-only interaction with the previous generation of baggage after
simulateUpgrade
will be accepted. I think it's fine to rely on the tester to refrain from other interactions with the previous generation.
The problem is that the old baggage is still backed by a DB (well, really, the retained Map
instance), and that "DB" is still mutable, and the new V2 code/baggage is mutating it.
I know a bunch of the feature requirements won't be obvious until we have some code to play with, but can you think of a case where you'd want to read from the old baggage after starting up the new stuff? I can imagine code in the trailing end of the V1 clause where you'd compare stuff in baggage against expectations (lots of t.is(baggage1.get('thing'), 'expectedThing')
). But doing t.is(baggage2.get('thing'), baggage1.get('thing'))
feels slightly weirder: you're checking that the V2 stuff hasn't modified baggage since V1 finished doing it's work. Hm, maybe that's useful.
Oh, huh.. ok, what we might consider is cloning that Map
instead of re-using it, and initializing the V2 world with a pre-populated copy. In that scheme, the V1 world (including the old baggage, and in fact all the Presences/Kinds/etc created from it) would remain completely functional, but any changes you made to them would not be reflected in the V2 world. It runs a slightly greater risk of developer confusion (vs having baggage1
just throw outright, at least for shallow access), but read-only access would remain safe. It would be pretty important to not let Representatives from the two worlds come into contact (baggage2.set('key', baggage1.get('otherkey'))
), that would cause deep confusion.
If we did that, and it seemed usable/not-confusing enough, then we might be comfortable removing the thunk-based isolation, and go back to:
test('name', async t => {
const baggage1 = simulateUpgrade();
doV1Stuff(baggage1);
const baggage2 = simulateUpgrade();
doV2Stuff(baggage2); // but `t.is` assertions could read from baggage1 safely
});
can you think of a case where you'd want to read from the old baggage after starting up the new stuff?
I don't have a case in mind. Give me a few days to think about how I'd use it. My instincts say I want to look, but it's quite possible that being able to t.assert()
with abandon will be all I need.
I agree that t.is(baggage2.get('thing'), baggage1.get('thing'))
would be weird and I don't ask for support for that. I'm more thinking that I may want to compare keys from a Map stored in baggage1
with the corresponding keys in baggage2
.
I think at this point, I'm convinced enough that the version that doesn't want tests to look back at baggage1 after simulateUpgrade
would be useful enough that I'd prefer that one sooner over something else later.
This is still useful and should be landing in the near future, but the kernel team have been wondering about the severity of its limitation to simulating a single vat. It's also worth noting that much of the burden involved with full-kernel tests appears to derive from file separation and bundling rather than just having a kernel (although there is some friction from value translation as well, e.g. kser
/messageVatObject
/etc. rather than E(vatExport).doSomething(…)
), which means that an evaluation of current testing and potential followup improvements may still be in order.
What is the Problem Being Solved?
Currently it is a hassle to write tests of vat/library upgradability. The one scheme we have is more of an integration test, where you use a real swingset kernel and build a pair of vats (one pre- and one post- upgrade), import the library you're testing into those, have a controller/bootstrap vat drive an upgrade and a bunch of queries, log the output, and then the only real
t.deepEquals()
-style AVA assertion is to compare the output log against a "golden master".These tend to be slow, coarse, hard to run under a debugger, and do not provide useful coverage data.
It would be great if we could write more "unit-ish" unit tests: smaller, faster, exercising finer details.
Description of the Design
This kind of code is importing from
@agoric/vat-data
to get the primary virtual/durable tools:defineKind
(timesDurable
andMulti
variations)makeKindHandle
providePromiseWatcher
/watchPromise
makeScalarBigMapStore
(timesSet
andWeak
variations)canBeDurable
Those all come from
globalThis.VatData
, which is provided to real vats by the swingsetliveslots
layer (since they must all integrate tightly with the Presences and Representatives that liveslots manages).(
@agoric/vat-data
also exports secondary provide/vivify tools, likeprovideDurableMapStore
,defineVirtualFarClass
,vivifyKind
, andvivifyFarClass
, but these are all wrappers around the core functions that come fromVatData
)We have a "fake virtual stuff" utility already, to unit-test libraries that import from
@agoric/vat-data
(or code that uses such libraries). Your unit test starts withimport '@agoric/swingset-vat/tools/prepare-test-env.js'
(orprepare-test-env-ava.js
, since we always use AVA), and that use a function namedmakeFakeVirtualStuff
to populateglobalThis.VatData
with enough code to supportdefineKind
/etc.The backing store for
makeFakeVirtualStuff()
is a plainMap
, which takes the place of thesyscall.vatstoreSet(key,value)
that liveslots would use to keep data in the DB.makeFakeVirtualStuff()
then creates a VirtualObjectManager and CollectionManager around this Map, asks them fordefineKind
and friends, and populatesglobalThis.VatData
with those utilities. As long as this happens before any user-level code is imported (and thus imports@agoric/vat-data
), the utilities will be in place on the global in time to be imported by user code.My plan is:
fakeVirtualSupport.js
:Map
instead of always creating a local/private oneprepare-test-env-ava.js
:makeFakeVirtualStuff
, so the Map can be reused latersimulateUpgrade
function, which callsmakeFakeVirtualStuff()
again but with the same Map as before, to represent the DB that is retained across an upgradebaggage
object (a BigMapStore), which uses a known/stable ID within the DB/MapsimulateUpgrade
is called, it creates a newbaggage
, backed by the new VOM/etc, but because it uses the same ID, it will return equivalent contentsVatData
so they can be replaced bysimulateUpgrade()
VatData.defineKind = (...args) => currentVOM.defineKind(...args)
, etc@agoric/vat-data
re-exports the pieces ofglobalThis.VatData
, and downstream code imports those pieces fromvat-data
, so their identities are immutableA unit test that wants to exercise an upgrade would import the code-under-test and
prepare-test-env-ava.js
as usual, then in the test case:baggage1 = simulateUpgrade()
baggage1
) which will dodefineKind
/provide/etc, and return someapiV1
object to interact withapiV1
to build up some durable statebaggage2 = simulateUpgrade()
baggage2
), which will re-do the define/provides/etc, and return a newapiV2
objectprovide()
calls will observe the pre-existing state, and refrain from repeatingdefineKind
and the other "once per vat, not once per incarnation" actionsapiV2
to confirm that the durable state is still presentI think this approach will work, but it has some significant limitations, most of which derive from our (justified) choice to expose these stateful utilities (
defineKind
/makeScalarBigMapStore
/etc) as imports, rather than passing them into library code as arguments:simulateUpgrade
, since they all reference the same global statetest.serial
on every testtest.serial
will cause hard-to-debug random failures, or worseapiV1
again after thesimulateUpgrade()
, because that version will be accessing stale dataprepare-test-env.js
wrappers to stop working aftersimulateUpgrade()
, but I think that would be tricky: instead of only wrappingdefineKind
/etc, we'd need to wrap the values they return, i.e. a full membrane instead of a shallow single-level wrapperSecurity Considerations
None, this is only to support unit tests
Test Plan
Write an example test in
packages/SwingSet/
, advertise it to authors of other libraries that use virtual/durable data.