warner opened this issue 1 year ago
Perhaps we should make the bundle files for this package during the build process (and publish them to NPM), so their contents are well-specified. Every `yarn build` is expected to create exactly the same contents. Endo's `bundle-source` includes version numbers in the generated bundle (as part of the component map), so the following pieces must be stable (among them `bundle-source` itself, or at least its bundle-generating behavior). Having a `yarn.lock` should enforce that. There will be a very limited set of changes that could possibly be made (between the original `worker-v1@1` and a subsequent `worker-v1@2` release) that can still maintain this property, and none of them can affect the behavior of the generated bundle. E.g. we might find a new Endo version of `bundle-source` that bundles twice as fast as before: we can use that, as long as the bundles it creates are identical to the version used by `@1`. But a new `bundle-source` that creates bundles half the size of `@1` could not be used.
We cannot rely on `yarn.lock` to pin exact versions of the transitive dependencies of an `*-v*` package in a workspace where more than one version shares a `yarn.lock`, since `yarn` is free to solve the dependency graph in a way that satisfies the most recent version. We either need these versioned worker packages to contain a checked-in bundle and exact xsnap binary, or we need each of these versions to have an independent `yarn.lock`. We should jam on this.

(That is, if `yarn.lock` is free to change after `worker-v1` gets captured, `worker-v1` behavior will change.)

(hm, ok that may or may not torpedo the following plan)
I've spent more time thinking about this. The constraints are:

- the kernel (`agoric-sdk:packages/SwingSet`, with NPM name `@agoric/swingset-vat`) must be able to simultaneously import multiple versions of an "xs-worker" package, each of which may have different subsets of XS/xsnap.c/lockdown/supervisor/liveslots behaviors
  - e.g. one package with NPM name `@agoric/swingset-worker-xs-v1`, and a second package with NPM name `@agoric/swingset-worker-xs-v2`
- the kernel must be able to move its `swingset-worker-xs-v1` dependency from version 1.0.0 to 1.1.0 to 1.2.0, for various reasons, with full knowledge of the compatibility (vs bug-fixing) consequences thereof
- a given version of `swingset-worker-xs-v1` must consistently use the exact same XS library, xsnap.c application, lockdown bundle, and supervisor (+liveslots) bundle
- a new version of `swingset-worker-xs-v1` should be able to change one of these things without changing the other
  - e.g. `swingset-worker-xs-v1@1.2.0` might use the same bundles as `@1.1.0`, but use an XS library with some bugfixes or performance improvements
  - we might encode the compatibility consequences in the `1.2.0` version string, like maybe `1.1.0 -> 1.1.1` might change performance but not metering/syscall/GC behavior, whereas `1.1.0 -> 1.2.0` might allow metering/syscall/GC behavior changes but retains heap-snapshot compatibility. maybe.

We have two bundles used to start an xsnap worker. The first is named "the lockdown bundle", and currently the entry point is `SwingSet/src/supervisors/subprocess-xsnap/lockdown-subprocess-xsnap.js`, which is a one-line library that imports `@agoric/xsnap/lib/ses-boot.js`, which is a 5-line library that does:
```js
import { setObjectInspector } from './console-shim.js';
import '@endo/init';
import confinedObjectInspect from './confined-object-inspect.js';
setObjectInspector(confinedObjectInspect);
harden(console);
```
The second is named "the supervisor bundle", and the entry point is `supervisor-subprocess-xsnap.js`, which imports liveslots, creates an instance of liveslots, and prepares the worker to receive netstring commands.
Our bundles include the package name, package version, and relative pathnames of every file in the bundle, including the entry point. To generate identical bundles, it is not sufficient for the entrypoint file to have the same contents: it must also live in the same package, of the same package version, and at the same relative path.
That means `swingset-worker-xs-v1` cannot hold the entrypoint file, since we want two separate versions of this package to be capable of using the exact same bundles (controlled differences). As @kriskowal pointed out, this really means the bundles should be generated as part of a build process, and published to NPM (with of course some simple process to regenerate them locally to compare against the published versions).
So that makes me think the following plan could work:
The light blue boxes represent packages, at various versions. The blue connecting lines are cross-package dependencies. The white boxes are JS files or JS-level exports of the package, and the black connecting lines are `import` statements.
The `@agoric/swingset-kernel` package (originally `swingset-vat`) is the anchor. The first version we publish only supports "worker-v1", but a later version could support both -v1 and -v2.
The "worker-v1" support is provided by a package named @agoric/swingset-worker-xs-v1
. This depends upon three other packages.
The first is `xsnap`. This lives in a package named `@agoric/xsnap` at some specific version, which provides both the `xsnap` binary and the JS libraries which talk to it.

- it uses `git submodule` to reference two other repositories: one for the XS library (currently our fork in `agoric-labs/moddable`), and a second for the `xsnap.c` code (also a fork, in `agoric-labs/xsnap-pub`)
- its build produces an `xsnap-worker` executable, at a well-known location within the package directory

The second is a package named `@agoric/xsnap-lockdown`. This provides the lockdown bundle. In particular, it contains `ses-boot.js`, which imports Endo, and configures the console (to accommodate the XS `print` function that provides the sole access to stdout). The lockdown package depends upon some particular version of Endo, by virtue of a `package.json` dependency on the `@endo/init` package.
The lockdown package exports two API functions. The first is `getBundleURL()`, which returns a URL that points to a pre-generated bundle file (maybe we make it `async` for future flexibility). The second is a `getVersions()` that returns an object with at least `{ version, bundleSHA }` strings. The bundle would be generated by calling `bundleSource()` on an entry-point file that lives inside the lockdown package, at `npm build` time, so the bundle would be included in the tarball that gets uploaded to NPM. At the same `npm build` time, we hash the bundle and record it in a generated file that is consulted by `getVersions()`. Clients should, at the very least, record the expected `bundleSHA` in their own source code, and compare it against the retrieved value, at each application startup. More cautious clients should also independently hash the file at `getBundleURL()` and assert that it matches their expectation. These checks are to prevent compatibility/determinism surprises in case something in the lockdown package (or further downstream) gets updated somehow, like if `yarn` computes a different dependency solution. When a client decides to switch to a new version of the lockdown package, they should update their expectations accordingly.
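To make that concrete, here is a minimal sketch of what the build-time step could look like, assuming `@endo/bundle-source` and Node's `crypto`; the file names (`lockdown-entry.js`, the `dist/` layout, and `bundle-info.js`) are illustrative, not the real package layout:

```js
// build-bundle.js -- hypothetical sketch of the proposed `npm build` step
import fs from 'node:fs';
import { fileURLToPath } from 'node:url';
import { createHash } from 'node:crypto';
import bundleSource from '@endo/bundle-source';

const entry = fileURLToPath(new URL('./lockdown-entry.js', import.meta.url));
const bundle = await bundleSource(entry);
const text = JSON.stringify(bundle);
const bundleSHA = createHash('sha256').update(text).digest('hex');

fs.mkdirSync('dist', { recursive: true });
fs.writeFileSync('dist/lockdown.bundle', text);
// generated file consulted by getVersions()
fs.writeFileSync(
  'dist/bundle-info.js',
  `export const version = '1.0.0';\nexport const bundleSHA = '${bundleSHA}';\n`,
);
```

A client's startup check would then just compare `getVersions().bundleSHA` against a constant recorded in its own source.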
Alternatively, we might choose to have `getBundleURL()` perform the bundling process in that moment, writing the contents to some cache directory. However the hash should remain the same. Using this approach would make it more obvious that the `@agoric/xsnap-lockdown` version also depends upon the `@endo/bundle-source` version used to make the bundle; however it is common that a newer `bundle-source` could still produce the same bundles, whereas it is exceedingly unlikely that bundling a newer `@endo/init` package could do the same.
The third package is the "supervisor package", which will live in `@agoric/swingset-xs-supervisor` at some particular version. This uses the same build/bundling process as the lockdown bundle, but its entrypoint is a `supervisor.js`, which has considerably more code, and is quite swingset-specific. This supervisor will import code from the `@agoric/swingset-liveslots` package (#6863).
The `@agoric/swingset-worker-xs-v1` package exports a `makeStartXSnap()` function. This creates a new `xsnap` worker and prepares it for vat deliveries. If the `options` provide a heap snapshot to start from, the new worker is loaded from that snapshot. If not, the new worker is empty, and `makeStartXSnap()` instructs it to evaluate the lockdown and supervisor bundles.
When considering moving from one version of XS to another, we must think about how similar the new version behaves, compared to state or actions created under the previous version. There are two things for the new behavior to be compatible with:

- "new-delivery consistency": old and new versions must compute the same results for new deliveries, so validators running either one stay in consensus
- "replay consistency": when replaying a vat's transcript, the new version must reproduce the syscalls and delivery results that were recorded under the old one

A failure of new-delivery consistency would cause a validator to fall out of consensus. A failure of replay consistency might be caught by the individual validator as it noticed a discrepancy between the transcript-recorded syscalls (or delivery results) and the replay.
Now we can classify potential changes in XS behavior into 4-ish categories:

- … crashes the `xsnap` worker, which would panic the kernel, which means the resulting state doesn't get recorded, so a new version is allowed to diverge by surviving the same operation
- … (`agoric-upgrade-8`)

The compatibility level of XS determines how we might deploy it to the chain.
Since A-level changes to XS do not cause observable behavior differences, we should be able to deploy them at any time. We would publish a new version of `@agoric/xsnap` with a micro version number bump (e.g. `1.0.0` to `1.0.1`), which points to a different git-submodule commit hash of `agoric-labs/moddable` (for XS) or `agoric-labs/xsnap-pub` (for xsnap itself). Then we'd publish an `@agoric/swingset-worker-xs-v1` package with a micro version bump, and we'd change `@agoric/swingset-kernel` to reference the new worker package. `@agoric/swingset-worker-xs-v1` would continue to reference the same lockdown and supervisor packages as before.
Validators could switch to the new kernel at their leisure, without requiring a coordinated chain upgrade event.
If the XS changes are classified as "B-level", then the upgrade process requires conformance testing (and a little bit of luck, given that it might take a week from the time testing must stop to the time a chain-upgrade vote can complete). We'd produce the same packages as above, but we'd bump the minor version (e.g. `1.0.0` to `1.1.0`), to indicate the lower level of compatibility. All validators would need to move to the new `@agoric/swingset-kernel` version at the same time.
(note: maybe the minor version isn't the right component to change; we might need four levels of version number to capture all of this).
For C-level changes, the kernel will need to run both old and new versions for some period of time (since not all vats will reach their BOYD+heap-snapshot point at the same moment, and that moment certainly doesn't happen at the moment of chain upgrade). Therefore we must allocate a new `@agoric/swingset-worker-xs-vNNN` number (e.g. adding a package named `@agoric/swingset-worker-xs-v2`). This new package would depend upon a new `xsnap` version that can have a major(??) version bump.

The new kernel version would depend upon both `-v1` and `-v2`. The kernel would know (somehow? maybe the `getVersions()` return values??) that this upgrade must be performed at a heap-snapshot boundary, and would schedule a change from `-v1` to `-v2` at that point.
For D-level changes, we definitely need separate `-v1` and `-v2` packages. The kernel would wait until the next baggage-style vat-upgrade to swap in the new version. Only a change at this level is allowed to change the lockdown (SES/Endo) or liveslots bundles.
It seems likely that the kernel might not automatically upgrade vats across this boundary. Instead, the kernel upgrade changes the default "what worker should new vats use?" value, which can also be the value used when upgrading an existing vat. Then, userspace (e.g. contract governance committee action) causes a vat upgrade for each vat, and the bundles change during that upgrade.
We create a new version of `@agoric/swingset-liveslots` or Endo, then we make a new version of `@agoric/swingset-xs-supervisor` or `@agoric/xsnap-lockdown`. Then we make a new package like `@agoric/swingset-worker-xs-v2`, just like in the "D-level" change above. The kernel depends upon both -v1 and -v2, and a userspace-triggered upgrade takes advantage of the new versions.
So here's a plan of attack:

- move `makeStartXSnap` from `SwingSet/src/controller/controller.js` to `SwingSet/src/make-xs-worker/`
  - move `supervisors/subprocess-xsnap` there too
  - make a copy of `supervisor-helper.js` in there too
  - `src/make-xs-worker/` is basically everything needed for our new package
- move `src/make-xs-worker/` into a new `packages/swingset-xs-worker-v1/`
- create `packages/swingset-xs-lockdown`, move `lockdown-subprocess-xsnap.js` to it
  - add a `build` script that makes a bundle file
  - add `getBundleURL()` and `getVersions()`
  - change `swingset-xs-worker-v1` to read the bundle during worker init, and compare hashes
- create `packages/swingset-xs-supervisor`, move `supervisor-subprocess-xsnap.js` and `supervisor-helper.js` to it
> - make a copy of `supervisor-helper.js` in there too

It would be great if during this process every file ended up as a git move so we can keep history somewhat traceable.
> > - make a copy of `supervisor-helper.js` in there too
>
> It would be great if during this process every file ended up as a git move so we can keep history somewhat traceable.

Agreed, although I don't know how to make copies trigger git's heuristic. OTOH, if you're following it with `git log --follow`, you'll probably see it "created from nothing" at the commit that actually copied it, and I'll make sure the comment on that commit cites the place it came from, so at least the human running `git log` will know what second command to run to follow it further back.
discussing this with @kriskowal, here are some notes:

- the bundle packages should have no runtime `dependencies`, but only `devDependencies`
- `xsnap` will have more runtime dependencies, as it needs code to implement the command pipe
- there is `package.json` syntax to allow named aliases for specific versions of dependencies
  - e.g. the kernel could depend on `worker@v1` and `worker@v2` at the same time, under different names
- we want `-LATEST` development as well as support for distinctly-published `v1`/`v2`/etc
  - the chain should not run `-LATEST`, as it does not refer to a stable behavior
  - … `v1` and `-LATEST`, assuming userspace runs under `v1`
- we need to understand how `lerna publish` works, to keep it from gratuitously version-bumping the bundle packages, especially if they lack non-dev dependencies
- … `yarn.lock`, nailing down the dependencies below
- … `-LATEST`
We agreed that rearranging the packages as drawn above (except stick with a single `@agoric/swingset-worker-xs` for now, rather than distinctly-named variants) is a good starting point, as it is work that needs to be done in any of our sketched-out plans. So I'll start there.
I'm thinking about how the repos and packages and workspaces ought to work, and how a "test against current versions of everything" (#7056) should work.
Tools at our disposal:

- yarn workspaces: only the top-level `yarn.lock` is used, ignoring any per-package `yarn.lock`
  - `node_modules/` will contain symlinks to local package directories when the `(package.json).version` of the local package satisfies the constraint
  - otherwise `packages/depender/node_modules/dependent@otherversion` will be populated, from NPM
  - `workspace:` dependencies are rewritten into the just-published versions at packaging/publish time
- the `dependencies` section can contain aliases, basically `"ALIAS": "PACKAGE@VERSION"`
  - the kernel could then have `"worker-xsnap-v1": "@agoric/swingset-worker-xsnap@1.0.0"` and `"worker-xsnap-v2": "@agoric/swingset-worker-xsnap@2.0.0"` at the same time, and maybe we could avoid the need for distinctly named worker packages
- `yarn link` to create a symlink from `node_modules/XYZ` (in one git checkout, e.g. `agoric-sdk`) to a nearby but separate git checkout (e.g. `swingset-liveslots`)
- `git submodule` lets you place one entire git repo in a subdirectory of another
  - our `packages/xsnap/` subdirectory currently uses this to grab both the Moddable XS library code, and the `xsnap` executable's C code
  - a `.gitmodules` file is added, which tells git which subdirectories should be populated, and which repository URLs to use

The new package dependency graph is:
```mermaid
graph TD;
  swingset-kernel --> swingset-worker-xsnap-v1;
  swingset-worker-xsnap-v1 --> xsnap;
  swingset-worker-xsnap-v1 --> xsnap-lockdown;
  xsnap-lockdown --> endo/init;
  swingset-worker-xsnap-v1 --> swingset-xsnap-supervisor;
  swingset-xsnap-supervisor --> swingset-liveslots;
```
and the goal is to allow `swingset-kernel` to depend on multiple versions of `swingset-worker-xsnap-vNN` at the same time, each of which will need very specific versions of its dependencies.
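The dependency-alias mechanism listed above is one way to express that; a hypothetical kernel `package.json` fragment, using the `npm:` alias protocol that both npm and yarn understand (the names and versions here are illustrative), could look like:

```json
{
  "name": "@agoric/swingset-kernel",
  "dependencies": {
    "worker-xsnap-v1": "npm:@agoric/swingset-worker-xsnap@1.0.0",
    "worker-xsnap-v2": "npm:@agoric/swingset-worker-xsnap@2.0.0"
  }
}
```

Kernel code could then `import` from `'worker-xsnap-v1'` and `'worker-xsnap-v2'` as if they were two unrelated packages.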
One potential approach, with obvious drawbacks:

- give each package its own standalone repo (including `git submodules` of `moddable/xs` and `xsnap-pub`)
- each repo gets its own `yarn.lock` file
- publish `swingset-xsnap-supervisor` and `xsnap-lockdown` from standalone checkouts
  - their `yarn.lock` will be honored because there is no surrounding workspace to override it
- the kernel notices when `swingset-worker-xsnap` has a newly published version, and then decides whether the kernel will change from e.g. 1.0 to 1.1 (making the claim that existing vats do not need to be upgraded), or if it will add a new worker type like `-v2` (leaving existing vats alone until the application chooses to upgrade them)

This would achieve our stability goals, but development would be horrible: every commit would have to go through NPM before the parent package could use it. We'd be publishing broken versions because we couldn't actually test the combination of new-child + new-parent before committing to the new-child patch.
We might create a separate repo for use by kernel developers. It would define a yarn-v1 `workspace` at the top, and use git submodules to link in the five child packages so they'd all sit next to each other inside the workspace. If we made the cross-package dependency version constraints more lenient, then `yarn install` would satisfy them with symlinks. That would let us make matching changes in both child and parent package without going through NPM for each commit.
One problem is that `git submodule` leaves each embedded git repo checked out on a "detached HEAD": it records the commit ID (hash), but not a branch name. So when a developer commits a change in one repo, it isn't associated with any branch (like `master` or a feature branch to submit in a PR). We'd have to go through some ugly hoops: after the initial `git submodule update` clones everything and switches to the recorded commit ID, we'd need a script to change the submodules to a named branch. We'd also need it to change the `git remote` specification to allow developers to push to GitHub from the submodule (normally the `.gitmodules` file refers to a read-only HTTPS URL).
The non-kernel agoric-sdk code would continue to use a kernel from NPM, making a boundary between swingset and its clients that would have to go through NPM for each change (same problem as above, but at a different boundary, hopefully one which changes more slowly). Devs who want to test both sides could use `yarn link` to associate the two repos.
Another approach would be to use `yarn link` to associate multiple git repos within a single developer's working environment:
- `git clone` agoric-sdk
- `git clone` swingset-liveslots, repeat for the other swingset repos
- in each child repo, run `yarn link`, to register this repo as providing the package
  - (`yarn link`, with no arguments, inside package X, basically creates a registry in `~/.config/yarn/link/` that says "if anything else runs `yarn link X`, make their `node_modules/X` be a symlink to this repo")
- in each repo that consumes them, run `yarn link PACKAGENAME` for all of the child packages which this one depends upon
- in agoric-sdk, run `yarn link @agoric/swingset-vat` to allow sdk packages to use the kernel from the nearby repo

Depending on the versions and constraints, this would replace e.g. `swingset-xsnap-supervisor`'s `node_modules/@agoric/swingset-liveslots/` subdirectory (as installed from NPM) with a symlink to the nearby repo. We could then make changes in liveslots, `cd` to `swingset-xsnap-supervisor`, re-run `yarn build` to get a new bundle, etc.

When it comes time to publish a version to NPM, we'd use `yarn unlink PACKAGENAME` to remove all the symlinks, then do `npm publish` of each package, from the bottom of the graph and upwards.
In this approach, we continue to use a single monorepo, but we use a release process that temporarily changes the definition of the workspace.
- move the swingset packages into a subdirectory: `packages/swingset/swingset-kernel`, `packages/swingset/swingset-worker-xsnap`, etc
- change the top-level `(package.json).workspaces` glob list to include `packages/swingset/*`
- within the workspace, cross-package dependency constraints are `*` ("accept any version")
- the in-tree package versions carry a `1.N.dev0` suffix, which sorts larger than `1.N`
  - this allows a constraint of `1.N` to not be satisfied by the local `1.N.dev0`
- the kernel uses two aliases (see the sketch after this list): `worker-xsnap-dev` -> `@agoric/swingset-worker-xsnap@*`, and `worker-xsnap-v1` -> `@agoric/swingset-worker-xsnap@1.0`

Development can continue in the tree, and running tests with `WORKERTYPE=dev` (or however we spell the environment variable that tells the kernel what `managerType`/`workerType` to use by default) will get the local versions (assuming you remembered to `yarn build` to update the bundles). Running tests without the environment variable will get `worker-xsnap-v1` (or whatever the latest published version is).
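A hypothetical kernel `package.json` fragment under this scheme (the `npm:` alias spelling and the names are illustrative, taken from the list above):

```json
{
  "name": "@agoric/swingset-kernel",
  "dependencies": {
    "worker-xsnap-dev": "npm:@agoric/swingset-worker-xsnap@*",
    "worker-xsnap-v1": "npm:@agoric/swingset-worker-xsnap@1.0"
  }
}
```

The `*` constraint can be satisfied by the in-tree `1.N.dev0` package (so `WORKERTYPE=dev` exercises local code), while the `1.0` constraint can only be satisfied by the published release with its stable pre-built bundles.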
When it comes time to release the swingset packages, the sequence will look like:
- edit the top-level `package.json` to remove `packages/swingset/*` from the `workspaces:` key
- delete the `node_modules` directories, so they don't interfere
- delete the `yarn.lock` file
- run `yarn install` to regenerate it, and to populate node_modules (from NPM, not the workspace, since we're no longer in a workspace)
- … `package.json`
- run `yarn install` again, nothing should change
- … the `yarn.lock` contents
- run `yarn build`, to generate the bundles and their hashes
- change the `package.json` version from `1.N.dev0` to `1.(N+1)`
- commit the `package.json` changes
- run `yarn publish`, publish the generated+stable bundle to NPM
- change the `package.json` version to `1.(N+1).dev0`
- … `*`
- … `package.json`
Note that all of the swingset package release tags will point to commits on the side branch, and each will point to a distinct commit.
We must also change the agoric-sdk release/publish process:

- … (`workspace:`)

The swingset packages will spend most of their time at `.dev0` versions. In particular, the `swingset-worker-xsnap` package will be at `.dev0`, which means the kernel package's `worker-xsnap-v1` alias dependency will not be satisfied by the version in the workspace, and can only be satisfied by the NPM-published version (with its stable hashed pre-bundled contents).
We're currently using yarn-v1, and support for workspaces has improved significantly in the later versions (currently yarn-v3.5.0). One improvement is the ability to use nested workspaces, called "worktrees". We could use this to nest a set of swingset packages in one workspace, under the general agoric-sdk workspace.
Another improvement is a dependency constraint syntax that says "only use something from my same workspace", spelled like `workspace:*`. This would let e.g. `swingset-worker-xsnap` depend on `"swingset-supervisor-xsnap": "workspace:*"`, so local development would strictly use code from within the tree. In addition, when publishing packages, yarn can dynamically replace those constraints with the versions being exported by the dependencies. This would remove the need to manually change package.json files back and forth during the release process.
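For illustration, under yarn v2+ that dependency could be written like this (package names copied from the discussion above; whether we keep these exact names is still open):

```json
{
  "name": "@agoric/swingset-worker-xsnap",
  "dependencies": {
    "@agoric/swingset-xsnap-supervisor": "workspace:*",
    "@agoric/xsnap-lockdown": "workspace:*"
  }
}
```

At publish time, yarn rewrites each `workspace:*` range into the concrete version of the sibling package being published, which is the "dynamically replace those constraints" behavior mentioned above.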
I'm still trying to figure out a good plan here.
Oh, before I forget, any change which creates new repos or workspaces will need to consider the `patches/` directory, which (in conjunction with `patch-package`) we use to modify some of our dependency packages before use. We patch a lot of Endo packages, and those will be incorporated into the `xsnap-lockdown` bundle.
I haven't completely absorbed this, but here are a couple of somewhat scattered thoughts which may not reflect the strongest possible understanding of the yarn/npm/git/blahblah toolchain:
I'd prefer whatever we do to avoid git submodules. In my experience they cause nothing but grief.
If we are clear on major vs. minor versioning, we could have `V1`, `V2`, `Vn` (i.e., major version) worktrees, which are maintained in a relatively stable state (i.e., no significant development goes on there but bug fixes could change them). We also have a `Vdevel` worktree which is where we do most of the dev work and where we spend most of our time, and our day-to-day development activity would work more or less the way it does now -- in the committed master branch, `Vdevel` code is expected to be kept in a non-broken state (i.e., exactly as we do now), but even though the code is functional and as correct as possible at the time, it's not published to npm or released to the world (aside from being in a git repo where somebody can grab it if that's what floats their boat). When we do a minor version release (presumably updating the most recent major version) we have a process that merges changes into the `Vwhatever` worktree, generates packages, pushes to npm, etc. This process should be automated but might require a little hand coaxing from time to time, much as git branch merges do now. When we do a major version release we create a new `Vnewest` worktree, copy the `Vdevel` stuff into it wholesale, then proceed as we do with a minor release. This process might require a fair bit of manual activity, but presumably would be rare enough that investing in significant automation just for this is probably not a win. All this is essentially replicating the familiar git work-branch vs. master-branch pattern but at one higher meta level.
This is the first time I'm actually thinking about this problem, and I may be missing something, but I don't understand the direction taken right now, or the process taken to get there. In particular, a single "worker-v1" package amalgamating xsnap, endo, liveslots, and various dependencies.

I'm having an easier time thinking of this problem in terms of API surface and upgradable units, and starting from the program itself. I also want to think about this as what the right approach conceptually should be, but it may not be feasible right now without other work throughout our stack.
If we look at a JavaScript program executing in a vat, it has dependencies on the environment created by liveslots and endo, and possibly a dependency on a certain set of JS features. The program may also have dependencies on remote objects or devices, which are not part of the environment, but may be part of the initial execution of the program (currently defined as `buildRootObject` for a liveslots program). What it doesn't have is a direct dependency on xsnap, the kernel itself, or cosmic-swingset.
While these are dependencies in the conceptual sense, from the perspective of the program's `package.json`, all these are part of the execution environment, and not part of the package's `"dependencies"`. There are 2 other `package.json` features that fit better to express these dependencies: `"peerDependencies"` and `"engines"`. `"peerDependencies"` is used for packages that are relied on by the program (possibly imported), but which are expected to be installed / provided by the surrounding environment. `"engines"` roughly corresponds to the set of features available ambiently (either global, but also "built-in" packages).
One approach would be to have an `"endo"` engine version which describes the set of hardened JS features available in the locked down execution environment (as a combination of JS features implemented by the XS engine, and the modifications brought by `@endo/init`). To express how the program is executed, we'd need to use a second `"engines"` field indicating this program is meant to be invoked through `"liveslots"`'s `buildRootObject`. These `"engines"` dependencies would not result in code being included in the bundle of the program. However we'd need these "engines dependencies" to be documented by the bundle so they can be at least verified for compatibility by the kernel. I believe this information is currently lost when the bundle is created.
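A hypothetical vat-program `package.json` following this idea (the `"endo"` and `"liveslots"` engine keys are part of the proposal, not an existing convention, and the version numbers are made up):

```json
{
  "name": "my-vat-program",
  "engines": {
    "endo": "^2.0.0",
    "liveslots": "^1.1.0"
  }
}
```

The open problem noted above is that nothing currently carries this information into the generated bundle, so the kernel has no way to check it at vat-creation time.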
But looking at liveslots programs is only part of the picture. Most of the programs running in a vat are actually Zoe contracts, which rely on a specific version of the Zoe Contract Facet to invoke it (through an exported `start` or `prepare`) with a ZCF object of the right shape passed as argument. In that sense, the `"engines"` field of these programs should actually be describing ZCF, not liveslots. However Zoe contracts still likely rely on features provided by "liveslots", and an "endo" flavored hardened JS execution environment. This raises the question of how these "external" dependencies should be expressed, and which part of the system is responsible to enforce them.
Let's take an example. Let's say ZCF itself relies on the basic "liveslots@^1.0" feature set and documents it as such in its `"engines"`. However we introduce a new liveslots feature in a backwards compatible way (e.g. a new global), and bump the liveslots version to `1.1.0`. The kernel/liveslots deals with executing ZCF, and could set up or verify the version of liveslots compatible with `^1.0`. However the kernel/liveslots doesn't know anything about the contract or its requirements. ZCF would need to be the one responsible to check that the current version of liveslots is compatible with the contract's requirements; however ZCF itself cannot set up the environment, as it's already executing in it, only the vat creator has that opportunity.
There is also the question of how the Zoe contract expresses its reliance on `liveslots@^1.1.0`. Since it's not executed directly by liveslots, the `"engines"` field may not be appropriate anymore, and `"peerDependencies"` may be a better fit. However it seems that endo's bundler may currently attempt to bundle peerDependencies the same as regular dependencies (see https://github.com/endojs/endo/pull/1355).
On the other side, we have the supervisor bundle. This currently evaluates as a script, but it may not stay like that (for example it could transition to an endo worker which I believe expects a `main` export). It relies on some JS features and a hardened JS environment with `Compartment` for example, as well as some globals to interact with the runtime and kernel (e.g. `issueCommand` and metering for xsnap workers). It also contains and sets up "liveslots" which will be used by the program.
For an xsnap supervisor, an `"xsnap"` entry for the `"engines"` would make sense to document the ambient APIs required. Similarly, if we introduce sqlite directly in the worker process, having a way to document this external dependency would be needed. However what may not make sense as an "engines" requirement is the protocol version that liveslots speaks with the kernel over `issueCommand`/`handleCommand`, as this is something that fits the definition of `"peerDependencies"` much more closely. Regardless, this "protocol dependency" is a requirement that liveslots and thus the supervisor bundle would need to express, and it needs to be fulfilled by the kernel when creating the vat. At some point we'll likely need the kernel to support multiple versions of this liveslots protocol. However this protocol version may be incremented independently of the liveslots version itself as required by the vat.
For example, we may move vatstore inside the worker, which would likely be a breaking change in the liveslots protocol between the kernel and the supervisor, as well as possibly in the protocol between the kernel and the worker process. However this may be transparent to the vat program itself, and only necessitate a minor version update of "liveslots" from its point of view.
This is the final and lowest level component in a vat, the one providing a hardened JS execution environment for the supervisor bundle, as well as the low level transport with the kernel (some of which is exposed to the supervisor, like the commands interface, and some which is only exposed to the kernel, like snapshots). As such it may provide 2 "engines" versions to the supervisor (e.g. "endo" + a worker specific one like "xsnap"), but it also needs some independent versioning with the kernel.
For example, a breaking change in an xsnap worker would be to change the snapshot format, but that would have no bearing on the supervisor or vat bundles executing in the vat (beyond the fact that changing the xsnap worker version requires going through the "upgrade" process for the vat program since we cannot restore the program's heap).
On the other side, a change to the metering API may be a breaking change between the xsnap worker and the supervisor, but may not impact the transport between the xsnap-worker process and the kernel.
All this suggests that when creating or upgrading a vat, the "endo" and "liveslots" versions should be specified explicitly or derived from the bundle. However the xsnap or liveslots protocol versions are in fact transitive dependencies that should not be expressed or checked at that layer. The major version of the worker would not be part of the options either: a null upgrade would implicitly use the latest major version of the worker compatible with the required "endo" and "liveslots" versions, and it then becomes a question of who triggers the null upgrade.
Swingset would need to handle multiple supervisor bundles and worker major versions. However these 2 don't have to be merged together (but can be, to simplify things). More importantly, their versions are not directly equivalent to the "endo" and "liveslots" versions that the vat bundle requires.
This points to the need for the following mechanisms:
For example we could have the following layering:

- `xsnap-worker@1.1.3`: requires `xsnap-worker-protocol@^1.2.0`, provides `endo@2.0.0`, `metering@1.0.2`, `commands-api@1.3.4`
- `xsnap-worker@1.1.4`: requires `xsnap-worker-protocol@^1.3.0` (supports streaming snapshot), provides `endo@2.0.0`, `metering@1.0.2`, `commands-api@1.3.4`
- `xsnap-worker@2.0.0`: requires `xsnap-worker-protocol@^1.3.0`, provides `endo@3.0.0` (native Compartments), `metering@1.0.2`, `commands-api@1.3.4`
- `xsnap-supervisor@1.4.4`: requires `endo@2`, `metering@^1.0.0`, `commands-api@^1.0.0`, `liveslots-protocol@~1.2.0`, provides `liveslots-api@1.1.6`
- `xsnap-supervisor@1.4.5`: requires `endo@2 || endo@3` (checked compatibility), `metering@^1.0.0`, `commands-api@^1.0.0`, `liveslots-protocol@~1.2.0`, provides `liveslots-api@1.1.6`
- `zcf-bundle@5.2.1`: requires `endo@>=2`, `liveslots-api@^1.0.0`, provides `zcf-api@5.2.0`
- the contract bundle: requires `endo@>=1`, `liveslots-api@^1.1.2`, `zcf-api@^5.0.0`
At first, only the version `1.1.3` of `xsnap-worker` exists.

- Someone provides `zcf-bundle@5.2.1` and requests the creation of a vat with it, with requirements `endo@>=1` and `liveslots-api@^1.1.2`.
- The combined requirements work out to `endo@>=2` and `liveslots-api@^1.1.2`.
- The kernel selects `xsnap-worker@1.1.3` and `xsnap-supervisor@1.4.4`, verifies that it matches the vat requirements (including that the kernel is implementing the correct liveslots-protocol), creates the new vat, executes the supervisor, and the zcf bundle.

In a chain upgrade, version `xsnap-worker@1.1.4` is introduced. The version is fully compatible, and is automatically used to restart existing workers.

In another chain upgrade, `xsnap-worker@2.0.0` is introduced, which switches to native compartments. Given the change, this is a breaking change for both snapshot content and the lockdown bundle. A new version of `xsnap-supervisor` is introduced which only bumps the endo dependency version after verifying compatibility. A null upgrade of the vats is required. A new vat not compatible with the new endo version would still use the old worker, but new or upgraded vats which are compatible can switch.

- The kernel selects `xsnap-worker@2.0.0` and `xsnap-supervisor@1.4.5`, verifies that it matches the vat requirements, and starts a vat null upgrade, which executes the supervisor, and the existing zcf bundle.

This requires the kernel to be aware of multiple versions of xsnap-worker / supervisor. Right now those cannot be introduced dynamically, so we can rely on source code changes in the kernel every time the xsnap-worker or supervisor changes.
One approach could be to always rely on NPM installed versions for the non latest package, and some dependency aliasing so that the kernel can depend on multiple versions of these packages. This does make it complicated to do patch or minor updates to the non-latest version, and does not prevent the supervisor from changing due to repo effects, unless the supervisor is published as a bundle.
However since we've disconnected the xsnap package and supervisor versions from the endo and liveslots versions they provide, I'm wondering if we need actual publishable node packages for these at all.
On the xsnap side, we really have a binary and a JS spawner. The binary is mostly in an external repo already, and we could keep the above scheme, aka install a binary version published by an xsnap repo which just builds xsnap-pub with the right moddable SDK. For "latest" work, we could keep a way to build the binary from a submodule instead. Then on the JS spawner side, we could have separate non published packages or even more simply source files that correspond to the various "xsnap-worker" versions, which would internally reference the required binary. These JS spawners could share some source code where applicable.
Similarly, for the supervisor, if we're careful to not have dependencies influenced by the rest of the repo, we could keep the different versions as sibling folders and files in the repo. To ensure we don't break our requirements of no changes to previous versions, we could either check-in the resulting bundle, or a hash of the expected bundle, for each version.
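A minimal sketch of the "check in a hash of the expected bundle" idea, assuming the built bundle is written to a known path and the expected hash is checked in next to it (paths and file names are illustrative):

```js
// check-bundle-hash.js -- compare the freshly built bundle against the
// checked-in expectation, failing loudly if anything upstream drifted
import fs from 'node:fs';
import assert from 'node:assert';
import { createHash } from 'node:crypto';

const bundleText = fs.readFileSync('dist/supervisor.bundle', 'utf8');
const actual = createHash('sha256').update(bundleText).digest('hex');
const expected = fs.readFileSync('expected-bundle-hash.txt', 'utf8').trim();
assert.equal(actual, expected, 'supervisor bundle changed unexpectedly');
console.log('supervisor bundle hash OK:', actual);
```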
I had a good brainstorming session with @michaelfig and @mhofman this morning, and I think we have a better plan than before:

- keep `swingset-xsnap-supervisor` (#7063), but abandon `swingset-worker-xsnap-v1` (#7101)
- change the `bundleStore` to accept `nestedEvaluate` bundles too (with an ID that's probably the hash of the JSON encoding of the whole bundle object)
- when the kernel is initialized, sample the lockdown and supervisor bundles, store them in the `bundleStore` under their hash IDs, and add a pair of `kvStore` keys to remind the kernel of what the "current lockdown/supervisor bundle ID" are
- … (`options`)
- when starting a worker, read the bundles from the `bundleStore` and feed them to the new worker, instead of using fixed copies

That will get us stability of the two bundles for the lifetime of a vat incarnation. The stability is low-level (precise bundle contents, as opposed to a version string), which is good. We won't directly have a semver/human-meaningful name for that behavior: we'll talk about "the liveslots that shipped with `agoric-sdk@1234`" instead of "`liveslots@456`", but that's probably ok.
When we perform the agoric-sdk release process, we'll get a new semver version string for each bundle-creating package, and we'll create (and hash) a bundle as well. We won't learn the bundle's hash until that moment. The bundles ought to be reproducible: clone the SDK at that commit, that will include a particular `yarn.lock`, running `yarn install` ought to get all the same packages/versions into `node_modules`, then `yarn build` should be generating a bundle from all the same components.
Updating something in the agoric-sdk (e.g. with `yarn add`, modifying the `yarn.lock`) may or may not change what goes into the bundles. We believe that `yarn`'s dependency resolver tries hard to retain the original versions: consider a package that lives in the transitive dependency set of both `xsnap-lockdown` and e.g. `zoe`, and then `zoe` is changed to require a newer version (which would still be accepted by the `xsnap-lockdown` constraint); we think yarn will include both versions rather than de-duplicate them (by increasing the version that `xsnap-lockdown` gets to one which meets both packages' constraints). However we may want to deduplicate, just to clean things up, and that certainly would change the files that go into the bundle. We should be able to inspect the bundle hash (printed to stdout during `yarn build`) to tell whether something changed or not.
TODO: make sure the generated bundles are included in the NPM package (run `yarn pack` and inspect the tarball), change the package.json `files:` section if necessary.
After the vaults release, we'll need to decide how we get the second version of the bundles into place. One option is a `controller.updateBundles()` call (a sketch follows below), which would:

- ask the bundle-providing packages (`xsnap-lockdown` and `swingset-xsnap-supervisor`) for their current hash
- compare those against `kernelKeeper.get current bundle IDs` (however it's spelled), then: …

Other options/variants:

- call `controller.updateBundles()` on every kernel launch
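A rough sketch of what such a call could do; the `bundleStore`/`kvStore` method names and the shape of the bundle-providing packages are assumptions, not the real APIs:

```js
import { createHash } from 'node:crypto';

// Hypothetical: providers = { lockdown: xsnapLockdownPkg, supervisor: supervisorPkg }
export async function updateBundles({ kvStore, bundleStore }, providers) {
  for (const [which, provider] of Object.entries(providers)) {
    const bundle = await provider.getBundle(); // assumed provider API
    const id = `b0-${createHash('sha256').update(JSON.stringify(bundle)).digest('hex')}`;
    const key = `${which}BundleID`; // e.g. "lockdownBundleID"
    if (kvStore.get(key) !== id) {
      bundleStore.addBundle(id, bundle); // assumed bundleStore API
      kvStore.set(key, id); // record the new "current" ID
    }
  }
}
```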
Note: a top-level `yarn build` will build each package in alphabetical order. We don't have bundles that include other bundles, but if we ever did, we'd need to make sure the leaves got built before the higher nodes of the dependency/inclusion graph, so e.g. `xsnap-lockdown` and `swingset-xsnap-supervisor` need to be built early.
For now, we won't record any per-vat metadata to describe which version of `xsnap` (and thus which version of XS) is used. All workers will use the same `xsnap`, regardless of whether they were created weeks ago or just now.
This will constrain our ability to change XS/xsnap: any change we make must be sufficiently compatible to not cause problems when replaying a transcript (the old XS and the new XS must match transcripts) (this may be easier once #7142 is fixed and we're more confident that liveslots ignores GC timing variations).
We'll pretend that there's a per-vat metadata key that says which version of xsnap to use, and that it's currently set to "0" (i.e. lack of a key means "0"). Later, when we want to introduce a not-so-compatible version of XS/xsnap, we'll start using this key. At that point, we'll either need separately named `xsnap` packages (`@agoric/xsnap-v1` vs `@agoric/xsnap-v2`), or we'll use the `package.json > dependencies:` aliasing feature to let the kernel code import multiple package names simultaneously, even though the `package.json` maps those onto different versions of the same `@agoric/xsnap` package name.
That is the point at which we'll figure out the semver attributes of these different xsnap packages, what "minor version has changed" means w.r.t. compatibility, etc. We can avoid committing ourselves to some of those details until we need the second version/package.
At that time, we'll update the kernel to record "current xsnap version identifier" into the DB, and use it when launching the worker. My original plan was to change the `managerType` into a `worker` field, giving it values of `local` and `xsnap-v1`, and in that approach, this is the point where we'd introduce `xsnap-v2` (which would encompass all of XS, xsnap, lockdown, supervisor/liveslots).
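As a purely illustrative sketch of that per-vat selection, assuming the aliasing approach (the alias names, the `xsnapVersion` key, and the default of `"0"` are assumptions, not settled API):

```js
// Hypothetical: the kernel picks an xsnap package based on per-vat metadata.
// 'xsnap-v1' and 'xsnap-v2' are package.json aliases onto two versions of
// @agoric/xsnap; the export name `xsnap` is what that package provides today.
import { xsnap as xsnapV1 } from 'xsnap-v1';
import { xsnap as xsnapV2 } from 'xsnap-v2';

const workers = {
  '0': xsnapV1, // a missing key means "0", i.e. the original worker
  '1': xsnapV2,
};

export function spawnWorkerForVat(vatMetadata, xsnapOptions) {
  const version = vatMetadata.xsnapVersion ?? '0';
  return workers[version](xsnapOptions);
}
```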
In our new thinking, `xsnap-v2` could refer to just the xsnap binary and the embedded XS library, letting you control both lockdown and supervisor as independent knobs.
Currently, `packages/xsnap/` hosts two git submodules, one named `moddable` (pointing at https://github.com/agoric-labs/moddable.git, our fork of the XS SDK), and a second named `xsnap-native` (pointing at https://github.com/agoric-labs/xsnap-pub, our fork of Moddable's `xsnap` repo). We're thinking it would be a good idea to make `xsnap-pub` host the `moddable` git-submodule, rather than `packages/xsnap/` (changing the git-submodule graph from `(A->B, A->C)` to `(A->B->C)`). `packages/xsnap/` knows how to build the `xsnap` binary, and knows the path to that binary, and has the API code which spawns a child process from that binary, plus the API code which speaks the netstring pipe to it.
Then, the git commit ID of the `xsnap-pub` submodule can serve as a name for the xsnap+XS behavior, and the commit ID of the lower-layer `moddable` submodule is a narrower name for the XS behavior, either of which might wind up in the per-vat metadata some day.
> - when the kernel is initialized, we'll sample the lockdown and supervisor bundles, store them in the `bundleStore` under their hash IDs, and add a pair of `kvStore` keys to remind the kernel of what the "current lockdown/supervisor bundle ID" are

I believe @michaelfig suggested we could do this lazily and just load the bundle and hash from disk when needed by a new vat.
> a top-level `yarn build` will build each package in alphabetical order

Is that true for all packages or only the ones with circular dependencies? I thought yarn correctly handled the order for packages that have no circular deps.
> > - when the kernel is initialized, we'll sample the lockdown and supervisor bundles, store them in the `bundleStore` under their hash IDs, and add a pair of `kvStore` keys to remind the kernel of what the "current lockdown/supervisor bundle ID" are
>
> I believe @michaelfig suggested we could do this lazily and just load the bundle and hash from disk when needed by a new vat.
Yeah, as I wrote up my notes, I got worried about when exactly the DB writes would occur, and how we explain this to host-application authors. Let's see, I can think of three approaches:

- "A": sample and store the bundles during `initializeSwingset()`, and also during some sort of `upgradeSwingset(swingStore)` function that we can figure out and implement later (some time before we start using the new bundles)
- "B": sample the bundles at every kernel launch (e.g. in `buildVatController()`)
- "C": sample the bundles lazily, when the first new vat is created during a `controller.run()`

I need to be able to write documentation aimed at host-app authors to explain when state changes might happen, and what their `hostStorage.commit()` responsibilities are. We need well-defined windows of time during which the kernel might make changes to the swingstore, after which the host app must do their commit.

"A" is the one that most closely matches the current pattern. It would always sample and stash the bundles during `initializeSwingset()`, at the same time it populates a bunch of other swingstore data (config.vats, bootstrap message on otherwise-empty run-queue, etc), and the host app already knows it needs to commit after this initialization phase, no surprises there. And requiring an explicit upgrade call would introduce another such phase, but only when the host app wants to do an upgrade. The host app would look more like:
```js
const { hostStorage, kernelStorage } = openSwingStore(dbPath);
if (timeToUpgrade) {
  updateSwingset(kernelStorage);
  hostStorage.commit();
}
const c = buildVatController(kernelStorage);
// .. go run blocks
```
DB writes are a deterministic function of the package upgrade schedule and the host app calling the "upgrade" function.
"B" might be convenient, but the host app would need to do a commit at every launch, just in case this was the time that a new bundle was found. DB writes would be a function of package upgrade schedule and the controller incarnation schedule (process start to process termination). buildVatController()
does not currently make any state changes, and I'm not sure I want it to.
The host startup would look something like:
```js
const { hostStorage, kernelStorage } = openSwingStore(dbPath);
const c = buildVatController(kernelStorage);
hostStorage.commit(); // in case bundle upgrades happened
// .. go run blocks
```
"C" puts the changes in the middle of a controller.run()
, so no problems with knowing when to commit. It would mean that the kernel must be given the ability to read the bundle files (encapsulated in some form of the startXSnap
endowment, which also needs access to the bundleStore
so it can store the sampled bundle, and read them back again later). The DB write will happen kinda haphazardly, during the first vat creation after each controller restart that happens after the on-disk bundle-supplying packages are changed. If a given controller incarnation doesn't create any new vats, the sampling will not happen, even if the package versions have changed. So the DB writes are a deterministic function of both the package upgrade schedule and userspace actions (creating new vats), which is kind of complicated.
It also adds a new failure mode: if someone launches the kernel, then modifies their source tree (e.g. checks out a different version, maybe temporarily), the DB state will depend on a race between their interference and a vat being created. If we used A or B, any potential sampling would be complete by the time kernel startup finished, so the window of vulnerability (to this admittedly "don't do that" case) is a lot smaller.
My Python t-shirt says "explicit is better than implicit":
```
% python3
Python 3.9.6 (default, Oct 18 2022, 12:41:40)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>
```
But the annoying thing about "A" (explicit) is that we have to add a cosmic-swingset `ActionType.UPGRADE_SWINGSET` "action" to trigger the `upgradeSwingset()` call, invoked from the Golang-side upgrade handler function, sent to the JS side, and await it, before the remainder of the cosmos upgrade can continue. With "B" or "C" all you have to do is change the JS packages on disk. And we removed the bundle-the-kernel-and-store-it step last year because we liked the simplification of "just `git checkout` a different version and restart".
Hrm. I know I don't want to do "B". I prefer to be explicit. I think I'm ok with e.g. swing-store either spontaneously upgrading itself (when newer code is launched against an older DB), or refusing to run and requiring a separate program be run to do the upgrade, which suggests that maybe I'm ok with the kernel spontaneously upgrading the bundles without an explicit upgrade() call, which means that maybe explicit is not so important to me after all. I'm uncomfortable with deferring the bundle reads until semi-random times long after kernel startup. We're not saving any significant space or time by loading the bundles lazily. But I don't want to implement the extra cosmic-swingset actions or coordination code to trigger the swingset upgrade, nor do I really want to add "call `upgrade()`" to the instructions for host-app authors (feels like a burden).
Ok, I'll go with "C". At least with these reasons written down, if I change my mind later, I should be able to think more clearly about it.
A corollary to "C": do the sampling once on controller initialization, but don't write the results to the DB until the first time they're needed.
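A tiny sketch of that corollary, with the sampling and store APIs injected (all names here are assumptions):

```js
// Sample the bundles once at controller init, but defer the DB write until
// the first time a new vat actually needs them.
export function makeBundleSampler({ readBundlesFromDisk, bundleStore, kvStore, bundleIdFor }) {
  const sampled = readBundlesFromDisk(); // e.g. { lockdown, supervisor }
  let recorded = false;
  return function getCurrentBundles() {
    if (!recorded) {
      for (const [which, bundle] of Object.entries(sampled)) {
        const id = bundleIdFor(bundle); // hash of the JSON encoding
        bundleStore.addBundle(id, bundle);
        kvStore.set(`${which}BundleID`, id);
      }
      recorded = true;
    }
    return sampled;
  };
}
```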
I don't fully follow the timing issues: The packages providing the bundles would be loaded when loading swingset, but their methods to load the bundle and check the bundle hash would be invoked every time a vat is created. The authority is already encapsulated, no need to provide new raw authority to read the filesystem.
If someone modifies their source tree in a way that will cause the existing methods to return new values, then they will just fall off consensus. Aka don't change your source tree while executing, any lazy loading will have similar constraints.
Edit: to summarize, "C", but I don't share the concerns.
Now that we have a new approach, I'm moving this ticket out of the milestone in favor of the (smaller) #7208. We might choose to use this approach in the future, but we no longer think we need a distinct `worker-v1` package for the Vaults release.
I move to close this issue since it is no longer on the plan of record. We can revive the idea if it eventually becomes necessary to have more than one worker version.
What is the Problem Being Solved?

To give us a coherent solution for #6361, I think we need to define a new package that includes all the behavior-influencing parts of our XS-based workers. This package will have `-v1` in the name, not the version, and in fact we will only ever publish a single version of this package: all subsequent changes will go into a new package, with `-v2` in the name, which will also only get one release, etc.

Then we change SwingSet to depend upon this `-v1` worker package, and add a note to the kernelDB per-vat metadata to record the fact that the incarnation is using v1, rather than some earlier or later version. Any version of SwingSet can host this worker and its history (especially its XS heap snapshots) without fear of behavior changes.

The nominal name of the new package is `@agoric/worker-v1`, but we might want something more swingset-specific.

Description of the Design

`@agoric/worker-v1` will need to contain or import the contents of:

- `packages/SwingSet/src/supervisors/subprocess-xsnap/`
- `packages/SwingSet/src/liveslots` (also used by other workers, and a lot of kernel unit tests)
- `packages/xsnap`
- `packages/SwingSet/src/lib/`
The kernel-facing API of this package should let it launch a worker and get back control facets to send deliveries down and process syscalls that come back.
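A purely hypothetical sketch of what that kernel-facing surface might look like; the names, option shapes, and facet methods are all illustrative, not a settled API:

```js
// Hypothetical kernel-facing API of the worker package; bodies are left as
// placeholders describing what a real implementation would do.
export function makeWorkerFactory({ spawn, snapStore }) {
  return {
    async launchWorker({ vatID, handleSyscall, snapshotID }) {
      // A real implementation would spawn xsnap here, then either restore
      // `snapshotID` or evaluate the lockdown and supervisor bundles.
      return {
        // send one delivery down to the supervisor; syscalls would flow back
        // through handleSyscall(), and the delivery result would be returned
        deliver: async vatDeliveryObject => undefined,
        // write a heap snapshot via snapStore
        makeSnapshot: async () => undefined,
        // terminate the child process
        shutdown: async () => undefined,
      };
    },
  };
}
```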
Security Considerations
Test Plan