endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES
Apache License 2.0

Why modules must be instantiated uniquely per compartment, despite bundling #1583

Open erights opened 1 year ago

erights commented 1 year ago

What Happens When a Module Meets its Eval Twin?

From https://github.com/Agoric/agoric-sdk/issues/7632#issuecomment-1540990371

> I don't understand what this means. Are CopySets not iterable?

CopySets are not iterable, and the reason makes very clear the high costs of the way we currently use Hardened JS (aka SES). We are still operating under the normal JS assumption that any one module or package might blindly be loaded multiple times.

Say a module x.js must both create correct Xs and validate that an alleged X is a correct X. In most programming languages, x.js could just check whether an alleged X was made by itself, i.e., by x.js. Since x.js is careful to make only correct Xs, it knows that if it made the alleged X, then that X is a correct X. Doing this requires some sort of branding, which most programming languages provide by some combination of static and dynamic techniques.

In JS we have two branding mechanisms that it seems we could use: WeakSets and classes-with-private-fields. For our purposes here they are equivalent, so let's only consider WeakSets. x.js could create and encapsulate a WeakSet in which it puts only the Xs that it creates. It then validates that an alleged X is a correct X simply by seeing whether it is in the encapsulated WeakSet. In fact, we used to do that before we got bitten by the "multiple instances of x.js blindly co-existing" problem.
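For concreteness, a minimal sketch of that WeakSet branding as it might appear in a hypothetical x.js (the makeX/isX names are illustrative; harden is the SES global available after lockdown()):

```js
// x.js — hypothetical sketch of WeakSet branding
const correctXs = new WeakSet();

export const makeX = value => {
  const x = harden({ value }); // harden: SES global, available after lockdown()
  correctXs.add(x); // brand: only Xs made right here are ever added
  return x;
};

// An alleged X is a correct X exactly when this instance made it.
export const isX = allegedX => correctXs.has(allegedX);
```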

The culprit, as I understand it, is the happenstance of bundling decisions. Current bundling practice, as I understand it, does not take care to avoid duplicating the same x.js module across two different bundles loaded into the same environment. The unbundling process does nothing to prevent that code duplication from resulting in co-existing module instances. In fact, the multiple bundlings may be of different versions and subject to different linkage. This also explains why x.js cannot coordinate with "itself" purely by convention, such as placing a registered symbol on an X to say "I really am an X". There is no convention that one correct instance of x.js can use to talk to another that an attacker cannot also use.
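To see why a registered-symbol convention cannot work, consider this sketch (the symbol name is hypothetical):

```js
// A registered symbol is globally readable and forgeable, so it cannot brand.
const X_BRAND = Symbol.for('x.js/is-an-X'); // hypothetical convention

// An attacker can apply the same "brand" to an arbitrary object:
const fake = { [X_BRAND]: true, value: 'never validated by x.js' };
// Any convention-based check such as `allegedX[X_BRAND] === true`
// now accepts `fake` as readily as a genuine X.
```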

As a result, an X made by one x.js instance x1 is not registered in the WeakSet created by another x.js instance x2. When a correct X made by x1 is presented to x2 for validation, x2 rejects it. The uncontrolled blind duplication has resulted in a rejection that would normally indicate misbehavior, even though all the source code involved was operating correctly.
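We can simulate two blindly duplicated instances with a factory (hypothetical; real duplication arises from bundling, not from a factory):

```js
// Each "instance" of x.js gets its own private WeakSet brand.
const makeXModule = () => {
  const correctXs = new WeakSet();
  return {
    makeX: value => {
      const x = { value };
      correctXs.add(x);
      return x;
    },
    isX: allegedX => correctXs.has(allegedX),
  };
};

const x1 = makeXModule(); // first co-existing instance
const x2 = makeXModule(); // its "eval twin"

const goodX = x1.makeX(42);
x1.isX(goodX); // true
x2.isX(goodX); // false — a correct X rejected, with all code behaving correctly
```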

This places a severe restriction especially on the @endo/pass-style package in creating and validating Passables. To cope with this restriction, the @endo/pass-style package must do all its validation ONLY by examining the data of the alleged X, which means that all the data must be open, unencapsulated, and reliably available through reflection. Thus, we cannot give any kind of Passable useful methods like next() if we also require a correct Passable's next method to operate in a particular way. The behaviors of functions/methods/classes are outside what one x.js can validate about an X made by another.
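A data-only check inspects nothing but the specimen's open structure, so every correct instance reaches the same verdict. A sketch, assuming the passStyleOf and getTag exports of @endo/pass-style (the 'copySet' tag string is illustrative):

```js
import { passStyleOf, getTag } from '@endo/pass-style';

// Validate purely by reflection over open data: no encapsulated WeakSet,
// no reliance on method behavior, so duplicate instances agree.
const isCopySetLike = specimen =>
  passStyleOf(specimen) === 'tagged' && getTag(specimen) === 'copySet';
```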

(Performance note: Each such module does have a similar WeakSet or WeakMap anyway, but only to memoize its own validations so it can avoid duplicate work. It just can never reject anything simply because it is absent from such a memo.)
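That memoization pattern, as a sketch (fullStructuralValidation is a hypothetical data-only check):

```js
const alreadyValidated = new WeakSet(); // memo only, never a brand

const assertValidX = specimen => {
  if (alreadyValidated.has(specimen)) return; // fast path: seen before
  fullStructuralValidation(specimen); // hypothetical; throws if specimen is bad
  alreadyValidated.add(specimen);
  // Note: absence from the memo never causes rejection by itself.
};
```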

Two fortunate developments since we first encountered this problem give me hope that we can fix it. One of these is on the demand side and one on the supply side.

Until then, we cannot add useful methods to Passables, like next, if those methods must operate in a particular way for that Passable to be correct. So for now, the answer for CopySets is to say

getCopySetKeys(copySet)

I don't like this any more than you do.
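For illustration, usage might look like the following, assuming the makeCopySet and getCopySetKeys exports of @endo/patterns:

```js
import { makeCopySet, getCopySetKeys } from '@endo/patterns';

const seats = makeCopySet(['alice', 'bob']);
// The CopySet itself is not iterable; iterate over its exposed keys instead.
for (const key of getCopySetKeys(seats)) {
  console.log(key); // logs each key
}
```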

jridgewell commented 2 months ago

> The culprit, as I understand it, is the happenstance of bundling decisions. Current bundling practice, as I understand it, is not careful to avoid duplicating the same x.js module as bundled into two different bundles loaded into the same environment. The unbundling process does nothing to prevent the code duplication from resulting in co-existing module instances.

This isn't correct in my experience. When bundling, we may choose to embed x.js in two different bundles, but only one copy will ever be instantiated (except when dealing with HMR, but that's a development-only concern that I don't think should apply here).

Both Webpack and Turbopack operate what is essentially a CJS module system at runtime (even if they use ESM to deliver bundles, the code inside would still be CJS modules). When requiring a module, we check whether it's already in the runtime module map (and instantiate it if not). If multiple bundles embed the x.js file, we detect that the module is being registered under a duplicate module id and ignore the duplicate.
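A simplified sketch of that runtime shape (the real Webpack/Turbopack runtimes differ in detail):

```js
// Minimal CJS-style bundler runtime: one shared map, one instance per id.
const factories = new Map(); // module id -> factory function
const instances = new Map(); // module id -> { exports }

// Each bundle registers its embedded modules; duplicates are ignored.
const register = (id, factory) => {
  if (!factories.has(id)) factories.set(id, factory);
};

const requireModule = id => {
  let mod = instances.get(id);
  if (!mod) {
    mod = { exports: {} };
    instances.set(id, mod); // record before running to tolerate cycles
    factories.get(id)(mod, mod.exports, requireModule);
  }
  return mod.exports; // every bundle sees the same single instance
};
```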

mhofman commented 2 months ago

I believe part of the confusion is that the problem is mostly in the way packages are managed, not really in how they're bundled. AFAIK, bundlers key modules by their resolved paths on disk. The fact that two x modules were imported through the identifier 'x' doesn't matter: if those identifiers resolve to two different x implementations on disk, there will be two separate x modules in the bundle, even if the two implementations are identical and their own imports resolve identically.

The bundler is not in a position to question the choices made by the package manager. Most bundlers follow common identifier resolution algorithms derived from the way Node.js resolves module identifiers.

The reality is that different package managers have different ways of resolving package dependencies and laying out packages on disk. If an app uses packages a and b, which both depend on the same version of package x, the package manager can arrange the x dependencies such that they may or may not resolve to the same disk path. This is particularly likely if a third package c depends on a different version of x.
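For example, with npm-style hoisting, a layout like the following (illustrative) gives a and b two separate copies of the same x version:

```
node_modules/
  c/
  x/                x@2.0.0, hoisted for c
  a/
    node_modules/
      x/            x@1.0.0 — a resolves here
  b/
    node_modules/
      x/            x@1.0.0 — b resolves here: same version, different
                    disk path, hence two module instances in the bundle
```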

IMO this is really an ecosystem problem of how package versions are managed. Historically there wasn't as much reliance on branding and the recognizability of values, and this problem only crops up as an application grows and evolves.