WebAssembly / esm-integration

ECMAScript module integration
https://webassembly.github.io/esm-integration/js-api/index.html#esm-integration

Proposal: Higher Order ESM Integration #44

Closed: guybedford closed this issue 3 years ago

guybedford commented 3 years ago

The last few issues on this repo have raised the security question of whether or not WebAssembly provides a more secure execution environment than JS execution.

In addition, there has been much discussion about the exact workflows for the WebAssembly start function and the binding process, in particular whether the ESM import workflows cover the major use cases achieved with the WebAssembly.instantiate APIs that WebAssembly applications use today.

Originally in this repo, @alexcrichton suggested in https://github.com/WebAssembly/esm-integration/issues/14 an API for the WebAssembly ESM integration that supports importing a WebAssembly.Module object directly, to make working with a compiled module easier and more flexible.

Along these lines, I'd like to propose changing the ESM semantics of importing WebAssembly to achieve better security and more flexible use cases for WebAssembly applications.

Proposal

WebAssembly offers some highly compelling security properties: it provides a strictly defined, secure execution sandbox, down to the imported bindings provided to it.

By default, the ESM integration would not naturally benefit from these security properties, since it permits WebAssembly modules to import arbitrary JS.

Instead of following the naive ESM integration, the proposal is for every imported WebAssembly module to be imported as a compiled Module object, leaving the binding process entirely to the JS wrapper code:

import module from './app.wasm';

// true
module instanceof WebAssembly.Module

The imported WebAssembly module is exposed only as a { default: Module } ES module, which can then be instantiated with a JS call to WebAssembly.instantiate:

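import module from './app.wasm';
// wasmEnv: the wrapper's JS implementation of the "env" imports (hypothetical module)
import { wasmEnv } from './env.js';

// Passing a compiled Module (not bytes) to instantiate resolves to an Instance
const { exports } = await WebAssembly.instantiate(module, { env: wasmEnv });
wasmEnv.bind(exports.memory);

export function wasmFn (arg) {
  return exports.method(arg);
}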

Providing only the uninstantiated Module supports a number of useful properties:

  1. Run-to-completion WebAssembly binaries can easily be reinstantiated or rebound multiple times during the lifetime of the application.
  2. Many common WebAssembly binaries today could not be imported under the ESM integration as currently written. With the above, every WebAssembly binary that exists today can be imported and used in applications.
  3. Importing WebAssembly becomes a secure operation in itself, and the sandboxing properties of the WebAssembly module are fully maintained for the user.
  4. Since WebAssembly imports are then a secure operation, it also makes sense to support an import assertion for this case.
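The first property above can be made concrete with a short sketch: one compiled Module, instantiated twice, with a different `env` binding each time. The inline bytes encode a hypothetical toy module (not from this thread) that imports `env.log` and exports a `run` function calling `log(42)`. The synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors are used only to keep the sketch short; application code would normally use the async APIs.

```javascript
// Hypothetical toy module, hand-encoded. Equivalent WAT:
//   (module
//     (import "env" "log" (func (param i32)))
//     (func (export "run") (call 0 (i32.const 42))))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01,                                     // one function of type ()->()
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// Compile once: this inert Module is what the proposal would hand to JS.
const wasmModule = new WebAssembly.Module(bytes);

// Instantiate the same Module twice, binding a different "env" each time.
const seenByA = [];
const seenByB = [];
const a = new WebAssembly.Instance(wasmModule, { env: { log: (x) => seenByA.push(x) } });
const b = new WebAssembly.Instance(wasmModule, { env: { log: (x) => seenByB.push(`b:${x}`) } });

a.exports.run();
b.exports.run();
b.exports.run();
// seenByA is [42]; seenByB is ['b:42', 'b:42']
```

Because the Module is a compiled but inert value, the JS wrapper alone decides how many instances exist and what each one is bound to.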

How is this better than workflows today

The first criticism of this proposal might be: if this is the case, what benefit does the ESM integration provide at all?

How is this better than just fetching and compiling the Wasm module directly, e.g. via:

const res = await fetch(new URL('./app.wasm', import.meta.url));
const module = await WebAssembly.compileStreaming(res);

That is true: the benefit lies exactly in reifying the above pattern into an import pattern, including:

Interaction with Import Assertions

Since importing Wasm becomes a non-executing and secure operation, it then makes sense that a Wasm module import assertion can verify this property:

import module from './app.wasm' assert { type: 'wasm-module' }

This fits the definition of import assertions, since it is entirely validation-based and, importantly, does not split the interpretation of the module depending on the assertion or mode used.
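The "non-executing" property the assertion would rely on can be checked with today's JS API: validating and compiling a module runs none of its code, and even a start function only fires at instantiation. A minimal sketch, using a hypothetical hand-encoded module whose start function calls `env.log(42)`:

```javascript
// Hypothetical toy module with a start function. Equivalent WAT:
//   (module
//     (import "env" "log" (func (param i32)))
//     (func $init (call 0 (i32.const 42)))
//     (start $init))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01,                                     // one function of type ()->()
  0x08, 0x01, 0x01,                                           // start section: func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

const calls = [];
const env = { log: (x) => calls.push(x) };

// Validation and compilation: no Wasm code runs and no imports are touched.
const valid = WebAssembly.validate(bytes);
const wasmModule = new WebAssembly.Module(bytes);
const callsAfterCompile = calls.length; // 0

// Instantiation: imports are bound and the start function runs.
const instance = new WebAssembly.Instance(wasmModule, { env });
const callsAfterInstantiate = calls.length; // 1
```

Under the proposal, the import itself would stop at the compile step, and the instantiation step (with whatever side effects the start function has) stays in the importing JS.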

Interaction with Module Linking Proposal

The module linking proposal provides a way for WebAssembly to handle instantiation and binding between multiple WebAssembly modules, exposing a richer end-user API directly.

These WebAssembly modules can effectively be thought of as a new, higher-order type of WebAssembly module for this wiring.

In the ESM integration for the module linking proposal, these modules could be defined to be treated differently, so that their imports and exports apply as one would expect from a JS import or the naive ESM integration.

These modules would lose the sandboxing properties, so the import assertion would not apply to them, which makes the distinction between these two ESM integration cases clearer.

devsnek commented 3 years ago

Why does this use case need to remove all the functionality from this proposal? Why can't someone just use the Wasm API directly?

guybedford commented 3 years ago

@devsnek thanks for taking a look. See the "How is this better than workflows today" section, where I include a few points on this. In particular, it is about easy, statically analyzable, universal workflows for WebAssembly integration, which IMO is the primary goal of the ESM integration.

devsnek commented 3 years ago

I get how this is better than no ESM proposal at all; I'm just not convinced that it is better than the current ESM proposal. The foundation of this issue seems to be that sparing some people from having to use fetch is more useful than letting most people easily consume Wasm as a first-class module type in JS.

guybedford commented 3 years ago

> I'm just not convinced of how this is better than the current esm proposal.

Good point, I should have highlighted this aspect more.

The argument here is about being able to load all existing Wasm binaries, i.e. to support all the Wasm use cases served by the instantiate APIs today.

For example, Wasm binaries with an env import aren't supported by the ESM integration in Node.js npm publishing workflows, since there is no way to point the env import at an implementation and bind it.
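What "an env import" means concretely can be seen from a binary's own import list via `WebAssembly.Module.imports()`. This sketch uses a hypothetical hand-encoded module that imports its memory from "env" (assumed here for illustration): under the ESM integration as currently written, "env" would be resolved like any other module specifier, whereas with the JS API the caller binds it directly.

```javascript
// Hypothetical module, hand-encoded. Equivalent WAT:
//   (module (import "env" "memory" (memory 1)))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x02, 0x0f, 0x01,                               // import section: 1 import
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,       // field name "memory"
  0x02, 0x00, 0x01,                               // kind memory, limits: min 1, no max
]);

const wasmModule = new WebAssembly.Module(bytes);

// The distinct module specifiers this binary imports from. Under the ESM
// integration as written, each would be resolved like a JS import specifier.
const specifiers = [...new Set(WebAssembly.Module.imports(wasmModule).map((imp) => imp.module))];
// specifiers is ['env']

// With the JS API, the caller simply binds "env" itself.
const memory = new WebAssembly.Memory({ initial: 1 });
const instance = new WebAssembly.Instance(wasmModule, { env: { memory } });
```

In Node.js a bare "env" specifier would be looked up as a package, and in browsers it would need an import map entry, which is the gap the surrounding comments discuss.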

devsnek commented 3 years ago

I'd argue that this is more an issue of convention. You can write this JS code today: import { foo } from 'env', and it won't be useful in the npm ecosystem, in exactly the same way that a wasm module which imports from "env" is not useful in the npm ecosystem. There are two important points here. Firstly, my module may not be part of the npm ecosystem at all! ESM is much broader than that. Secondly, if my module is intended to be part of the npm ecosystem, I wouldn't randomly import stuff from "env" in the first place, just as I wouldn't import it from JS.

guybedford commented 3 years ago

Yes, it is important to decide whether supporting existing Wasm in use in the wild is a goal. Beyond that, though, the ESM integration should be designed together with the conventions that build tools should output for JS users to support it. My concern is that requiring JS users to run Wasm binaries through transformation steps creates unnecessary burdens for developers who don't want to think about this stuff.

kripken commented 3 years ago

Would this allow bundlers and optimizers to remove dead wasm code?

What I mean is: imagine that app.wasm provides two exports, exportA and exportB, and there is just one place that uses app.wasm from JS, which only uses one of the two exports:

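import module from './app.wasm';

// wasmEnv: the "env" import object, assumed to be defined elsewhere
const instance = await WebAssembly.instantiate(module, { env: wasmEnv });

// Re-export only exportA; exportB is never referenced from JS
export const exportA = instance.exports.exportA;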

Would bundlers be able to remove exportB from the wasm module? (What worries me is that arbitrary user JS code "in the middle" makes that harder.)

devsnek commented 3 years ago

Assuming well-written JS, you could probably make something that works most of the time. I wouldn't use such a thing in my bundler, though: dead code elimination tools for JS tend to give up very quickly on proving lack of usage when member expressions are involved.

guybedford commented 3 years ago

Yes. These days I personally recommend the Node.js "exports" field over exports tree-shaking (https://nodejs.org/dist/latest-v15.x/docs/api/packages.html#packages_package_entry_points). You don't have to optimize out code you never load.

lukewagner commented 3 years ago

I think ultimately JS will want to be able to do both:

  1. import a wasm module that gets instantiated by the ESM loader (as ESM-integration is written today) and
  2. import a wasm module as a WebAssembly.Module object.

Pre-interface-types, I expect toolchains would mostly emit (2) so that they could apply custom JS glue code via WebAssembly.instantiate(). However, once interface types are available, I think (1) will become more feasible and commonplace. As a general rule, I think import foo from './bar' should behave symmetrically regardless of whether bar is a JS or wasm file, which suggests that import foo from './bar' should mean (1) for wasm, and (2) should require some distinct import syntax. Given that import assertions are supposed to be only assertions that don't otherwise affect runtime behavior, I think this means using something other than assert, e.g.: import foo from "./bar.wasm" { as: "module" }. Initially the semantics of { as: "module" } would require the Content-Type to be application/wasm, but perhaps one day JS evolves a JS version of WebAssembly.Module (i.e., a validated but not instantiated JS module), in which case JS and wasm could be symmetrically imported { as: "module" }.

xtuc commented 3 years ago

In https://github.com/WebAssembly/esm-integration/issues/14 we discussed a way to export "this module". I think that's more elegant than using an evaluator attribute to get the module, as in import foo from "./bar.wasm" { as: "module" }. It's also consistent with tables, memory, etc., and JS likely wouldn't need it.


Just thinking out loud: if Module becomes a first-class type that can be exported and imported, couldn't WASI reactors be implemented in an adapter module that sits between the ESM integration and the WASI module? The adapter module would pass through the necessary JS values using interface types and export an init/initialise function that, using bulk memory operations, clears the WASI module's memory and resets the instance. I'm still not familiar enough with WASI, but would that make sense? It sounds quite elegant to me.

lukewagner commented 3 years ago

Revisiting #14, with Module Linking, there's no way for a module to export "itself", only other modules (that were locally-defined or imported), so if I wanted to export a module M, I'd have to wrap it with a module M' which simply nests and exports M. I guess that works, but it seems a little hacky/workaroundy. (I also have a feeling that adding the ability for a module to export "itself" to Module Linking will run into problems, though I can't say exactly what atm.)

ghost commented 3 years ago

@lukewagner Checking out the module linking proposal, they gave this example:

(module
  (import "a" (module $a ...))
  (module $b ...)
  (import "c" (instance $c ...))
  (instance $d ...)

  (export "e1" (module $a))
  (export "e2" (module $b))
  (export "e3" (instance $c))
  (export "e4" (instance $d))
)

so I see no reason why this isn't allowed:

(module $a
  (export "a" (module $a))
)

Therefore https://github.com/WebAssembly/esm-integration/issues/14 has effectively been solved by another proposal entirely.

guybedford commented 3 years ago

> I think ultimately JS will want to be able to do both:

An { as: "module" } attribute sounds like a sensible way to handle this. I'm really glad you support the concept of JS and Wasm having similar abilities for dealing with these cases.

My original suggestion was just that, by default, it might make sense to always treat module-linking modules, or modules whose outer module is an adapter_module, as instance imports, while modules of the current kind of WebAssembly module (not sure what to call these?) would be returned as WebAssembly.Module objects instead.

A default import treatment like the above would then allow:

With evaluator attributes, it would still be possible to switch either of the above into the alternate mode, while the defaults of the ESM integration would be ones that work for the majority of use cases.

My primary concern here is that Wasm binaries output following today's conventions cannot be executed under the ESM integration in Node.js or browsers without either constructing some non-standard, semi-private node_modules/env package that is scoped and memory-bound to its parent (in the Node.js case), or relying on a specially scoped "env" import via import maps (in the browser case).

What I'd like to see clarified for this ESM integration proposal is the integration conventions for full end-to-end workflows when it lands, without forcing these somewhat cumbersome conventions, which users would otherwise naturally be incentivised to apply to their workflows. If environments like Node.js and Deno (or browsers) want to ship the ESM integration in the next couple of years, these workflow conventions will otherwise become ingrained patterns, so we should be sure to change the recommended import conventions before then to ensure better alignment.

lukewagner commented 3 years ago

@00ff0000red Nice job digging in! That would almost work, but the validation rules only allow referring to preceding modules, with the specific intention of preventing cycles like that. (Particularly with type imports/exports, cycles introduce serious complications.)

@guybedford One side note is that "module linking modules" shouldn't be a distinct "kind" of module; module linking just gives you new ways of defining and instantiating (existing) core wasm modules. But I think your point stands w.r.t. adapter modules, which are a different kind of module.

Having separate defaults (for core vs. adapter modules), with later evaluator attributes to flip the behavior, seems like a practical potential solution to the "env" problem you're talking about.

ghost commented 3 years ago

> The story I'd just like to see clarified for this ESM integration proposal is what the conventions of integration are for full end to end workflows when this lands without forcing these somewhat cumbersome conventions which users would naturally be incentivised to apply to their workflows otherwise. If environments like Node.js and Deno (or browsers) want to apply the ESM integration in the next couple of years, these types of workflow conventions start to become actual engrained patterns otherwise, unless we should be sure to change the recommended import conventions before then in order to ensure better alignment.

I feel like this is an irrelevant argument for adoption of Wasm into ESM.

Just as with the initial adoption of ESM into ES, people didn't just convert their code to a module; it usually had to be rewritten or more heavily changed to make use of ESM. Some workflows still haven't been adjusted to use ESM, and some still prefer to downlevel their code to static linkage anyway.

Similarly, when ES5 introduced "use strict", it wasn't just slapped on old code; doing so could simply break it.

To assume that one could just import a non-ESM Wasm module seems wrong. Also, this creates a dependency on the host having adapter modules and interface types implemented, which may not be the case.

If the core Wasm were exported as a WebAssembly.Module, wouldn't performance also suffer? If it were instantiated directly, implementations could use whatever streaming compilation techniques they use right now, but in your case they would have to compile and instantiate it separately, at different points in time.
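For reference, the two shapes being compared map onto existing JS API calls. In this minimal sketch (using the smallest valid module, an 8-byte header with no sections), passing bytes to `WebAssembly.instantiate` compiles and instantiates in one call, while passing an already-compiled Module, which is what the proposal would hand to JS, only instantiates. Whether the split costs anything in practice is engine-specific and not something this sketch measures; `WebAssembly.compileStreaming`, as in the fetch example earlier in the thread, still streams the compilation step itself.

```javascript
// The smallest valid module: magic number + version, no sections.
const bytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// One step: the bytes overload compiles and instantiates together and
// resolves to { module, instance }. (For a fetch Response, the streaming
// counterpart is WebAssembly.instantiateStreaming.)
async function oneStep() {
  const { module: wasmModule, instance } = await WebAssembly.instantiate(bytes, {});
  return { wasmModule, instance };
}

// Two steps: compile first (e.g. at import time), instantiate later when JS
// decides to. The Module overload resolves to a bare Instance.
async function twoStep() {
  const wasmModule = await WebAssembly.compile(bytes);
  // ...arbitrary JS can run here, before any imports are bound...
  const instance = await WebAssembly.instantiate(wasmModule, {});
  return { wasmModule, instance };
}
```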

guybedford commented 3 years ago

> One side note is that "module linking modules" shouldn't be a distinct "kind" of module

When I first wrote this proposal, I wasn't clear that the distinction here would be based on the adapter module instead; thanks for explaining.

> Having separate defaults (for core vs. adapter modules), with later evaluator attributes to flip the behavior, seems like a practical potential solution to the "env" problem you're talking about.

I guess the concern then is whether a semantic difference between core and adapter modules will introduce more or less complexity / cognitive overhead to these workflows.

Perhaps the deciding question for what the practical workflows will be is how much we can lean on evaluator attributes being available within the next two years, i.e. a similar timeframe to the shipping of the ESM integration. @littledan, I'm wondering if you have any thoughts on that.

In terms of natural conventions, I previously mentioned import maps and a nested private node_modules as mechanisms for supporting env mapping. Another mechanism would be to change the env convention in all tooling to ./env.js by default, or even to #env, which would support custom private import mapping in Node.js (https://nodejs.org/dist/latest-v15.x/docs/api/packages.html#packages_subpath_imports).

littledan commented 3 years ago

I like the idea of keeping the "default" Wasm/ESM semantics as is in this proposal, and having some kind of additional syntax to opt-in to getting an uninstantiated module.

About when evaluator attributes will be available: this all depends on when people bring clear use cases to TC39 and champion the proposal. Import assertions took less than a year to get from nothing to Stage 3, so I don't think that evaluator attributes will necessarily take very long, if people put in the work.

I don't plan to champion the evaluator attributes proposal personally, but I am happy to mentor others to work on it. This proposal is really blocking on implementation work, whereas evaluator attributes need this design work, so they may take longer. Is there some reason that they should be released in a similar timeframe?

guybedford commented 3 years ago

Thanks all for the engagement here, it's been very helpful. To summarize my opinions on this topic:

This does give me a new sense of the importance of evaluator attributes, and I would be glad to help with them however I can, although I likely don't have the bandwidth for the next couple of months myself.