endojs / endo

Endo is a distributed secure JavaScript sandbox, based on SES

eval files/modules/packages, not just strings #190

Closed warner closed 4 years ago

warner commented 6 years ago

SES.confine takes a string and evaluates it as a program, returning the value of its last expression. This is the primitive we need to build larger environments. This ticket is about what shape those larger environments should take. Programmers don't work with strings containing code (how do you get an editor to syntax-highlight it, for starters): they work with files containing code, or modules, or names that point to modules. Programs import modules, not strings. Users run "apps", not programs or strings. We'll eventually need to define an environment that starts with a programmer and a source tree, goes through some bundling/packaging/deployment phase, and ends with a whole bunch of nested evals (on somebody else's computer) that put the right code in the right places.
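For concreteness, a minimal sketch of that primitive (the endowments parameter and exact call shape here are assumptions for illustration):

    // Evaluate a string using the Program grammar; the completion value
    // of the last expression comes back.
    const result = SES.confine(
      'const sum = add(1, 2); sum * 10',  // program text
      { add: (a, b) => a + b });          // endowments visible to the program
    // result === 30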

I'd like to collect ideas and examples of related systems which we can draw from, and build up a collection of sample apps that we can implement in different schemes to see how they compare.

Two things come to mind from my own background. The first is the Jetpack security model which I developed for Firefox (but which never really got deployed). https://github.com/warner/JetpackComponents/blob/master/components.md has a writeup. This scheme used CommonJS-style require(name) statements to import code from other modules. A magic module name represented full platform power: in Jetpack you would get access to the browser's entire authority by including require("chrome") in your module. A bundling script searched for all require() statements and built a graph of them: most modules were to be found elsewhere in the source tree, but magic names would be mapped to special objects at load time. The graph was written into a manifest, including the hashes of each module, as a table saying "when module X says require(Y), give it Z". This manifest, and all the source files, were bundled into the installable addon archive (a ZIP file), and the runtime loader enforced the manifest restrictions.
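As a sketch, such a manifest might look like the following (the structure and hashes are invented for illustration; the real Jetpack format differs in detail):

    // Hypothetical manifest: for each module (keyed by its hash), a table
    // saying "when module X says require(Y), give it Z".
    const manifest = {
      'sha256-aaaa': {                      // the addon's top-level module
        tabs: 'sha256-bbbb',                // ordinary module, resolved by hash
        chrome: '<full-browser-authority>'  // magic name, bound to a special object at load time
      },
      'sha256-bbbb': {
        events: 'sha256-cccc'
      }
    };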

The idea was that the loader could also check for signatures from the official addon gallery reviewers, who could examine each module for safety and whether they did their declared job (or could be tricked into doing something else). Using require("chrome") would trigger more careful review. In that system, the top-most node of the module graph was the weakest, and contained the most important application logic, and would be unique to the addon. The bottom-most nodes (many of which would be shipped with the SDK as a stdlib) had the most power, and would be shared between many different addons. The overall goal was to encourage code reuse and limit excess authority.

The other piece is http://www.lothar.com/blog/58-The-Spellserver/ , which investigates a purely-client-defined server model (in which the client is confined by signed prefixes that define the endowments which their new code can access). In that world, the server receives hashes that represent code, and maybe the code itself if it wasn't already cached. It isn't captured by my writeup, but the source form (which authors edit) would have some kind of CommonJS require(PETNAME) syntax, which would be rewritten as require(HASH) before delivery to the servers (so any lookup is nailed down before the application leaves the programmer's environment). The rule would be that any require() statement could be replaced by simple interpolation of the file that hashes to HASH, which means require() does not grant any authority (unlike my Jetpack model). In the Spellserver, all authority is delivered as arguments to the function at runtime, rather than being available as ambient endowments or explicitly requested imports. I have a scheme to do rights-amplification, by allowing two prefixes to both be used by a child program. In this approach, the first program run in the chain gets the most power, but it also defines which keys must sign the next program down (rather than having some central reviewers or bundled manifest to enforce which program gets the authority). So the power graph is somewhat inverted when compared to the Jetpack approach.
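A sketch of that rewrite step (the petname table and regex are assumptions for illustration):

    // Hypothetical bundling step: rewrite require(PETNAME) to require(HASH)
    // before delivery, so lookup is fixed in the programmer's environment
    // and require() itself grants no authority.
    const petnameToHash = new Map([['my-utils', 'sha256-abcd1234']]);
    function rewriteRequires(source) {
      return source.replace(/require\('([^']+)'\)/g, (_, petname) => {
        const hash = petnameToHash.get(petname);
        if (hash === undefined) throw new Error(`unknown petname: ${petname}`);
        return `require('${hash}')`;
      });
    }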

I think https://github.com/tc39/tc39-module-keys is going to be relevant too.

@erights and I started to prototype a module loader a few weeks ago, but we got a bit stuck trying to figure out how to represent the loader policy: some datastructure which explains what happens when module X asks to import name Y. The loader itself was a recursively-injected endowment into each evaluation step: when we evaluate X, we inject a require() or import() or something which curries the portion of the policy table that talks about module X. The code was a bit fiddly (as most recursive things are), but we're sure it's possible.
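A rough sketch of that recursion (the policy shape and helper names are assumptions):

    // The policy: what happens when module X asks to import name Y.
    const policy = {
      moduleX: { y: 'moduleY' },  // when moduleX imports 'y', give it moduleY
      moduleY: {}                 // moduleY may import nothing
    };
    // Each evaluation receives a require() curried to its own policy row.
    function makeRequireFor(moduleId, sources, evaluate) {
      return name => {
        const childId = policy[moduleId][name];
        if (childId === undefined) {
          throw new Error(`${moduleId} may not import ${name}`);
        }
        return evaluate(sources[childId], {
          require: makeRequireFor(childId, sources, evaluate)
        });
      };
    }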

Another thing to keep in mind is debuggability and sourcemaps. At present, each SES.confine() is a barrier beyond which stack traces and line numbers get thrown away. It might be interesting to feed a sourcemap into confine() (or into some wrapper/membrane around it) that knows how to translate exception information into a form usable by the next layer up. Whatever module loader we come up with should integrate into this scheme, so perhaps the "when X imports Y, it gets Z" table should be extended so that each row also says ".. and uses sourcemap Q".

warner commented 6 years ago

One thing we're experimenting with is using rollup and its internal API to turn a directory (with an index.js that contains ES6-module-style import statements) into a single string, with a CommonJS-style exports object. I have a helper function which creates a module and module.exports object, provides them as endowments, evaluates the rollup-ed string against those endowments, then pulls off the exports afterwards. This makes it pretty handy: the imported code can span multiple files, and you don't have to resort to any funny syntax tricks to make a function come out as an expression.
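The helper might look roughly like this (a sketch; the evaluator call is an assumption):

    // Endow the rollup-ed string with CommonJS-style module objects,
    // evaluate it, then pull the exports off afterwards.
    function evaluateBundle(bundledSource) {
      const module = { exports: {} };
      SES.confine(bundledSource, { module, exports: module.exports });
      return module.exports;
    }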

I'm doing this from the primal realm to load a bunch of code into an SES realm, so unfortunately I'm exposing a primal-realm Object to that code (it could use Object.getPrototypeOf(module.exports).constructor.constructor to get back to the full-powered Function object, and then access the primal global scope from there). To make this safe we need to run rollup in the primal realm, then pass the bundled string into the SES realm, and do the module.exports trick around an SES.confine from inside the SES realm.
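Spelled out, the escape route from the confined code's point of view is:

    // module.exports was created in the primal realm, so its prototype
    // chain leads back to the primal realm's full-powered intrinsics.
    const primalObjectProto = Object.getPrototypeOf(module.exports);
    const PrimalFunction = primalObjectProto.constructor.constructor;
    const primalGlobal = PrimalFunction('return this')();  // primal global scope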

In this approach, module imports do not provide any additional authority. Any endowments passed into the SES.confine call are available to all modules being imported, not just the index.js entry point, so the imported code has no opportunity to partition itself into lesser-authority subsets (to achieve that I think you have to pass the authority in as an argument to some entry-point function, like a Powerbox).
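With the evaluateBundle sketch above, that partitioning might look like this (the entry-point and authority names are hypothetical):

    // Authority arrives as an argument to an exported entry point,
    // rather than as an endowment shared by every imported module.
    const { makeApp } = evaluateBundle(bundledSource);
    const app = makeApp({ net: attenuatedNet });  // only code that makeApp shares it with sees this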

dodtsair commented 6 years ago

How does SES.confine work with javascript strings that contain import and export statements?

erights commented 6 years ago

How does SES.confine work with javascript strings that contain import and export statements?

SES.confine itself parses the string as a program, i.e., using the Program production of the grammar as its start production. The syntax of a program excludes import and export statements. Thus, if they are present in the string, the string is rejected with a SyntaxError.

A program can contain an import expression, but the SES shim prohibits it.
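For example (illustrative; the exact errors depend on the shim version):

    SES.confine('1 + 1');              // ok: a valid Program, evaluates to 2
    SES.confine('export default 1;');  // SyntaxError: export is not in the Program grammar
    SES.confine('import("mod")');      // an import expression: parses, but the shim rejects it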

dodtsair commented 6 years ago

Do you guys have any thoughts on building basic module support? The idea is to allow a framework to bring in third-party code and determine what capabilities it exposes and what capabilities it needs, by requiring each to be an import or export.

Something like:

    var module = SES.load(moduleString)
    var imports = module.imports
    var importAs = imports.map(anImport => anImport.as)
    var importSource = imports.map(anImport => anImport.source)
    var importDefault = imports.filter(anImport => anImport.isDefault)
    var importTargets = imports.map(anImport => anImport.target)

    // assume the existence of a framework that can map the script to the assigned capabilities
    var importCapabilities = framework.lookUpCapabilities(moduleString, imports)

    var exports = module.confine(importCapabilities)

    var defaultExport = exports.defaultExport
    var namedExports = Object.keys(exports.namedExports).map(exportName =>
      ({ name: exportName, exportRef: exports.namedExports[exportName] }))

kumavis commented 5 years ago

I'm working on a plugin for the browserify build system that wraps module definitions in a SES container https://github.com/metamask/sesify

erights commented 5 years ago

Hi @kumavis great to see this! Does MetaMask currently use browserify? Might MetaMask switch to sesify? (That would be awesome!)

We just started a Discourse site for ocaps in JavaScript, with a topic on safe modules at https://ocapjs.org/c/safe-modules . Please join and add a link to sesify. Thanks!

kumavis commented 5 years ago

Yes, MetaMask currently uses browserify and could move to sesify if I can reduce the developer/debugging overhead to something manageable.

dckc commented 5 years ago

The simplest thing that could possibly work, as far as I can see, is: imports grant no authority, and all authority is passed as an argument to main().

This is the approach taken by, for example, pony:

actor Main
  new create(env: Env) =>
    env.out.print("Hello, world!")

-- https://tutorial.ponylang.io/getting-started/hello-world.html

and monte: https://monte.readthedocs.io/en/latest/quick_ref.html#file-i-o-and-modules
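A JavaScript analogue of the same pattern might be (a sketch; the parameter names are assumptions):

    // Imports grant no authority; everything main() can do arrives as arguments.
    export default function main({ stdout }) {
      stdout.write('Hello, world!\n');
    }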

I think the following suffices for authorities available to main():

The dynamic loader deserves special treatment (command-line flags to enable / disable it, package metadata, ...) as the platform can no longer guarantee ocap safety. But it's possible to build libraries from safe languages (rust, pony, ...) and/or formally verify code (https://galois.com/project/amazon-s2n/) etc.

I suppose by the logic I have used to justify dynamic loading, one could justify Jetpack style "badged" ambient authority in JS as well. So perhaps the dynamic loader is not part of the simplest thing that could possibly work...

warner commented 5 years ago

Some notes from today's conversations:

We've added a basic requireMode: 'allow' setting to the SES.makeSESRootRealm(options) bundle. If enabled, a require() function will be added to the realm's global namespace. It currently accepts exactly one name: require('nat') gets you the same thing as the current Nat global (which will go away in favor of the imported form).

The goal here is to let code use const Nat = require('nat') instead of the global, so that the same file works both within a SES environment (i.e. stringified and submitted to an s.eval()) and outside SES (say, in a unit test).
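Usage looks roughly like this (per the description above; details may shift):

    const s = SES.makeSESRootRealm({ requireMode: 'allow' });
    s.eval(`
      const Nat = require('nat');  // works the same outside SES
      Nat(42);
    `);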

The next step is to add harden() to this list, and remove the current global def(). The step after that is to add SES to the list and remove the global SES. Then code should not expect the globals to be any different within SES than from outside it.

The step after that is to change the API of requireMode: into a table that maps a name to a description of the object that require(name) should return. We're thinking of three forms for the values of this object:

Our hope is that Flow can be exposed to the PlaygroundVat in this fashion, where the endowments wrap some networking code that allows the Flow-based Vows to send messages off to other Vats.

@dtribble had another scheme involving template literals, something like:

fs: confined`import { fs } from ${fs};
             import {root} from ${rootDir};
             ${wrap_fs}`,

where the confined quasiparser would do some magic to recognize the endowments being passed in and return a data structure like the {source, realFS, rootDirString} from above.

Next after that, @erights is concerned that offering require at a global level, for all evaluations inside a realm, is too broad an authority, and instead we should be providing require() at each s.eval() call. It wouldn't be safe to define a require function from outside the realm and then pass it in as an endowment (that require would have an unsafe constructor), but we could build a helper function that would turn the requireMode: data structure into an in-realm function. The resulting invocation might look like:

s.eval(code, { require: s.makeRequire({nat: Nat, fs:{...}}) });

We could conceivably curry this for convenience: s2 = s.withRequires({nat: Nat}) would let you do s2.eval(code), adding a require global with nat available to be imported.

Finally, we're thinking that in addition to emphasizing s.eval() as a way to evaluate strings (which is a good mode of thinking for e.g. a spreadsheet where you're evaluating a formula for each cell, or a database that's evaluating stored procedures), we should offer an s.loadModule(moduleName, loader) that emphasizes modules. This might be more natural for folks writing command-line applications or maybe web pages.

The loader API would probably be something like loader.createModule(s, parentId, childName) -> { childId, module }. The loader is responsible for (see the sketch after this list):

1. using some sort of manifest to convert parentId and childName into the source code that should be evaluated
2. using s.realm() to evaluate that source code, giving it access to a require() that's scoped to the right childId (so the child's require() calls use the correct part of the manifest)
3. offering the child any special endowments that the manifest says it's supposed to have
4. returning the evaluated module, as well as a childId that should be used for subsequent calls to createModule
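A manifest-driven loader of that shape might be sketched as follows (the manifest format and the use of s.eval are assumptions):

    function makeLoader(manifest) {
      const loader = {
        createModule(s, parentId, childName) {
          // 1: the manifest maps (parentId, childName) to source and endowments
          const { childId, source, endowments } = manifest[parentId][childName];
          // 2 and 3: evaluate with a require() scoped to childId, plus any special endowments
          const require = name => loader.createModule(s, childId, name).module;
          const module = s.eval(source, { ...endowments, require });
          // 4: return the module, and the childId for subsequent calls
          return { childId, module };
        }
      };
      return loader;
    }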

Obviously there's some stuff to figure out here (the loader might instead be asked to return the child's source code, leaving the actual evaluation up to the SESRealm that invoked it, but we'd need to figure out endowments somehow). Maybe we want a helper function to turn a manifest data structure into a loader function. We might have both s.evalModule(moduleSource, loader) for dynamic cases and s.loadModuleByName(moduleName, loader) for the more static flavor (both of which could return the module's exports so we can cautiously interact with the confined code).

dckc commented 5 years ago

... Either wrap_fs is evaluated with the endowments as globals (yielding the value to be returned from require('fs')), or wrap_fs is supposed to evaluate to a function which will be called with an object containing the endowments (and yielding the value for require).

Pass the endowments explicitly as an argument, please. Code that assumes globals is counter to ocap discipline and interacts poorly with static analysis tools.

warner commented 5 years ago

Today we discovered another reason to avoid providing require configuration to SES.makeSESRootRealm(): that happens too early to let us easily share a deep-freezing harden function between the SES realm's initial freeze-all-primordials phase and the later require('@agoric/harden') call. We were testing some code for Agoric/SES-shim#185 and found that require('@agoric/nat') was failing because it was trying to def() the Nat object, and that particular instance of def() didn't share the what-was-already-frozen list with the one that froze the primordials. The require() call was failing internally because that last def() was given an object whose prototype wasn't on its already-frozen list.

To share the same def/harden between both stages, I think we must abandon require configuration at the early stage and instead go with the s.evaluate(code, {require: s.makeRequire(config)}) approach. makeRequire can close over the shared hardener to satisfy a config that enables require('@agoric/harden').
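In sketch form (the wiring is an assumption; the point is that makeRequire closes over the same hardener that froze the primordials):

    function makeMakeRequire(sharedHarden) {  // sharedHarden froze the primordials
      return function makeRequire(config) {
        const modules = { '@agoric/harden': sharedHarden, ...config };
        return function require(name) {
          if (!(name in modules)) throw new TypeError(`unknown module: ${name}`);
          return modules[name];
        };
      };
    }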

dckc commented 5 years ago

I managed to write a web server where the main module is, more or less, an emaker:

https://github.com/dckc/do-your-worst/blob/master/main.js

Now that I've done it, I realize your SES Challenge does pretty much the same thing, except on the client side.

kriskowal commented 4 years ago

Tracking Compartment module system development at https://github.com/Agoric/SES-shim/issues/257. A small part of the puzzle.

kriskowal commented 4 years ago

Tracking the remaining elements of the puzzle on #336 and archiving this excellent conversation. I’ve added work items to that task description that should answer the requirements described here, and would like to continue this conversation on that draft stack.