Can WebAssembly.Instance recompile an already-compiled Module?

jfbastien commented 8 years ago

Say I have this code:

let descriptor = /* ... */;
let memories = Array.apply(null, {length: 1024}).map(() => new WebAssembly.Memory(descriptor));
let instance = fetch('foo.wasm')
  .then(response => response.arrayBuffer())
  .then(buffer => WebAssembly.compile(buffer))
  .then(module => new WebAssembly.Instance(module, { memory: memories[1023] }));

Is WebAssembly.Instance allowed to block for a substantial amount of time. Could it for example recompile the WebAssembly.Module?

In most cases I'd say no, but what if the already-compiled code doesn't particularly like the memory it receives? Say, because that memory is a slow-mode memory and the code was compiled assuming fast-mode? maybe memories[0] was a fast-mode memory, but memories[1023] sure won't be.

What about this code instead:

let instances = [0,1,2,3,4,5,6,7].map(v => fetch(`foo${v}.wasm`)
  .then(response => response.arrayBuffer())
  .then(buffer => WebAssembly.compile(buffer))
  .then(module => new WebAssembly.Instance(module)));

Are those calls to WebAssembly.Instance allowed to cause recompilation?

Assuming the above makes sense, here are a few related questions:

Do we want a promise-returning async function which can compile and instantiate? I'm not saying that we should drop any of the synchronous and asynchronous APIs that we already have, I'm proposing a new asynchronous API.
How does a browser expose that compiled code in a WebAssembly.Module is fast, and that a WebAssembly.Memory instance is suitable for such fast code? Right now the answer seems to be "try it and see if you can notice".
How does a user know how many WebAssembly.Memory instances they're allowed before they get slow code (counting the implicit ones, e.g. as created by the second example)?

mbebenita commented 8 years ago

Would be nice for WebAssembly.Instance to sometimes cause recompilation, this way immutable globals vars could be constant folded in the generated code. For instance, Emscripten generates relocatable code by offsetting all pointers to static data. The offset is passed in as an immutable global var when the module is instantiated. If WebAssembly.Instance can recompile, it could specialize the generated code.

rossberg commented 8 years ago

The specification doesn't define what "compilation" is, nor would it make sense for it to do so, because implementation approaches may wildly differ (including interpreters). So it cannot have any normative say about this either way. The best we could do is adding a note that WebAssembly.Instance is expected to be "fast".

On 27 October 2016 at 03:24, Michael Bebenita notifications@github.com wrote:

Would be nice for WebAssembly.Instance to sometimes cause recompilation, this way immutable globals vars could be constant folded in the generated code. For instance, Emscripten generates relocatable code by offsetting all pointers to static data. The offset is passed in as an immutable global var when the module is instantiated. If WebAssembly.Instance can recompile, it could specialize the generated code.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/design/issues/838#issuecomment-256522163, or mute the thread https://github.com/notifications/unsubscribe-auth/AEDOO9sJPgujK3k0f6P7laYV_zaJxES5ks5q3_1LgaJpZM4Kh1gM .

lukewagner commented 8 years ago

Agreed this would be a non-normative note at most.

In SM, we are also currently intending for instantiation to never recompile so that there is a predictable compilation cost model for devs (in particular, so that devs can use WebAssembly.compile and IDB to control when they take the compilation hit). Instantiation-time recompilation from within the synchronous Instance constructor would certainly break that cost model and could lead to major jank.

But I do appreciate that separate compilation is fundamentally at odds with a variety of optimizations one might want to do to specialize generated code to ambient parameters. Fusing compilation and instantiation into one async op makes sense and is something we've considered in the past. The downside, of course, is that this inhibits explicit caching (there is no Module), so the developer has to make an unpleasant tradeoff. Some options:

The impl could do implicit content-addressed caching (which could include ambient parameters in the key), like we do with asm.js currently in FF. This would be kindof a pain and has all the predictability/heuristic problems of any implicit cache.
We could create a new way (e.g., a new WebAssembly.Cache API where you pass in bytecode and instantiation parameters and get back a Promise<Instance>.

The latter intrigues me and could provide a much nicer developer experience than using IDB and perhaps a chance to even further optimize caching (since the cache is specialized to purpose), but it's certainly a big feature and something we'd want to take some time to consider.

jfbastien commented 8 years ago

@rossberg-chromium I seem to have explained my purpose badly: I don't want to quibble over what the spec says. I'm trying to point out what seems like a serious surprise for developers, hiding under the API. A developer won't expect .compile's result to be re-compiled. That seems like a design flaw to me.

@lukewagner even with implicit or explicit caching we may have the same issue: how many WebAssembly.Memory can be created in the same address-space / origin is a browser limitation. I like what you're suggesting, but I think it's orthogonal to the issue. Let me know if I've misunderstood what you suggest.

Maybe .compile and Module could be given a Memory, and Instance has a .memory property which can be passed to other compilations / instantiations?

I'm not trying to eliminate the possibility of re-compilation, I think we rather want a common idiomatic API usage which has perfect information w.r.t. Memory at first-compile-time (or at cache-retrieval time) so that the compilation emits bounds checks or not if needed.

lukewagner commented 8 years ago

@jfbastien With implicit/explicit caching that was provided the particular instantiation parameters (so Memory), I don't see how there would be any need for recompilation.

jfbastien commented 8 years ago

@jfbastien With implicit/explicit caching that was provided the particular instantiation parameters (so Memory), I don't see how there would be any need for recompilation.

There may be:

Create many Memorys.
Compile code, with explicit (slow) bounds check because there were too many Memoryies.
Cache that code.
Leave page.
Load page again.
Allocate only one Memory, which gets the fast version.
Get from the cache.
Receive slow code Instance.

At this point I agree you don't need recompilation, but we're being a bit silly of we do slow bounds checks when we don't need to.

As I said: I like this Cache API you propose, I think it makes WebAssembly more usable, but I think the problem is still there. 😢

lukewagner commented 8 years ago

Well that's my point about having an enhanced cache that accepts instantiation parameters and bytecode: the cache is free to recompile if what it has cached doesn't match the instantiation parameters. So the steps would just be:

create many Memorys
request an Instance from the cache, passing one of those (slow) Memorys
slow-code is compiled, cached and returned as an Instance
leave page
load page again
allocate only one Memory
request an Instance from the cache, passing the fast Memory
fast-code is compiled, cached and returned as an Instance

and after step 8, all future page loads will get cached fast or slow code.

pizlonator commented 8 years ago

@lukewagner First of all, you're proposing a mitigation that flies in the face of the stated goal of WebAssembly providing deterministic performance. The difference between slow and fast was last quoted as around 20%, so it would really stink if a spec that painstakingly aims aims for deterministic perf drops it on the floor because of an API quirk. I don't buy that the browser having a content-addressed cache is the right solution, because the spec already goes to a lot of trouble elsewhere to obviate the need for profile-recompile-cache optimizations. For example, we promisify compilation precisely so that the app can get reasonable behavior even if the code is not cached. If the way this is spec'd necessitates all of us to implement caches or other mitigations then we will have failed our goal of giving people a reasonably portable cost model.

To me the issue is just this: one of the optimizations that we will all effectively have to do for competitive reasons (the 4GB virtual memory bounds checking, which I'll just call the 4GB hack) cannot be done in the current spec without sacrificing one of these things:

You can get away with it if you always allocate 4GB of virtual memory for any wasm memory. This will discourage people from using WebAssembly for small modules, because you will hit virtual memory allocation limits, virtual memory fragmentation, or other problems if you allocate many of these. I also fear that if you allow allocating a lot of them then you will reduce the efficacy of security mitigations like ASLR. Note that existing APIs don't share this danger, since they commit the memory they allocate and they will either OOM or crash before letting you allocate much more than what physical memory allows.
You can get away with it if you allow for a recompile when you find a mismatch during instantiation (compiled code wants 4GB hack but the memory doesn't have that virtual memory allocation). You could also get away with it if instantiation moved the memory into a 4GB region, but see the previous point. So, probably anytime this happens, it'll be a P1 bug for the browser that encountered it.

I think that this means that the spec will encourage vendors to converge to just allowing 4GB reservations anytime wasm memory is allocated, or to have cache/lazy-compile/profile optimizations to detect this.

Finally, I don't understand the point about making any of this non-normative. This can be normative, because we could make the API preclude the possibility of the browser having to compile something without knowing what kind of memory it will have. I imagine that there are many ways to do this. For example, instantiating could return a promise and we could remove the separate compile step. This would make it clear that instantiation is the step that could take a while, which strongly implies to the client that this is the step that does the compilation. In such an API, the compiler always knows if the memory it's compiling for has the 4GB hack or not.

It's sad that we're only noticing this now, but I'm surprised that you guys don't see this is a bigger issue. Is there some mitigation other than caching that I'm overlooking?

mtrofin commented 8 years ago

@jfbastien in your motivating scenario, you pointed out that the module was authored to prefer fast memory. I'm assuming you're primarily chasing enabling the fast memory optimization when a particular module wants it, and might be OK with not doing it when the module doesn't want it (nothing bad with opportunistically stumbling upon it in that case, too, just trying to tease apart priorities).

If so, how would these alternatives to caching or to async Instantiate feel like:

Module author must require 4GB as min/max memory
A variant of compile (async at least, maybe also sync) that produces an instance accepting only fast memory.

kripken commented 8 years ago

For the issue of the "4GB hack" and mismatches between memory using it and code expecting it, would it make sense for compilation to internally emit two versions of the code? (Obviously this would use more memory, which is sad, but hopefully compile time wouldn't be much worse, the writer could generate both at once?)

jfbastien commented 8 years ago

@mtrofin I don't think it makes sense to ask for 4GiB if you don't intend to use it. The virtual allocation is separate from the use intent, so I think we'd need to separate both.

On 2.: it still isn't super helpful to the developer: if they use that variant and it fails, then what?

@kripken I don't think double compilation is a good idea.

pizlonator commented 8 years ago

@kripken I think that's what we would do without any other resolution to this issue.

I want WebAssembly to be great in the case of casual browsing: you tell me about a cool thingy, send me the URL, I click it, and I amuse myself for a few minutes. That's what makes the web cool. But that means that many compiles will be of code that isn't cached, so compile time will play a big part in a user's battery life. So, double-compile makes me sad.

pizlonator commented 8 years ago

@mtrofin

Module author must require 4GB as min/max memory

That's not really practical, since many devices don't have 4GB of physical memory. Also, that's hard to spec.

A variant of compile (async at least, maybe also sync) that produces an instance accepting only fast memory.

I don't think we want double compiles.

lukewagner commented 8 years ago

@pizlonator Thus far, we haven't considered designs which required different modes of codegen: we've just always allocated 4gb regions on 64-bit and observed this to succeed for many many thousands of memories on Linux, OSX and Windows. We have a conservative upper bound to prevent trivial total exhaustion of available address space which I expect will be sufficient to support the many-small-libraries use case. So I think the new constraint we're addressing here is that iOS has some virtual address space limitations which could reduce the number of 4gb allocations.

So one observation is that a large portion of the bounds-check-elimination allowed by the 4gb hack can be avoided by just having a small guard region at the end of wasm memory. Our initial experiments show that basic analyses (nothing to do with loops, just eliminating checks on loads/stores w/ the same base pointer) can already eliminate roughly half of bounds checks. And probably this could get better. So the 4gb hack would be a more modest, and less necessary, speedup.

Another idea I had earlier would be to pessimistically compile code with bounds checks (using elimination based on the guard page) and then nop them out when instantiating with a fast-mode-memory. Combined, the overhead could be pretty small compared to idealized fast-mode code.

pizlonator commented 8 years ago

@lukewagner

Thus far, we haven't considered designs which required different modes of codegen: we've just always allocated 4gb regions on 64-bit and observed this to succeed for many many thousands of memories on Linux, OSX and Windows. We have a conservative total number to prevent trivial total exhaustion of available address space which I expect will be sufficient to support the many-small-libraries use case. So I think the new constraint we're addressing here is that iOS has some virtual address space limitations which could reduce the number of 4gb allocations.

This isn't an iOS-specific problem. The issue is that if you allow a lot of such allocations then it poses a security risk because each such allocation reduces the efficacy of ASLR. So, I think that the VM should have the option of setting a very low limit for the number of 4GB spaces it allocates, but that implies that the fall-back path should not be too expensive (i.e. it shouldn't require recompile).

What limit do you have on the number of 4GB memories that you would allocate? What do you do when you hit this limit - give up entirely, or recompile on instantiation?

So one observation is that a large portion of the bounds-check-elimination allowed by the 4gb hack can be avoided by just having a small guard region at the end of wasm memory. Our initial experiments show that basic analyses (nothing to do with loops, just eliminating checks on loads/stores w/ the same base pointer) can already eliminate roughly half of bounds checks. And probably this could get better. So the 4gb hack would be a more modest, and less necessary, speedup.

I agree that analysis allows us to eliminate more checks, but the 4GB hack is the way to go if you want peak perf. Everyone wants peak perf, and I think it would be great to make it possible to get peak perf without also causing security problems, resource problems, and unexpected recompiles.

Another idea I had earlier would be to pessimistically compile code with bounds checks (using elimination based on the guard page) and then nop them out when instantiating with a fast-mode-memory. Combined, the overhead could be pretty small compared to idealized fast-mode code.

Code that has bounds checks is best off pinning a register for the memory size and pinning a register for memory base.

Code that uses the 4GB hack only needs to pin a register for memory base.

So, this isn't a great solution.

Besides the annoyance of having to wrangle the spec and implementations, what are the downsides of combining compilation and instantiation into one promisified action?

lukewagner commented 8 years ago

The issue is that if you allow a lot of such allocations then it poses a security risk because each such allocation reduces the efficacy of ASLR.

I'm not an expert on ASLR but, iiuc, even if we didn't have a conservative bound (that is, if we allowed you to keep allocating until mmap failed because the kernel hit its number-of-address-ranges max), only small fraction of the entire 47-bit addressable space would be consumed so code placement would continue to be highly random over this 47-bit space. IIUC, ASLR code placement isn't completely random either; just enough to make it hard to predict where anything will be.

What limit do you have on the number of 4GB memories that you would allocate? What do you do when you hit this limit - give up entirely, or recompile on instantiation?

Well, since it's from asm.js days, only 1000. Then the memory allocation just throws. Maybe we'll need to bump this, but even with many super-modularized apps (with many seprate wasm modules each) sharing the same process, I can't imagine we'd need too much more. I think Memory is different than plain old ArrayBuffers in that apps won't naturally want to create thousands.

Besides the annoyance of having to wrangle the spec and implementations, what are the downsides of combining compilation and instantiation into one promisified action?

As I mentioned above, adding a Promise<Instance> eval(bytecode, importObj) API is fine, but now it places the developer in a tough spot because now they have to choose between a perf boost on some platforms vs. being able to cache their compiled code on all platforms. It seems we need a solution that integrates with caching and that's what I was brainstorming above with the explicit Cache API.

lukewagner commented 8 years ago

New idea: what if we added an async version of new Instance, say WebAssembly.instantiate and, as with WebAssembly.compile, we say that everyone is supposed to use the async version? This is something I've been considering anyway since instantiation can take a few ms if patching is used. Then we say in the spec that the engine can do expensive work in either compile or instantiate (or neither, if an engine does lazy validation/compilation!).

That still leaves the question of what to do when a compiled Module is stored in IDB, but that's just a hard question when there are multiple codegen modes anyway. One idea is that Modules that are stored-to or retrieved-from IDB hold onto a handle to their IDB entry and they add new compiled code to this entry. In that way, the IDB entry would lazily accumulate one or more more compiled versions of its module and be able to provide whichever was needed during instantiation.

The IDB part is a bit more work, but that seems pretty close to ideal, performance-wise. WDYT?

jfbastien commented 8 years ago

I think adding async instantiate makes sense, but I'd also add a Memory parameter to compile. If pass a different memory to instantiate then you can get recompiled, otherwise you've already "bound" the memory when compiling.

I haven't thought about the caching enough to have a fully-formed opinion yet.

pizlonator commented 8 years ago

@lukewagner

I'm not an expert on ASLR but, iiuc, even if we didn't have a conservative bound (that is, if we allowed you to keep allocating until mmap failed because the kernel hit its number-of-address-ranges max), only small fraction of the entire 47-bit addressable space would be consumed so code placement would continue to be highly random over this 47-bit space. IIUC, ASLR code placement isn't completely random either; just enough to make it hard to predict where anything will be.

ASLR affects both code and data. The point is to make it more expensive for an attacker to weasel his way into a data structure without chasing a pointer to it. If the attacker can exhaust memory, he definitely has more leverage.

Well, since it's from asm.js days, only 1000. Then the memory allocation just throws. Maybe we'll need to bump this, but even with many super-modularized apps (with many seprate wasm modules each) sharing the same process, I can't imagine we'd need too much more. I think Memory is different than plain old ArrayBuffers in that apps won't naturally want to create thousands.

1000 seems like a sensible limit. I'll ask around with security folks.

As I mentioned above, adding a Promise eval(bytecode, importObj) API is fine, but now it places the developer in a tough spot because now they have to choose between a perf boost on some platforms vs. being able to cache their compiled code on all platforms. It seems we need a solution that integrates with caching and that's what I was brainstorming above with the explicit Cache API.

Right. I can see a few ways that such an API could be made to work. A cheesy but practical API would be to overload eval:

instancePromise = eval(bytecode, importObj)
instancePromise = eval(module, importObj)

and then Instance has a getter:

module = instance.module

Where module is structure cloneable.

What do you think of this?

New idea: what if we added an async version of new Instance, say WebAssembly.instantiate and, as with WebAssembly.compile, we say that everyone is supposed to use the async version? This is something I've been considering anyway since instantiation can take a few ms if patching is used. Then we say in the spec that the engine can do expensive work in either compile or instantiate (or neither, if an engine does lazy validation/compilation!).

That still leaves the question of what to do when a compiled Module is stored in IDB, but that's just a hard question when there are multiple codegen modes anyway. One idea is that Modules that are stored-to or retrieved-from IDB hold onto a handle to their IDB entry and they add new compiled code to this entry. In that way, the IDB entry would lazily accumulate one or more more compiled versions of its module and be able to provide whichever was needed during instantiation.

The IDB part is a bit more work, but that seems pretty close to ideal, performance-wise. WDYT?

Intriguing. Relative to my idea above:

Pro: yours is an easy to understand abstraction that is conceptually similar to what we say now. Con: yours does not lead to as much synergy between what the user does and what the engine does as my proposal allows.

There are three areas where your proposal doesn't give the user as much control as mine:

The expensive work could happen in one of two places, so the user has to plan for either of them being expensive. We will probably have web content that behaves badly if one of them is expensive, because it was tuned for cases where it happened to be cheap. My proposal has one place where expensive things happen, leading to more uniformity between implementations.
There's no clearly guaranteed path for all versions of the compiled code to be cached. On the other hand, my use of threading the module through the API means that the VM can build up the module with more stuff each time, while still allowing the user to manage the cache. So, if the first time around we do 4GB then this is what we will cache, but if we fail to do 4GB the second time, we will be able to potentially cache both (if the user caches instance.module after every compile).
Unusual corner cases in the browser or other issues could sometimes lead to a double compile in your scheme, because we'd compile one thing in the compile step but then realize we need another thing in the instantiation step. My version never requires a double compile.

So, I like mine better. That said, I think your proposal is a progression, so it definitely sounds good to me.

titzer commented 8 years ago

This issue rests upon how often fragmentation makes allocation of fast memory (btw you'll 4GB + maximum supported offset, or 8GB) fails. If the probably is way less than 1%, then it might not be entirely unreasonable to have that be an OOM situation.

In the scenario where the user is browsing around the web and using lots of little WASM modules in quick succession, presumably they aren't all live at once. In that case, a small cache of reserved 4GB chunks would mitigate the issue.

Another possible strategy is to generate one version of the code with bounds checks, and if fast memory is available, just overwrite the bounds checks with nops. That's ugly, but that's a heck of a lot faster than a recompile, and less space than two compiles.

On Thu, Oct 27, 2016 at 9:03 PM, pizlonator notifications@github.com wrote:

@mtrofin https://github.com/mtrofin

Module author must require 4GB as min/max memory

That's not really practical, since many devices don't have 4GB of physical memory. Also, that's hard to spec.

A variant of compile (async at least, maybe also sync) that produces an instance accepting only fast memory.

I don't think we want double compiles.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/design/issues/838#issuecomment-256738329, or mute the thread https://github.com/notifications/unsubscribe-auth/ALnq1F6CYUaq0unla0H6RYivUC8jfxIAks5q4PWIgaJpZM4Kh1gM .

jfbastien commented 8 years ago

It's not just ASLR: it's also pagetable / allocator / etc pollution. We all need to talk to our security folks as well as out kernel / systems folks. Or we can be up-front to developers about the limits each engine imposes on "fast" Memory, and make it idiomatic in the API so it's hard to use it wrong.

There's all these crutches we can use, such as nop or double compilation, but why even have crutches?

lukewagner commented 8 years ago

@jfbastien I don't think PROT_NONE memory costs page table entries; I think there is a separate data structure that holds mappings from which the page table is lazily populated.

@pizlonator I like that idea, and I can see that being what we encourage everyone to use by default in tutorials, toolchain, etc. It's also more succinct and easier to teach if you can simply ignore Module. This could also address @s3ththompson's concern about discouraging use of the sync APIs by making the nicest API the async one.

However, I think we shouldn't take away WebAssembly.compile and the Module constructor: I'm imagining scenarios where you have a "code server" (providing cross-origin code caching via IDB + postMessage; this has been specifically discussed with some users already) that wants to compile and cache code without having to "fake up" instantiation parameters. (There could also be some needless overhead (garbage, patching, etc) for an unnecessary instantiation.) And, for the same corner cases that want synchronous compilation (via new Module), we would need to keep new Instance.

So if agreed on that, then what this boils down to is a purely additive proposal of the two WebAssembly.eval overloads you mention. Yes?

One tweak, though: I think we shouldn't have a module getter since this would require the Instance to keep some internal data around (viz., bytecode) for the lifetime of the Instance; right now Module can usually be GCed immediately after instantiation. This would suggest either a data property (that the user can remove, although they will probably forget to), or maybe a third version of eval that returns an {instance, module} pair...

flagxor commented 8 years ago

Having an async one step API as the recommended case for the typical monolithic app makes sense as the recommended pattern.

Agreed with @lukewagner that both the all sync (inline compile) case covered by new Module + new Instance is useful. Also the background compile (async) server with sync instantiate also seems useful.

Adding the two eval variants proposed seem an ok way to introduce this.

However I don't like the name, because it will be conflated in (security) folks mind with js eval (which it resembles in one way, but not in terms of scope capture). How about WebAssembly.instantiate ?

lukewagner commented 8 years ago

Hah, good point, eval does have a bit of a rep. +1 to WebAssembly.instantiate.

mtrofin commented 8 years ago

What would the guideline to the developer be wrt when to use the async instantiate?

lukewagner commented 8 years ago

@mtrofin To use WebAssembly.instantiate by default unless they had some special code-sharing/loading scheme that required compiling Modules independently of any particular use.

pizlonator commented 8 years ago

@lukewagner This seems reasonable.

Hah, good point, eval does have a bit of a rep. +1 to WebAssembly.instantiate.

Agreed.

So if agreed on that, then what this boils down to is a purely additive proposal of the two WebAssembly.eval overloads you mention. Yes?

That's what it sounds like.

I think we shouldn't have a module getter since this would require the Instance to keep some internal data around (viz., bytecode) for the lifetime of the Instance; right now Module can usually be GCed immediately after instantiation. This would suggest either a data property (that the user can remove, although they will probably forget to), or maybe a third version of eval that returns an {instance, module} pair...

Sure feels like a data property is better. Or having WebAssembly.instantiate always return an instance, module pair.

mtrofin commented 8 years ago

Is this correct: Suppose you WebAssembly.instantiate with the goal of getting a fastmemory module variant. You now get the module, and structure-clone it. Now, this module is bound to needing to be instantiated with Memory-es supporting fastmemory.

lukewagner commented 8 years ago

@pizlonator Yeah, I can bikeshed it in my head multiple ways. I think I like returning the pair a little better since it'll probably lead to less people accidentally entraining an unused Module.

@mtrofin Recompilation can still be necessary when you pluck the Module off one instantiate call and instantiate with new imports; I think the point of this API addition is that it won't be the common case and it will only happen when it's fundamentally necessary (i.e., you have 1 module accessing two kinds of memories).

jfbastien commented 8 years ago

This thread is getting long, looks like it's converging but to be 100% sure we need to write up the code that we expect different users to write:

Async instantiation of a single module.
Async instantiation of a module, with memory sharing with other modules.
Synchronous instantiation of a single module (I don't think synchronous multi-module is useful?).
Caching for all of these (both putting into the cache, as well as retrieving and instantiating, with memory).
Update of a single .wasm module, and cached loads of the other modules.

Anything else? It sounds like @lukewagner has ideas around imports which I don't fully grok.

mtrofin commented 8 years ago

That means that subsequent uses of this module must instantiate asynchronously, or risk blocking the UI thread with a surprisingly lengthy synchronous instantiate.

mtrofin commented 8 years ago

@jfbastien I'd want to understand for each snippet we expect developers to write, what would motivate them to go that particular path, and what information the developer must have available to make a decision.

lukewagner commented 8 years ago

@mtrofin Right, given a Module m, you'd call WebAssembly.instantiate(m) which is async. You could call new Instance(m) and it might be expensive, but that's no different than new Module(m).

lukewagner commented 8 years ago

@jfbastien Assuming when you say "async instantiation" you mean "async compilation and instantiation", here's the short version:

WebAssembly.instantiate(bytecode, imports)
WebAssembly.instantiate(bytecode, imports), where imports includes the shared memory
new Instance(new Module(bytecode), imports)
In all cases you can get a Module, then you put that in an IDBObjectStore. Later, you get a Module m back and call WebAssembly.instantiate(m, imports).
Nothing really special here: you WebAssembly.instantiate one module from bytecode and instantiate the rest from the Modules pulled from IDB.

mtrofin commented 8 years ago

Should we recommend using the sync instantiate if you feel you can use the sync compile, and async instantiate if you feel you should use the async compile?

Aside that, I am concerned that the developer would now face a more complex system: more choices that transpire optimizations we're planning on making, and I'm not sure the developer has the right information available for making the tradeoffs. Thinking from the developer's perspective, is there a smaller set of concerns they care about and would feel comfortable expressing? We talked at a point about developers having an "optimize at the expense of precise failure points" (this was re. hoisting bounds checks). Would an alternative be an "optimize" flag?

lukewagner commented 8 years ago

@mtrofin 99% of what developers would write (or have generated for them by the toolchain) would be WebAssembly.instantiate. You'd only use sync APIs for special "I'm writing a JIT in wasm" and WebAssembly.compile if you're writing some code sharing system so I would think the "Getting Started" tutorials would exclusively cover WebAssembly.instantiate.

flagxor commented 8 years ago

@lukewagner I notice you added imports to #3 new Module() above. I think that plus adding it to WebAssembly.compile is a good idea and rounds out the possibilities. That way if you want to hint about the memory at compile time you can. If you later instantiate again with different imports, especially synchronously, you may get a hiccup.

So summary of changes (just so I'm clear):

Add WebAssembly.instantiate(bytes, imports) returns promise of {instance:, module:}
Add WebAssembly.instantiate(module, imports) returns promise of {instance:, module:}
Change to new Module(bytes, imports) returns module
Change to WebAssembly.compile(bytes, imports) returns promise of instance

State somewhere the expectation that instantiate will be fast if imports from compile match instantiate.

WDYT?

lukewagner commented 8 years ago

Oh oops, I meant to put the imports as an arg to Instance. I'm not convinced it's necessary for Module or compile. [Edit: because if you had them, you'd just call instantiate]

flagxor commented 8 years ago

So that would mean that for the end-to-end async case, you can know that you'll be binding to a 4GB hack memory, but not for a JITed filter kernel or a background compiled item (unless you also create a throw-away instance)?

mtrofin commented 8 years ago

+1 on focusing the guidance on the async pair of compile & instantiate - makes the message simple and hides the complexities of the decision problem from the developer.

flagxor commented 8 years ago

Yeah I think we're all in agreement that we'd point folks at: First time: WebAssembly.instantiate(bytes, imports) -> promise of {module, instance} (cache module to indexeddb) Second time: WebAssembly.instantiate(module, imports) -> promise of {module, instance}

Any objections to that being the main pattern?

I'm torn on imports for compile / new Module. It seems like that could be a useful hint. Though, I'd be open to mentioning it as a possibility and deferring adding that arg (it could be optional) to Post-MVP.

Thoughts?

lukewagner commented 8 years ago

@mtrofin (Well, technically, just instantiate.)

flagxor commented 8 years ago

@lukewagner (I think that's what @mtrofin meant)

mtrofin commented 8 years ago

@lukewagner, @flagxor OK, but we're keeping the async compile API, right?

How about this scenario: you get an application like PhotoShop with tons of plugins. Each plugin is a wasm module. You start the main app and you manage to allocate the magic memory size that triggers fastmemory (seems reasonable for this scenario - one app, memory hungry).

You want to compile a number of the plugins in parallel so you fire off some workers to do that. You can't pass those compilations the actual memory you'll use (correct?). So, depending on defaults, you get slowmemory compilation for the plugins, which will then be followed by a costly slew of async recompilations for the fastmemory when the plugins get connected to the app.

If we buy this scenario, then it feels there may be value in passing some memory descriptor (to be clear, without actual backing memory) to the compile API.

jfbastien commented 8 years ago

Yes, it should be possible (even encouraged) to pass Memory to the compilation.

lukewagner commented 8 years ago

@mtrofin Right, compile for advanced uses. I suppose that plugin example is a valid case where you'd want to compile, and you have a Memory, but you don't want instantiate (yet).

lukewagner commented 8 years ago

@pizlonator Btw, I meant to ask earlier, assuming the "throw if more than 1000 4gb maps per process" hack is sufficient to address the ASLR/security concerns, is there still a need for slow-mode/fast-mode due to platform virtual address quota restrictions? Because if there wasn't, it's certainly be nice if this wasn't even a performance consideration for even advanced users. (The instantiate APIs seem useful to add for the other reasons we mentioned, of course.)

ghost commented 8 years ago

There are also applications that might benefit from a memory size that is a power of two plus a spill area, where the application already masks pointers to remove tagging so can mask off the high bits to avoid bounds checking. These applications need to reason about the memory allocation size that they can receive before compilation, either by modifying global constants used for masking or to bake in suitable constants while de-compressing to wasm.

There is also the buffer-at-zero optimization that some runtimes might want to take advantage of, and there is only going to be one such buffer per process.

It would also make the platform more user friendly if it could reason about the required memory and available memory before compiling the application. For example, to allow the browser or app to inform the user that they need to close some tabs in order to run the application, or to run it without degraded capability.

A web browser might want to have a dedicated-app mode that is an option for users, where they might be running on a limited device and need all the memory and performance they can get just to run the one application well. For this it needs to be able to reason about the requirements early.

The memory should not need to be allocated before compilation, rather a reasoned reservation made. On a limited device, compilation alone might use a lot of memory, so even a large VM allocation might be a show stopper.

These are not new issues, have been under discussion for years now. Memory resource management is needed, and it needs to be coordinated with code generation.

pizlonator commented 8 years ago

@lukewagner I think so, because if we limited to 1000 modules, then I'd worry that the tolerances aren't big enough.

I'd worry about an attack surfacing that needed this ceiling to go down.
I'd worry about optimizations elsewhere in the stack reducing the amount of virtual address space that is available to us, which then would require us to reevaluate whether the ceiling is low enough.
I'd worry about preventing programming styles that deliberately create 1000s of modules. For example, I know that most JavaScriptCore framework clients create a VM, do a tiny bit of work, and then destroy it. If WebAssembly is used the same way from JS as JSC is from Objective-C, then to make it work on 64-bit systems the GC would have to know that if you allocate 1000 memories - even if each was small - then you have to GC in case the next allocation should succeed on the grounds that those 1000 memories are now unreachable. The ability to allocate non-4GB-hack memories after there are already, say, 10 4GB-hack memories live would mean that the GC wouldn't have to change its heuristics very much. It wouldn't have to do a GC when you allocate the 1001st module in your instantiate->run->die loop. This would be a benefit to patterns that use a tiny memory. Anything below 1MB, and it starts to make sense to have 1000 of them. I can imagine people doing useful things in 64KB.
I'd worry about this being less useful to other JavaScript contexts. I want to leave the door open for JSC C API and Objective-C API clients to have access to WebAssembly API from their JS code. Those clients would probably prefer a small limit on the number of 4GB memories we allocate. It's tempting to even make that quota configurable in such a context.

I like that the improved API removes the need to have an artificial ceiling on the number of memories, or the need to recompile, or other undesirable things. I don't like artificial ceilings unless the tolerances are very large, and I don't think that's the case here.

lukewagner commented 8 years ago

@pizlonator Fair enough, and it's a cleaner/simpler API, so I think it's fine to add.

As for why I'm not concerned with those items you've mentioned (at this time):

The limit may well need to be raised; that's easy.
At whatever reasonable limit, only a small fraction of the total 64-bit address space will be used, so I'm not aware of what that attack vector is here; determined content has many ways to OOM itself
We bump the GC heuristics commensurate with the reservation size, and thus churning through Memorys just leads to more aggressive GC. More GC isn't great, but I'm not sure this will be a common pattern.

But who knows what we'll see in the future, so I suppose it's useful to have the flexibility built in now.

WebAssembly / design

Can WebAssembly.Instance recompile an already-compiled Module? #838