Proposal: Fine grained control of memory

WebAssembly / design

WebAssembly Design Documents

http://webassembly.org

Apache License 2.0

11.42k stars 694 forks source link

Proposal: Fine grained control of memory #1439

Open dtig opened 2 years ago

dtig commented 2 years ago

The linear memory associated with a WebAssembly instance is a contiguous, byte addressable range of memory. In the MVP each module or instance can only have one memory associated with it, this memory at index zero is the default memory.

The need for finer grained control of memory has been in the cards since the early days of WebAssembly, and some functionality is also described in the future features document.

Motivation

The primary motivation for this proposal is to reduce the impact of copying into Wasm memory. In comparison with native applications, Web Codecs, and ML based applications in the browser context incur a non-negligible performance overhead due to the number of copies that need to be made across the pipeline (source: WebCodecs, WebGPU, Machine learning on the Web 1, 2).
Widely used WebAssembly engines currently reserve large chunks of memory upfront to ensure that memory.grow can be efficient, and we can obtain a large linear contiguous chunk of memory that can be grown in place. While this works in the more general cases, it can be a very difficult requirement for systems that are memory constrained. In some cases it is possible that applications don’t use the memory that an engine implicitly reserves, so having some way to explicitly tell the OS that some reserved memory can be released would mean that applications have better control over their memory requirements. This was also previously discussed as an addition to the MVP, and more recently as an option for better memory management.
Ensuring that some data can also be read-only is useful for applications that want to provide a read only API to inspect the memory, or for media use cases that want to disallow clients from manipulating buffers that the decoder might still be using. From a security perspective, read only memory will provide increased security guarantees for constant data, or better security guarantees for sensitive data which is currently absent due to all of the memory being write accessible.

Proposed changes

At a high level, this proposal aims to introduce the functionality of the instructions below:

memory.map: Provide the functionality of mmap(addr, length, PROT_READ|PROT_WRITE, MAP_FIXED, fd) on POSIX, and MapViewOfFile on Windows with access FILE_MAP_READ/FILE_MAP_WRITE.
memory.unmap: Provide the functionality of POSIX munmap(addr, length), and UnmapViewOfFile(lpBaseAddress) on Windows.
memory.protect: Provide the functionality of mprotect with PROT_READ/PROT_WRITE permissions, and VirtualProtect on Windows with memory protection constants PAGE_READONLY and PAGE_READWRITE.
memory.discard: Provide the functionality of madvise(MADV_DONTNEED) and VirtualFree(MEM_DECOMMIT);VirtualAlloc(MEM_COMMIT) on windows.

Some options for next steps are outlined below, the instruction semantics will depend on the option. The intent is to pick the option that introduces the least overhead of mapping external memory into the Wasm memory space. Both the options below below assume that additional memories apart from the default memory will be available. The current proposal will only introduce memory.discard to work on the default memory, the other three instructions will only operate on memory not at index zero.

Option 1: Statically declared memories, with bind/unbind APIs (preferred)

Extend the multi-memory proposal with a JS API that enables access to memories other than the default memory.
The instructions outlined above will take a static argument for memory index.
Introduce new JS API views to bind/unbind JSArrays that call memory.map/memory.unmap underneath. (Note: it may be possible for some browser engines to operate on the same backing store without an explicit map/unmap instruction. If the only usecase for these instructions is from JS, it is possible to make these API only as needed.)
Extend memtype to store memory protections in addition to limits for size ranges.

Reasons for preferring this approach:

Having a statically known number of memories ahead of time may be useful for optimizing engine implementations
From looking at applications, it looks like applications do not require a large number of additional memories, and having a single digit number of extra memories may be sufficient for most cases. (This is a limited survey of applications, if there are more that would benefit from fully first class memories please respond here, or send them my way.)
Incremental addition over existing proposals.

Option 2: First class WebAssembly memories

This is the more elegant approach to dynamically add memories, but adding support for first class memories is non-trivial.

Introduce the notion of a generic memory ref ref.mem.
Introduce a new class of instructions to add, remove and manipulate memory references.
Extend existing instructions that take a memarg to use memory references.
The instructions outlined above will need an argument for a memory reference.
JS API extensions for the instructions mentioned above.

Other alternatives

Why not just map/unmap to the single linear memory, or memory(0)?

I'm not sure that this can be done in any way that can still be compatible with the performance guarantees for the current memory. At minimum, I expect that more memory accesses would need to be bounds checked, and write protections would also add extra overhead.
Generalizing what this would need to look like, we need to store granular page level details for the memory which complicates the engine implementations, especially because engines currently assume that Wasm owns the default memory, and have tricks in place to make this work in a performant and secure way (the use of guard pages for example).
To maintain backwards compatibility to the extent that the default Wasm memory space is unaffected.

Web API extensions

To support WebAssembly owning the memory, and also achieving zero copy data transfer, is to extend Web APIs to take typed array views as input parameters into which outputs are written. The advantage here is that the set of APIs that need this can be scaled incrementally with time, and it minimizes the changes to the WebAssembly spec.

The disadvantages are that this would require changes to multiple Web APIs across different standards organizations, it’s not clear that the churn here would result in providing a better data transfer story as some APIs will still need to copy out.

This is summarizing a discussion from the previous issue in which this approach was discussed in more detail.

Using GC Arrays

Though the proposal is still in phase 1, it is very probable that ArrayBuffers will be passed back and forth between JS/Wasm. Currently this proposal is not making assumptions about functionality that is not already available, and when available will evaluate what overhead it introduces with benchmarks. If at that time the mapping functionality is provided by the GC proposal without much overhead, and it makes sense to introduce a dependency on the GC proposal, this proposal will be scoped to the remaining functionality outlined above.

JS API

Interaction of this proposal with JS is somewhat tricky because

WebAssembly memory can be exported as an ArrayBuffer, or a SharedArrayBuffer if the memory is shared, but ArrayBuffers do not have the notion of read protections for the ArrayBuffer. There are proposals in flight that explore this, and when this is standardized in JS, WebAssembly memory that is read-only either by using a map-for-read mapping or, protected to read-only can be exposed to JS. There are currently proposals in flight that explore these restricted ArrayBuffers. (1, 2)
Multiple ArrayBuffers cannot alias the same backing store unless a SharedArrayBuffer is being used. One option would be for the BufferObject to return the reference to the existing JS ArrayBuffer. Alternatively a restriction that could be imposed to only use SharedArrayBuffers when mapping memory, but this also would has trickle down effects into Web APIs.
Detailed investigation needed into whether growing memory is feasible for memory mapped buffers. What restrictions should be in place when interacting with resizeable/non-resizeable buffers?

Open questions

Consistent implementation across platforms

The functions provided above only include Windows 8+ details. Chrome still supports Windows 7 for critical security issues, but only until Jan 2023, this proposal for now will only focus on windows system calls available on Windows 8+ for now. Any considerations of older Windows users will depend on usage stats of the interested engines.

How would this work in the tools?

While dynamically adding/removing memories is a key use case, for C/C++/Rust programs operate in a single address space, and library code assumes that it has full access to the single address space, and can access any memory. With multiple memories, we are introducing separate address spaces so it’s not clear what overhead we would be introducing.

Similarly, read-only memory is not easy to differentiate in the current model when all the data is in a single read-write memory.

How does this work in the presence of multiple threads?

In applications that use multiple threads, what calls are guaranteed to be atomic? On the JS side, what guarantees can we provide for Typed array views?

Feedback requested

All feedback is welcome, but specific feedback that I would find useful for this issue:

The use cases detailed in the motivation section are the ones that I will be currently focusing on. If there are other use cases that would benefit from a proposal along these lines I’d be interested to evaluate them as well.
At the moment I’ve added high level details for Option 1, Option 2 and will continue to evaluate them in more depth, but I would be interested in feedback on both the options as well as thoughts on the alternate options mentioned above.

Repository link here if filing issues is more convenient.

titzer commented 2 years ago

I really like this proposal and I am glad it is happening now! 👍

One additional use case I can think of is to implement the equivalent of .ro sections in ELF, which are read-only data. We could consider an extension to active memory segments and an extension to memory declarations to declare read-only ranges and segments to be loaded into those read-only ranges, prior to the start function, so that memory is not ever observably uninitialized.

conrad-watt commented 2 years ago

How does this work in the presence of multiple threads?

In applications that use multiple threads, what calls are guaranteed to be atomic? On the JS side, what guarantees can we provide for Typed array views?

AFAIK these operations (if implemented via POSIX) can't be guaranteed to be atomic unless we're willing to do something like pause/interrupt every other thread (which can access the memory) while carrying them out. My understanding is that the POSIX spec just says that races here have undefined behaviour.

If stopping the world isn't acceptable, we might be able to get away with something similar to our current memory.grow semantics in the case of a race, where a thread's individual memory accesses may each non-deterministically observe/not observe any concurrent (racy) mmap/mprotect, unless there is some other synchronisation (e.g. through a paired atomic read-write) which fixes whether the operation is visible or not. This is beyond what the POSIX spec guarantees, but might be satisfied by real OS behaviours (and is probably good enough for real user programs). AFAIU this is a rather underexplored area semantically.

dtig commented 2 years ago

I really like this proposal and I am glad it is happening now! 👍

One additional use case I can think of is to implement the equivalent of .ro sections in ELF, which are read-only data. We could consider an extension to active memory segments and an extension to memory declarations to declare read-only ranges and segments to be loaded into those read-only ranges, prior to the start function, so that memory is not ever observably uninitialized.

Thanks @titzer! Interesting use case.

If stopping the world isn't acceptable, we might be able to get away with something similar to our current memory.grow semantics in the case of a race, where a thread's individual memory accesses may each non-deterministically observe/not observe any concurrent (racy) mmap/mprotect, unless there is some other synchronisation (e.g. through a paired atomic read-write) which fixes whether the operation is visible or not. This is beyond what the POSIX spec guarantees, but might be satisfied by real OS behaviours (and is probably good enough for real user programs). AFAIU this is a rather underexplored area semantically.

I'm hoping that we will be able to get away with the current memory.grow semantics. ASFAIK, we haven't encountered issues in the wild with racy grow calls, though I expect that it is more observable with an mmap call.

fitzgen commented 2 years ago

Exciting!

memory.map: Provide the functionality of mmap(addr, length, PROT_READ|PROT_WRITE, MAP_FIXED, fd) on POSIX, and MapViewOfFile on Windows with access FILE_MAP_READ/FILE_MAP_WRITE.

What are you imagining the operands to the memory.map instruction would be? Core Wasm doesn't have file descriptors or handles, but WASI and Web APIs do have analogous concepts, so this file-mapping functionality seems more appropriate for WASI and/or Web APIs than for core Wasm instructions in my mind.

memory.discard: Provide the functionality of madvise(MADV_DONTNEED) and VirtualFree(MEM_DECOMMIT);VirtualAlloc(MEM_COMMIT) on windows.

To be clear, the intended semantics for memory.discard is to zero the given memory region, correct?

I ask only because accessing pages after madvise(MADV_DONTNEED) doesn't always give zero pages: if the memory region is a shared mapping to an underlying file, then subsequent accesses will repopulate pages from that underlying file instead of using zero pages. It isn't 100% clear to me whether it is intended for memory.discard to have this behavior as well.

Is the expectation that Wasm engines running in environments without virtual memory will simply not implement or disable this proposal?

Just double checking: these instructions would all require that the memory regions they operate upon be page aligned and multiple-of-page-size sized, right?

I suppose they could take their operands in units of pages, rather than bytes, to enforce this, similar to memory.grow.

Overall:

I like the idea of exposing virtual memory and protection functionality via memory.protect and memory.discard instructions in core Wasm.
I think that the file mapping functionality should probably be built on top of the new core Wasm virtual memory functionality in either the WASI and/or Web API layers (instead of inside core Wasm; i.e. there should not be memory.{map,unmap} core Wasm instructions).
I like "option 1" of having static memory immediates for the new instructions, rather than introducing memory references. It is easier to start with, and we can always introduce memory references and indirect versions of these new instructions (and loads/stores/memory.copy/etc) that operate on memory references at a later time, if needed, which would be analogous to how we have both call and call_indirect instructions.

titzer commented 2 years ago

I generally agree with @fitzgen here. I think we should put first-class memories into their own proposal; we'll have to design a way to allow all memory operations that currently have static memory indexes to take a first class memory, and that mechanism should probably be uniform.

I also agree that file mapping is probably best handled at a different layer, so I think it may be out of scope here too.

dtig commented 2 years ago

Exciting!

memory.map: Provide the functionality of mmap(addr, length, PROT_READ|PROT_WRITE, MAP_FIXED, fd) on POSIX, and MapViewOfFile on Windows with access FILE_MAP_READ/FILE_MAP_WRITE.

What are you imagining the operands to the memory.map instruction would be? Core Wasm doesn't have file descriptors or handles, but WASI and Web APIs do have analogous concepts, so this file-mapping functionality seems more appropriate for WASI and/or Web APIs than for core Wasm instructions in my mind.

Thanks @fitzgen for the detailed feedback.

I'll start with an example here to clearly scope the problem that I'd like to tackle. Let's say WebGPU maps a GPUBuffer that produces an ArrayBuffer, or an ArrayBuffer is populated as the result of using file handling APIs like Blob.arrayBuffer()/FileReader.readAsArrayBuffer(), the contents of this ArrayBuffer need to be directly accessible to a Wasm module to avoid copying in/out of the Wasm linear memory.

While I also agree that file descriptors are out of place here, I don't necessarily agree that a map instruction is out of place as a core Wasm instruction. In my mental model, I expect that if it is possible for a Wasm module to have additional memory to operate on, that action should be explicit in the form of a core instruction, i.e. a file mapping API at a different layer would still need a core Wasm instruction that would be called. I'm having trouble thinking through how this would work if the functionality was only provided at a different layer. The linear memory still needs to be defined in a module, or imported into a module, how would this be accessible inside Wasm?

I think the operands to memory.map (just in the context of Option 2) would be as follows:

index: which specifies the memory index
pages: length in number of pages
prot: Bit field for write protections, this may be extended to include whether a memory can be grown.
addr: Pointer to the backing store of an ArrayBuffer for the above example (Not the best name because it confuses mmap arguments. I'm also unfamiliar with what would work for WASI in this case, but happy to look into it more to generalize this better)

memory.discard: Provide the functionality of madvise(MADV_DONTNEED) and VirtualFree(MEM_DECOMMIT);VirtualAlloc(MEM_COMMIT) on windows.

To be clear, the intended semantics for memory.discard is to zero the given memory region, correct?

I ask only because accessing pages after madvise(MADV_DONTNEED) doesn't always give zero pages: if the memory region is a shared mapping to an underlying file, then subsequent accesses will repopulate pages from that underlying file instead of using zero pages. It isn't 100% clear to me whether it is intended for memory.discard to have this behavior as well.

The intended behavior is to zero the memory pages, I'll look into potential options some more.

Is the expectation that Wasm engines running in environments without virtual memory will simply not implement or disable this proposal?

Yes, though I expect that it would be possible to polyfill if needed. I'm not sure that that would be particularly useful.

Just double checking: these instructions would all require that the memory regions they operate upon be page aligned and multiple-of-page-size sized, right?

I suppose they could take their operands in units of pages, rather than bytes, to enforce this, similar to memory.grow.

Yes, my expectation is that all operands are in units of pages consistent with the memory.grow operation.

Overall:

I like the idea of exposing virtual memory and protection functionality via memory.protect and memory.discard instructions in core Wasm.

I think that the file mapping functionality should probably be built on top of the new core Wasm virtual memory functionality in either the WASI and/or Web API layers (instead of inside core Wasm; i.e. there should not be memory.{map,unmap} core Wasm instructions).

I like "option 1" of having static memory immediates for the new instructions, rather than introducing memory references. It is easier to start with, and we can always introduce memory references and indirect versions of these new instructions (and loads/stores/memory.copy/etc) that operate on memory references at a later time, if needed, which would be analogous to how we have both call and call_indirect instructions.

fitzgen commented 2 years ago

I'll start with an example here to clearly scope the problem that I'd like to tackle. Let's say WebGPU maps a GPUBuffer that produces an ArrayBuffer, or an ArrayBuffer is populated as the result of using file handling APIs like Blob.arrayBuffer()/FileReader.readAsArrayBuffer(), the contents of this ArrayBuffer need to be directly accessible to a Wasm module to avoid copying in/out of the Wasm linear memory.

Agreed that this use case is very motivating.

I think the operands to memory.map (just in the context of Option 2) would be as follows:

* `index`: which specifies the memory index

* `pages`: length in number of pages

* `prot`: Bit field for write protections, this may be extended to include whether a memory can be grown.

* `addr`: Pointer to the backing store of an ArrayBuffer for the above example (Not the best name because it confuses `mmap` arguments. I'm also unfamiliar with what would work for WASI in this case, but happy to look into it more to generalize this better)

What is the representation of a pointer to the backing store of an ArrayBuffer here? Is it an externref (or some other kind of reference) that JS passes in? Is it an integer indexing into some table maintained on the JS side of things? How does core Wasm get/create one?

It seems to me like this API/functionality fundamentally involves communicating with, and making assumptions about, the host. Therefore this belongs in WASI/Web APIs, not core Wasm, in my mind.

I'm having trouble thinking through how this would work if the functionality was only provided at a different layer. The linear memory still needs to be defined in a module, or imported into a module, how would this be accessible inside Wasm?

What I am imagining is that there would be a JS API basically identical to what you've described for the memory.map instruction, but because it is a JS API it can just take an ArrayBuffer as its addr/file descriptor/handle argument directly and side step the questions raised above.

Something like this:

// Grab the Wasm memory.
let memory = myWasmMemory();

// Grab the array buffer we want to share with Wasm.
let buffer = myArrayBuffer();

// Length of the buffer, in Wasm pages.
let page_len = Math.ceil(buffer.length / 65536);

// The memory protections.
let prot = WebAssembly.Memory.PROT_READ | WebAssembly.Memory.PROT_WRITE;

// Map the array buffer into this memory!
memory.map(page_len, prot, buffer);

Then, if you wanted to create a new mapping from inside Wasm, you would import a function that allowed you to have your own scheme for identifying which array buffer you wanted to map (maybe coming up with your own "file descriptor" concept, since you can safely make assumptions about the host and include your own JS glue to maintain the fd-to-ArrayBuffer mapping on the JS side).

The linear memory would still be defined inside Wasm, as if it were just another memory. And it would be just another memory, until the JS API was called on it and the array buffer got mapped in.

There could be an analogous API for WASI. (Although, at the risk of going into the weeds a little bit here, one of WASI's goals is for all APIs to be virtualizable, and this API wouldn't be. Making it virtualizable would require a memory.map instruction were you could overlay views of an existing memory onto another memory. That is a bit more powerful than anything we've been talking about in this thread thus far.)

dschuff commented 2 years ago

This was also previously discussed as an addition to the MVP, and more recently as an option for better memory management.

The 2 links you used there are the same URL. Did you mean for the latter one to be https://github.com/WebAssembly/design/issues/1397 ?

dtig commented 2 years ago

I'll start with an example here to clearly scope the problem that I'd like to tackle. Let's say WebGPU maps a GPUBuffer that produces an ArrayBuffer, or an ArrayBuffer is populated as the result of using file handling APIs like Blob.arrayBuffer()/FileReader.readAsArrayBuffer(), the contents of this ArrayBuffer need to be directly accessible to a Wasm module to avoid copying in/out of the Wasm linear memory.

Agreed that this use case is very motivating.
I think the operands to memory.map (just in the context of Option 2) would be as follows:
* `index`: which specifies the memory index

* `pages`: length in number of pages

* `prot`: Bit field for write protections, this may be extended to include whether a memory can be grown.

* `addr`: Pointer to the backing store of an ArrayBuffer for the above example (Not the best name because it confuses `mmap` arguments. I'm also unfamiliar with what would work for WASI in this case, but happy to look into it more to generalize this better)
What is the representation of a pointer to the backing store of an ArrayBuffer here? Is it an externref (or some other kind of reference) that JS passes in? Is it an integer indexing into some table maintained on the JS side of things? How does core Wasm get/create one?

It seems to me like this API/functionality fundamentally involves communicating with, and making assumptions about, the host. Therefore this belongs in WASI/Web APIs, not core Wasm, in my mind.

For Option 1, I expect this to be an externref, for Option 1, though this is more flexible, i.e. if we did introduce the concept of a generic memoryref, then I expect that there would be a table on the Wasm side, and we would need additional instructions to manipulate memory references.

I'm having trouble thinking through how this would work if the functionality was only provided at a different layer. The linear memory still needs to be defined in a module, or imported into a module, how would this be accessible inside Wasm?

What I am imagining is that there would be a JS API basically identical to what you've described for the memory.map instruction, but because it is a JS API it can just take an ArrayBuffer as its addr/file descriptor/handle argument directly and side step the questions raised above.

Something like this:
// Grab the Wasm memory.
let memory = myWasmMemory();

// Grab the array buffer we want to share with Wasm.
let buffer = myArrayBuffer();

// Length of the buffer, in Wasm pages.
let page_len = Math.ceil(buffer.length / 65536);

// The memory protections.
let prot = WebAssembly.Memory.PROT_READ | WebAssembly.Memory.PROT_WRITE;

// Map the array buffer into this memory!
memory.map(page_len, prot, buffer);
Then, if you wanted to create a new mapping from inside Wasm, you would import a function that allowed you to have your own scheme for identifying which array buffer you wanted to map (maybe coming up with your own "file descriptor" concept, since you can safely make assumptions about the host and include your own JS glue to maintain the fd-to-ArrayBuffer mapping on the JS side).

The linear memory would still be defined inside Wasm, as if it were just another memory. And it would be just another memory, until the JS API was called on it and the array buffer got mapped in.

There could be an analogous API for WASI. (Although, at the risk of going into the weeds a little bit here, one of WASI's goals is for all APIs to be virtualizable, and this API wouldn't be. Making it virtualizable would require a memory.map instruction were you could overlay views of an existing memory onto another memory. That is a bit more powerful than anything we've been talking about in this thread thus far.)

Ah, I see what you mean. My intent with proposing core Wasm instructions for map/unmap was to see if there's a way to make the module having access to additional memory more explicit, instead of implicit through the API. But I do agree with you that trying to do so does make assumptions about the host environment. If the current use is limited to the JS/Web use case of being able to map ArrayBuffers in , I would not be opposed to starting with a API-only function, and revisit the addition of core Wasm instructions if needed (I touch on this in the 3rd bullet point of Option 1, but on re-reading I realize that doesn't provide sufficient detail).

dtig commented 2 years ago

This was also previously discussed as an addition to the MVP, and more recently as an option for better memory management.

The 2 links you used there are the same URL. Did you mean for the latter one to be #1397 ?

I did! Thanks for catching, I've updated the OP.

conrad-watt commented 2 years ago

Ah, I see what you mean. My intent with proposing core Wasm instructions for map/unmap was to see if there's a way to make the module having access to additional memory more explicit, instead of implicit through the API. But I do agree with you that trying to do so does make assumptions about the host environment. If the current use is limited to the JS/Web use case of being able to map ArrayBuffers in , I would not be opposed to starting with a API-only function, and revisit the addition of core Wasm instructions if needed

There could be some analogy here to the way we currently think about thread creation. The core Wasm spec could describe how instructions interact with a "mapped memory" (cf. "shared memory"), without specifying core Wasm instructions for creating/altering the mapping (at least as an MVP). Web environments would want a host API to create mappings to ArrayBuffer, while non-Web environments might want a host API that creates mappings based on (e.g.) WASI file handles. So even if the current use-cases aren't restricted to just JS/the Web, an API-first approach could be viable.

fitzgen commented 2 years ago

There could be some analogy here to the way we currently think about thread creation. The core Wasm spec could describe how instructions interact with a "mapped memory", without specifying core Wasm instructions for creating/altering the mapping (at least as an MVP). Web environments would want a host API to create mappings to ArrayBuffer, while non-Web environments might want a host API that creates mappings based on (e.g.) WASI file handles. So even if the current use-cases aren't restricted to just JS/the Web, an API-first approach could be viable.

Yes, exactly. Thank you for stating this so succinctly!

aardappel commented 2 years ago

Somewhat related: discussion on address space related features in Memory64: https://github.com/WebAssembly/memory64/issues/4

I almost certainly do not understand the limitations that browsers are subject to w.r.t. memories that make it necessary to implement mmap functionality in terms of multi-memory (as opposed to being addressable by a single linear memory pointer), but I do feel this is unfortunate. I foresee lots of use cases in programming languages and other systems that would not work without a single address space, or without languages like C/C++ being able to use regular pointers to address all of it.

And if languages like C/C++ can't natively write to it but would need intrinsics/library functions to emulate access to a secondary memory (which would not allow reuse of buffer creation code in those languages), then there would be no use implement it with multi-memory underneath. Likely code in those languages would need to copy things anyway, in which case a memcpy with an extra memory argument would suffice.

Generalizing what this would need to look like, we need to store granular page level details for the memory which complicates the engine implementations

Why would that be required? To me, the biggest issue with features indicated in the above discussion would be what happens if the system is unable to commit a a page (assuming they were reserved without guaranteeing physical/page file availability). But assuming that can be solved, actual access should be possible with existing load/store ops without further information?

titzer commented 2 years ago

To me, the biggest issue with features indicated in the above discussion would be what happens if the system is unable to commit a a page (assuming they were reserved without guaranteeing physical/page file availability).

AFAICT this can already happen with a large memory.grow operation on most engines, which typically reserve (32-bit) memories and change protections upon grow. The underlying OS demand-pages these mappings and technically could go OOM on any memory access, even on pages that were previously mapped if it's swapped them to disk and memory is no longer available.

lars-t-hansen commented 2 years ago

@aardappel

I almost certainly do not understand the limitations that browsers are subject to w.r.t. memories that make it necessary to implement mmap functionality in terms of multi-memory (as opposed to being addressable by a single linear memory pointer),

Although I do see that there are some implementation challenges I basically agree with this, and I think we should explore the design and implementation spaces for the VM functions in the context of memory zero before assuming that it is absolutely necessary to go multi-memory.

Multi-memory has uncertain utility in unified address space languages, and the present proposal seems even more oriented toward the classical languages that are the most tied to unified address spaces than is the multi-memory proposal itself. For the present proposal there is therefore a heavy burden on the champions to demonstrate that tools that people will want to use can be applied effectively in a multi-memory setting.

conrad-watt commented 2 years ago

IIUC the proposal to forbid these operations on the default memory was motivated by a desire to avoid impacting the performance of modules not using these features. Could this instead be accomplished by making a type-level distinction between "mappable" and "non-mappable" memories (again, akin to the current distinction between "shared" and "non-shared")?

In this case, there would be no issue with declaring the default memory of newly-generated modules as "mappable" if required, although there might be some compositionality issues with previously-generated modules.

lars-t-hansen commented 2 years ago

Certainly an attribute could be made to work to control the code emitted. (It would be nice to avoid it if we can, though, and that comes back to my point about exploring the implementation space after pinning down in some detail the use cases and usage patterns.)

dtig commented 2 years ago

Why would that be required?

Currently the linear space is homogenous, but if we were to allow mapping/protection changes into linear memory that would no longer be the case. If we did spec memory operations for default memory, I would expect them to operate on page boundaries. This means that once adjacent pages can now be mapped/read-only pages. There is possibly a design space where we could declare upfront for some section of memory to be 'mappable', and then we wouldn't need to work at page granularity, but would that be sufficiently useful?

AFAICT this can already happen with a large memory.grow operation on most engines, which typically reserve (32-bit) memories and change protections upon grow. The underlying OS demand-pages these mappings and technically could go OOM on any memory access, even on pages that were previously mapped if it's swapped them to disk and memory is no longer available.

This is true, but there is a clear signal when to expect OOM on memory accesses, i.e. when a grow fails. The map + unmap case is different though, that memory accesses that previously were successful, would fail after an unmap, and if we were to allow mapping anywhere in the linear address space, that there can be an inaccessible chunk of memory in the middle of a JS ArrayBuffer seems to be too low level a detail to expose.

Aside from this, some other practical challenges would be

ASFAIK, it's not possible for multiple ArrayBuffers to alias the same backing store, unless the buffer is a SAB, and that would also mean some churn to existing WebAPIs. (Another orthogonal point is the current requirement of COOP/COEP headers to use SABs is limiting especially when they are not strictly necessary).
If certain portions of memory had different protections, I would expect that we would need to disallow the memory.buffer getter and expose a different API That surfaces slices of the memory as different ArrayBuffers. Using multiple memories makes this somewhat easier if we could change protections at memory granularity for example.
One other implementation challenge is it's not clear to me what should happen on multiple map/unmap calls with buffers that are backed by linear memory. In general, I'm not convinced that reserving a large chunk of memory upfront, but then providing functionality that can render chunks in the same space inaccessible is a robust approach.

Certainly an attribute could be made to work to control the code emitted. (It would be nice to avoid it if we can, though, and that comes back to my point about exploring the implementation space after pinning down in some detail the use cases and usage patterns.)

I'm currently working on gathering usage patterns, and I agree that that would influence the implementation space the most.

lukewagner commented 2 years ago

Adding my take on this problem space after getting to chat with @lars-t-hansen a bit:

From my understanding of the shape of the necessary clang/llvm extensions that would allow C/C++/Rust to operate on non-default memories, I can only imagine it working on C/C++/Rust code that was carefully (re)written to use the new extensions -- I'm not aware of any automatic techniques for porting large swaths of code that isn't just the shared-nothing approach of the component model (where you copy at boundaries between separate modules which each individually use distinct single-memories; to wit, wasm-link polyfills module-linking+interface-types using multi-memory in exactly this manner). Thus, I think there's still a certain burden of proof to show that there is real demand for additional multi-memory-based features.

Independently, I think we can make great progress in the short-term improving the functionality of default linear memories. In particular, I can see each of the following 3 features allowing optimized implementations on most browsers today with a graceful fallback path when no OS/MMU support is available:

memory.discard, as already discussed above. Graceful fallback to memset(0).
A new optional immediate on memtype declaring a trap-on-access low region of memory (either fixed to 1 wasm page or configurable), enabling reliable trap-on-null. In the absence of MMU support, an engine can implement this by simply performing an unsigned subtraction of the static size of the inaccessible region (such that wraparound causes the subsequent bounds check to fail). The corresponding JS API memory.buffer ArrayBuffer can be specified to alias only the accessible region (which does mean all pointer indices into it need to be offset... but I think that's probably the right tradeoff).
A new set of primitives to enable Copy-On-Write mapping of immutable byte-buffers (such as File, Blob and ImageBitmap on the Web platform). As a sketch: there could be a new bufferref reference type (passed in from the host), along with buffer.map and buffer.unmap operations. Semantically, buffer.map copies a subrange of a bufferref into linear memory at a given offset, returning a handle to a new "mapping" of type mappingref, and buffer.unmap takes a mappingref and zeroes out the previously-mapped region. The point is that buffer.map can be implemented via mmap(MAP_FIXED|MAP_PRIVATE) and buffer.unmap via madvise(MADV_DONTNEED). The immutability is critical for ensuring copy semantics since mmap is lazy. (Windows-knowing folks may worry about the absence of a MAP_FIXED equivalent in VirtualAlloc and the consequent race condition if buffer.map performs VirtualFree followed by MapViewOfFile and another thread in the same process VirtualAllocs into the hole -- I bugged our Chakra colleagues about this relentlessly back in the day until they got the kernel team to add the PLACEHOLDER flags to VirtualAlloc2 (available in Windows 10).)

Lastly, outside of Core WebAssembly, but for completeness: to minimize copies of non-immutable-Blob-like things, I think we should extend ReadableStreamBYOBReader to additionally accept [Shared] Uint8Arrays that are not detached, but, rather, racily written into from host threads (as previously proposed). This would allow streams (produced by WebTransport, WebCodec, WebRTC, ...) to quite efficiently emplace data into wasm linear memory. In theory, with this design, the one necessary copy from kernel space into user space can be used to write directly into linear memory. (Note that, anticipating this specialized use case of shared memory, while postMessage(SharedArrayBuffer) is gated by COOP/COEP, new WebAssembly.Memory({shared:true}) is not, and thus this extension could be used unconditionally on the Web platform.) In a browser-independent setting, the Interface Types stream type constructor we're iterating on should allow analogous optimizations, and bind to WHATWG streams in the JS API in terms of ReadableStreamBYOBReader.

Together, I think these 4 features would address a decent amount of the use cases for mmap/mprotect without incurring the portability/safety challenges of the fully-general versions of these features in default linear memory or the adoption challenges with multi-memory.

mykmartin commented 2 years ago

Why not just map/unmap to the single linear memory, or memory(0)? ... At minimum, I expect that more memory accesses would need to be bounds checked, and write protections would also add extra overhead.

Why would there need to be any additional bounds checking? If a mapped region is overlaid on the linear memory, the wasm code could just use regular memory ops with the standard linear bounds checks.

Regarding the overhead, access protections would be handled by the VMM hardware. Given the the process is almost certainly already going to be operating through VMM translations there should be little to no performance impact.

munrocket commented 2 years ago

Thank you for creating this proposal. This was a major problem in MVP.

dtig commented 2 years ago

Thanks @lukewagner for sketching this out, this is helpful.

Adding my take on this problem space after getting to chat with @lars-t-hansen a bit:

From my understanding of the shape of the necessary clang/llvm extensions that would allow C/C++/Rust to operate on non-default memories, I can only imagine it working on C/C++/Rust code that was carefully (re)written to use the new extensions -- I'm not aware of any automatic techniques for porting large swaths of code that isn't just the shared-nothing approach of the component model (where you copy at boundaries between separate modules which each individually use distinct single-memories; to wit, wasm-link polyfills module-linking+interface-types using multi-memory in exactly this manner). Thus, I think there's still a certain burden of proof to show that there is real demand for additional multi-memory-based features.

Independently, I think we can make great progress in the short-term improving the functionality of default linear memories. In particular, I can see each of the following 3 features allowing optimized implementations on most browsers today with a graceful fallback path when no OS/MMU support is available:

memory.discard, as already discussed above. Graceful fallback to memset(0).

A new optional immediate on memtype declaring a trap-on-access low region of memory (either fixed to 1 wasm page or configurable), enabling reliable trap-on-null. In the absence of MMU support, an engine can implement this by simply performing an unsigned subtraction of the static size of the inaccessible region (such that wraparound causes the subsequent bounds check to fail). The corresponding JS API memory.buffer ArrayBuffer can be specified to alias only the accessible region (which does mean all pointer indices into it need to be offset... but I think that's probably the right tradeoff).

A new set of primitives to enable Copy-On-Write mapping of immutable byte-buffers (such as File, Blob and ImageBitmap on the Web platform). As a sketch: there could be a new bufferref reference type (passed in from the host), along with buffer.map and buffer.unmap operations. Semantically, buffer.map copies a subrange of a bufferref into linear memory at a given offset, returning a handle to a new "mapping" of type mappingref, and buffer.unmap takes a mappingref and zeroes out the previously-mapped region. The point is that buffer.map can be implemented via mmap(MAP_FIXED|MAP_PRIVATE) and buffer.unmap via madvise(MADV_DONTNEED). The immutability is critical for ensuring copy semantics since mmap is lazy. (Windows-knowing folks may worry about the absence of a MAP_FIXED equivalent in VirtualAlloc and the consequent race condition if buffer.map performs VirtualFree followed by MapViewOfFile and another thread in the same process VirtualAllocs into the hole -- I bugged our Chakra colleagues about this relentlessly back in the day until they got the kernel team to add the PLACEHOLDER flags to VirtualAlloc2 (available in Windows 10).)

Could you elaborate on how multiple mappings would work? I'm also thinking about what would happen when after unmapping one external buffer, but a different buffer now needs to be mapped in. One of the concerns I had was depending on the sizes of the buffers that we need, if unmapping makes regions of the existing memory inaccessible, then subsequent buffer.unmap operations leaves larger and larger chunks of the linear memory inaccessible. Or does this approach sidestep that problem by using madvise(MADV_DONTNEED) because the memory is not then inaccessible by default?

Lastly, outside of Core WebAssembly, but for completeness: to minimize copies of non-immutable-Blob-like things, I think we should extend ReadableStreamBYOBReader to additionally accept [Shared] Uint8Arrays that are not detached, but, rather, racily written into from host threads (as previously proposed). This would allow streams (produced by WebTransport, WebCodec, WebRTC, ...) to quite efficiently emplace data into wasm linear memory. In theory, with this design, the one necessary copy from kernel space into user space can be used to write directly into linear memory. (Note that, anticipating this specialized use case of shared memory, while postMessage(SharedArrayBuffer) is gated by COOP/COEP, new WebAssembly.Memory({shared:true}) is not, and thus this extension could be used unconditionally on the Web platform.) In a browser-independent setting, the Interface Types stream type constructor we're iterating on should allow analogous optimizations, and bind to WHATWG streams in the JS API in terms of ReadableStreamBYOBReader.

More of an update here, my original concern with this was that not all of the use cases that this proposal is intending to target use streams, I'm currently still working on the subset of workloads that this proposal should handle well. That is still WIP, and will report back here when I have more to share.

Why would there need to be any additional bounds checking? If a mapped region is overlaid on the linear memory, the wasm code could just use regular memory ops with the standard linear bounds checks.

@mykmartin - Several Wasm engines have optimization strategies for getting rid of the standard linear bounds checks, using guard pages for example removes the need for the linear bounds checks under the assumption that the memory is owned by Wasm.

mykmartin commented 2 years ago

Several Wasm engines have optimization strategies for getting rid of the standard linear bounds checks, using guard pages for example removes the need for the linear bounds checks under the assumption that the memory is owned by Wasm.

Ok, but how does a given region of the linear buffer being mapped onto affect that? From the wasm code's point of view, it's still just a regular lookup in the standard address space.

dtig commented 2 years ago

Several Wasm engines have optimization strategies for getting rid of the standard linear bounds checks, using guard pages for example removes the need for the linear bounds checks under the assumption that the memory is owned by Wasm.

Ok, but how does a given region of the linear buffer being mapped onto affect that? From the wasm code's point of view, it's still just a regular lookup in the standard address space.

Sorry, I'm not sure how I missed this last question. To me this is different in a couple of different ways:

The introduction of potentially unmapped pages in the middle of linear memory which we currently always assume to be accessible means that we would need some special case trap handling code that handles out of bounds accesses for unmapped memory regions. This is probably doable, but being able to write-protect these pages on mapping also means that adjacent pages have different protections. It's probably doable for single threaded code to optimize them without significant overhead.
The bigger question for me, is that what happens in multi threaded code. In the browser environment, with multiple workers, I'm not sure what guarantees we would be able to provide. Do the existing relaxed semantics apply to memory when memory is being marked read-only apply? My guess is in some cases this would be unsafe, and would need to propagate across workers to enforce that the memory is read-only in a safe manner. Explicit bounds checks that look up protections is one way to provide some guarantees.

lukewagner commented 2 years ago

@dtig Awesome to hear about your stream WIP and I'm interested to hear more.

Could you elaborate on how multiple mappings would work? I'm also thinking about what would happen when after unmapping one external buffer, but a different buffer now needs to be mapped in. One of the concerns I had was depending on the sizes of the buffers that we need, if unmapping makes regions of the existing memory inaccessible, then subsequent buffer.unmap operations leaves larger and larger chunks of the linear memory inaccessible. Or does this approach sidestep that problem by using madvise(MADV_DONTNEED) because the memory is not then inaccessible by default?

Yup! You're correct in your final sentence: since ultimately buffer.map and buffer.unmap have copy semantics, they always leave all linear memory accessible and, after a buffer.unmap, zeroed. The real goal of the madvise(DONTNEED), though, is to efficiently remove any dependency from the virtual memory on the previously-mapped file descriptor so it can be released or mutated.

SamuraiCrow commented 2 years ago

Subproposal: Device Drivers with an Architecture-neutral Software Sandbox

Has anyone considered marking a region of memory as volitile so statically compiled WebAssembly modules could implement memory-mapped I/O for device drivers?

Motivation for Adding This

Architecture neutrality is hard to come by in device drivers due to closed-source binary blobs. This also makes newer and more efficient operating systems difficult to adopt. TheseusOS and Haiku come to mind as having a hard time in this way.
Sandboxing in microkernel operating systems offers security at the expense of performance. Genode comes to mind as a secure microkernel for Linux users because a modified Linux kernel can run hosted on it. WebAssembly's sandboxing at compile time can remove the performance compromize of using user-mode drivers while improving trustworthiness of drivers on monolithic kernels.

Future Possibilities

Driver frameworks that can catch stray DMA transfers.
IOMMU controls for graphics cards to run with more self-sufficiency.

ratchetfreak commented 2 years ago

Ok, but how does a given region of the linear buffer being mapped onto affect that? From the wasm code's point of view, it's still just a regular lookup in the standard address space.

Sure from the WASM code, it's just a memory access.

But that's not where the bounds check will be, the WASM implementation now needs to (in the worst case) check each memory access to see in which mapping it happens and create the correct pointer offset from the WASM memory offset and handle when an unaligned memory access straddles a boundary.

Also from the implementation side, very few memory mapping apis (I'm thinking of opengl's glMapBuffer and vulkan's vkMapMemory) let the user code (read: the wasm implementation) pick where the mapping happens, this means that when a map is requested by WASM code the implementation cannot simply tell the OS kernel to map that into the memory of the wasm module because the API doesn't let it.

Moreover those mapping boundaries are dynamic. So you cannot on module load inspect the module and find all the boundaries to create a perfect hash.

All this culminates in a pretty significant pessimization for a the most hot part of a WASM implementation, the memory access.

wingo commented 2 years ago

A drive-by comment: memory.discard and a declarative read-only low region of memory makes a lot of sense to me.

However when it comes to getting C/C++ programs to emit reads and writes from non-default memory, this is going to be as invasive both to programs and to the toolchain as natively accessing packed GC arrays. So perhaps we should focus effort there. GC will arrive soonish (right? lies we tell ourselves, anyway) and so maybe this feature isn't needed, as such.

It sure would be nice to solve this use case without exposing the mmap capability to users.

penzn commented 2 years ago

However when it comes to getting C/C++ programs to emit reads and writes from non-default memory, this is going to be as invasive both to programs and to the toolchain as natively accessing packed GC arrays. So perhaps we should focus effort there. GC will arrive soonish (right? lies we tell ourselves, anyway) and so maybe this feature isn't needed, as such.

Do we have a separate thread somewhere about accessing GC packed arrays from C++? I think it has potential, though feasibility of the toolchain change is probably the main question.

wingo commented 2 years ago

@penzn Not sure if there is a thread, and though it's important for tying together all parts of the system, it's probably out of scope for wasm standardization. Anyway I just wrote up some thoughts here, for my understanding of the current state of things: https://wingolog.org/archives/2022/08/23/accessing-webassembly-reference-typed-arrays-from-c

smilingthax commented 2 years ago

Subproposal: Virtual Address Area

Problem space:

Linear memory is compatible with usual pointers, etc. available in programming languages that are compiled to WASM, while multiple memory as proposed is not (at least not easily).
Modern hardware + operating systems are much more flexible, because they can use virtual memory supported by hardware: MMUs, page tables, etc. to implement (e.g.) mmap.
Future support for mmap, etc. should not have an impact on code that does not use such features (e.g. already existing wasm code).

Solution idea:

Use "negative addresses", e.g. 0xff123456 to refer to the special "virtual address area" when using the usual WASM load/store instructions.
As different use cases will require different amounts of virtual address space vs. normal linear memory (and only 32 bits must cover both – unless memory64 is used...), a wasm module shall choose at compile time the size of the virtual address (-> VMA) area it wants reserve. Regular memory can then only grow up to 2**32 - vma_size. If no virtual address area is requested, the wasm compiler can completely optimise out any VMA checks, as is the case with 'legacy' code. Runtimes might want to limit the maximum size that can be reserved as VMA (e.g. MSB must alway be set, etc.) .
For (e.g.) JS, WebAssembly.instantiate will return a VMA-handle w/ methods in addition to the well known Memory-handle/object.
This subproposal does not concern itself with how exactly "real memory" could be mapped at runtime into some (sub-)region of the VMA, but only describes a foundation of how to integrate operations like mmap into the existing wasm infrastructure. Ideally those future operations would use virtual memory mappings facilities of the operating system / hardware (which might limit the possible start/end-addresses of such regions to page-aligned addresses...). The user is basically free in how and where he places regions inside the VMA and what they should contain.

Example use cases of what could possibly be done inside the VMA with appropriate future mapping operations:

The user chooses to map the first 0x1000 bytes of a given JS Blob or (Shared-?)ArrayBuffer(or an aligned subarray of it?) into the region from address 0xfffe000 to 0xfffefff inside the VMA. The user later choose to remap the region to show 0x1000 bytes starting from the 0x1000-th byte of the Blob/....
The user wants have two views at different addresses inside the VMA, which are backed by the same "physical" memory, but with different permissions (e.g. R vs. R/W).
The user wants to implement Copy-on-write by setting up a region as read-only and receiving a "signal"/"trap"/"userfaultfd notification" when a write is attempted. It then changes the region permissions to R/W via some api to let the formerly-trapping-store-instruction continue.

SamuraiCrow commented 2 years ago

Some third-party applications already use negative addresses to flag memory areas as needing reversed byte order such as big endian processors. If two different uses of negative addresses that will conflict. Also, the WebAssembly standard is officially little endian so it is unlikely that endian swapping will get official support any other way. This is according to w2c2 documentation and I think the wasm2native compiler (the third-party big endian supporters).

smilingthax commented 2 years ago

Some third-party applications already use negative addresses to flag memory areas as needing reversed byte order such as big endian processors. If two different uses of negative addresses that will conflict. Also, the WebAssembly standard is officially little endian so it is unlikely that endian swapping will get official support any other way. This is according to w2c2 documentation and I think the wasm2native compiler (the third-party big endian supporters).

Their reversed memory addressing (mem[size - ptr] instead of mem[ptr]) in the wasm compiler does not affect what shall happen when ptr < 0, resp. ptr > 0xff123456(e.g.) is accessed from a wasm instruction (load/store).

SamuraiCrow commented 2 years ago

Oh ok. Thanks for clarifying.

BlobTheKat commented 8 months ago

bump. this would be a massive leap in the ability to port existing libraries and applications to wasm, as well as generally increase memory efficiency for large wasm programs that make good use of the heap

mhatzl commented 6 months ago

Trying to follow the discussions it seems the focus is on option 1 and a static number of memories. Why is it bad to allow dynamic creation/deletion of linear memories?

With the multi-memory proposal having the option to create and delete memories via instructions at runtime could in my opinion solve many problems mentioned in https://github.com/WebAssembly/design/issues/1397

Possible new instructions

memory.create memargs ... creates a new linear memory with the given memargs and returns the index to address this new memory. Traps/Errors if it is not possible to create this memory (kind of like memory.grow)
memory.delete memidx ... "deletes" the memory addressed with the given index. Fails if index=0 to prevent from deleting the default memory. Noop if the index does not point to a created memory.

Embeddings may choose to keep the allocated memory for later memory.create calls.
memory.mem_copy idx1 address1 len1 idx2 address2 len2 ... copies memory from one linear memory to another. Fails if either index is not valid or addresses+len are not in memory range.

Not happy about the instruction name, but I could not come up with a better one for now.
memory.map_extern externref memargs ... maps an extern memory using the given memargs returning the memidx to access the memory

This could be used for example to exchange/share large blobs like media files between host and wasm. However, I am not sure how to ensure that the externref actually points to a WasmMemory with the given memargs.

Why these instructions help

As mentioned in issue #1397, applications often allocate a bunch of memory that is intended to be deleted again after a short period. If memories could be created and deleted during runtime, this would prevent fragmentation of other longer-lived memories.

This would be especially helpful for shared memories, because these must specify a maximum at creation. Knowing a maximum upfront is really difficult for most applications, but because of shared access, I get why a maximum must be set.

Side note on read-only memory:

With multiple-memories, one of the static memories could be marked as read-only. This memory only takes values from the data-segment, but does not allow store instructions. I am not sure on how useful it is to allow the creation of read-only memories, because this memory won't grow anyways.

Also changing between read-only and writeable at runtime seems unnecessary, because this must be restricted by the language compiling to wasm. Otherwise, one could always change the mode as needed, making it an inconvenience, but not a security feature.

Possible mini-fragmentation

Because memories at least have a size of one page, It is inefficient to create one memory per dynamic object e.g. vector. Which results in manual memory management per linear memory. One would probably need some fixed address in a default linear memory that points to the tree of free memory blocks in each dynamically created linear memory. The good thing here is that the tree itself may be located in a dynamically created linear memory, because it grew over time, exceeding the maximum of the default linear memory.

How to handle memidx

create could always bump the index by 1 and because memidx is i32, this should even be sufficient for applications not running multiple years. Alternatively, create returns the index of the last deleted memory, or increases the index by 1 if no memory was deleted yet.

Problems and open design questions?

There are definitely problems that explain why it was decided against dynamic memory creation and deletion. Happy to hear your feedback.

somethingelseentirely commented 2 months ago

I think @lukewagner is on to something there, but I feel like it has never been fully articulated in this conversation.

What if an instances linear memory being (memory 0) is a mistake, that forces the entire spec down a garden path towards load/store addressable additional memories .

There already is an existing solution on how to integrate multiple memories with different read/write capabilities in a flexible manner: mmap

mmaping even allows for the construction of things like virtual ring-buffers, where writers can write past the end of a doubly mmaped file to simplify the wrapping logic, and is therefore a strict superset of what the current multi-memory proposal is capable of.

Such a solution might look like:

A single anonymous linear memory that is not accessible from outside the WASM instance.
Current programming language compatible load/store instructions that only operate over the anonymous linear memory.
A memory index similar to what's in the multi-memory proposal, with different read/write capabilities and ArrayBuffer/Blob/whatever sources.
EXPLICIT mmap operations that map page ranges from the memories in the memory index onto page ranges in the anonymous linear memory.

This would give us the best of both worlds. Decoupling of multiple memories on both the host and the wasm side (the wasm instance can not only ignore additional memories offered by the host, but also has a lot of control of when and where things get moved around, e.g. when one of the memories changes in size and the wasm instance decides to either ignore that scenario, potentially re-map other memories, move other allocations around, or potentially even create non-contiguous mappings)

This would also align well with existing mmap semantics which would help WASI match existing applications' requirements, with the memory index essentially being file descriptors of host provided mmapable files.

Semantically the mmap would just be a copy of the memory source region into the memory target region with an unmap equivalent to zeroing the range as proposed by @lukewagner.

BlobTheKat commented 2 months ago

I think @lukewagner is on to something there, but I feel like it has never been fully articulated in this conversation.

What if an instances linear memory being (memory 0) is a mistake, that forces the entire spec down a garden path towards load/store addressable additional memories .

There already is an existing solution on how to integrate multiple memories with different read/write capabilities in a flexible manner: mmap

mmaping even allows for the construction of things like virtual ring-buffers, where writers can write past the end of a doubly mmaped file to simplify the wrapping logic, and is therefore a strict superset of what the current multi-memory proposal is capable of.

Such a solution might look like:

A single anonymous linear memory that is not accessible from outside the WASM instance.

Current programming language compatible load/store instructions that only operate over the anonymous linear memory.

A memory index similar to what's in the multi-memory proposal, with different read/write capabilities and ArrayBuffer/Blob/whatever sources.

EXPLICIT mmap operations that map page ranges from the memories in the memory index onto page ranges in the anonymous linear memory.

This would give us the best of both worlds. Decoupling of multiple memories on both the host and the wasm side (the wasm instance can not only ignore additional memories offered by the host, but also has a lot of control of when and where things get moved around, e.g. when one of the memories changes in size and the wasm instance decides to either ignore that scenario, potentially re-map other memories, move other allocations around, or potentially even create non-contiguous mappings)

This would also align well with existing mmap semantics which would help WASI match existing applications' requirements, with the memory index essentially being file descriptors of host provided mmapable files.

Semantically the mmap would just be a copy of the memory source region into the memory target region with an unmap equivalent to zeroing the range as proposed by @lukewagner.

I counter this proposal, wasm does not need fine-grained permission control, think about it:

Wasm has no concept of executable memory
Nonwritable memory was added primarily to protect data sections: dynamic allocations rarely involve write permissions since it would require allocating a completely separate page for every new allocation, which is a huge waste of memory
Nonreadable memory makes no practical sense
Most additional posix mmap flags like shared memory or downwards-growing memory don't make sense in terms of wasm

I think an mmap-like function would be great for wasm but only in the sense of making memory non-linear. The ability to mmap files is severely limited by the address space (32 bits)

programmerjake commented 2 months ago

Nonreadable memory makes no practical sense

I think it does, but only in the sense of making a hole in the address space that holds no data and traps when read or written, e.g. to catch null pointer dereferences in C-style languages.

I think an mmap-like function would be great for wasm but only in the sense of making memory non-linear. The ability to mmap files is severely limited by the address space (32 bits)

memory64 to the rescue!

somethingelseentirely commented 2 months ago

wasm does not need fine-grained permission control

My proposal is not about naively stuffing native mmap into WASM, but about aligning the multiple memory proposal (which I think is a good feature and a potential solution for many real world issues) with the reality of our programming languages not being able to deal with a completely alien memory model where individual load and store instructions have fine grained memory contexts.

In a sense I am arguing against a very fine grained permission model too.

I think that read-only memory is important for security in addition to consistency when working with mmaped file IO, where you want to get read-only access to a zero-copy blob from wasm, or you want to be able to write to a network buffer from wasm which you then mark read-only/unmap to pass ownership of the memory to the host/network stack.

But I think it is even more important to decouple the semantics of multiple memories from the semantics of individual load/store instructions, and a mechanism that allows us to do so, and that has been tried and tested is mmap (in this context I don't mean the specific implementation, but the concept of using the MMU to map certain memory/file/buffer ranges onto other virtual memory ranges).

The host context provides buffers (memories), the wasm context is given explicit control if and how to mmap those buffers into its linear memory.

It would give existing languages explicit control over how they want to deal with multiple memories (including the option to ignore it, with the host potentially performing a single mapping of (memory 0) onto the anonymous linear memory to recover the current semantics), and would enable any language that has the ability to do mmaped IO to immediately start using the multi memory feature.

rossberg commented 1 month ago

Just a clarifying remark: multiple memories always existed in Wasm, since version 1.0: by linking two modules together that both define their own memory you always had multiple unrelated address spaces.

The only limitation that is finally lifted by the multi-memory extension is the (weird) restriction that a single module was not able to speak about multiple memories. That caused various problems, for example, the inability to statically link (merge) arbitrary modules, or the inability to efficiently move data between such memories. There are other use cases for multiple memories, too, that don't require exposing them to a source language, for example, instrumentation or poly-filling other features.

lukewagner commented 1 month ago

FWIW, I'm in favor of adding some degree of support for read-only memory and no-access memory, if only to allow us to more-simply claim that wasm is an entirely more secure way to run code.

We already get a huge mitigative security boost from our protected stack and CFI but the fact that *(void*)NULL doesn't trap and that we can't prevent mutation of .rodata unnecessarily hurts the simplicity of our claim and forces more nuanced arguments of pros-vs-cons (or "fine-grained sandboxing is what you actually want"). So I'm definitely in favor of closing these gaps; the only problem is that, especially now that wasm is showing up in all sorts of diverse contexts (in production, in high volume), we can't always assume we have an MMU or that we're given access to it. MPUs (Memory Protection Units) are, it sounds like, becoming a reasonable assumption to make even in embedded hardware, but they only support coarse-grained protection (a small constant number of regions with different protection).

As mentioned above and brainstormed further by @eqrion more recently, if we had a very coarse-grained protection model (M no-access pages starting at 0 followed by N read-only pages followed by all read-write pages up to memory.length, with M and N declared in the memtype), we could implement the semantics with an MPU or without any hardware assistance at low overhead. So I like that idea.

The only mitigative use case I'm aware of that this doesn't solve is linear-memory-stack guard pages, which would seem to still need fine-grained protection. But maybe stack-canaries implemented for wasm by LLVM are enough?

dtig commented 1 month ago

Adding a note here that the work on this proposal has now moved to the memory control proposal repository, which reflects the current work. Feedback/issues on the proposal repository are appreciated so we can discuss them in more detail. Looking at the proposal repository, you may notice that there are several possible directions, though given how diverse the ecosystem is and the current restrictions of production VMs, we don't yet have consensus on exactly how we'll be tackling this.

As mentioned above and brainstormed further by @eqrion more recently, if we had a very coarse-grained protection model (M no-access pages starting at 0 followed by N read-only pages followed by all read-write pages up to memory.length, with M and N declared in the memtype), we could implement the semantics with an MPU or without any hardware assistance at low overhead. So I like that idea.

I assume this is the sketch in static-protection.md? I like this idea too, my concern is that if this was to be fully static it would be hard for runtimes to motivate a fundamental memory layout change without some runtime control of the read-only section.

somethingelseentirely commented 1 month ago

Thanks for the links to the more recent proposal repository, it looks like there are already some similar Ideas articulated there!

Regarding the static protection proposal for MPUs, it feels like feature creep for WebAssembly to also try to also become the universal IR for embedded systems. Introducing the difficulties of embedded programming to the web ecosystem, seems like an unnecessarily masochistic restriction, when even smaller cores slowly move into the direction of having virtual memory.

Giving up fine-grained r/w-control and memory mapping APIs in return, which have real security, reliability and performance applications seems like a bad deal for everyone except embedded developers. And I say that with a somewhat large distain for the complexity of MMUs and the memory stack of modern OSes.

It is OK to have different technologies that solve their use case well, and if WASM wants to make a dent into the native application space it needs to have equal or better capabilities and guarantees, and the lowest common denominator with embedded hardware won't fit that bill. Embedded folks will also probably be happier if they get their own specific thing/spec and don't have to foot the bill for high-level stuff like GC.

Edit: Embedded systems could also simply fail and abort when they get a memory mapping request that's not compliant with their MPU layout:

      const noaccess = new WebAssembly.Memory({ initial: 1, mode: "n"});
      const readonly = new WebAssembly.Memory({ initial: 10, mode: "r"});
      const readwrite = new WebAssembly.Memory({ initial: 100, mode: "rw"});
      ...
      js: { nomem: noaccess, rmem: readonly, mem: readwrite}

(module
  (mmap (import "js" "nomem") 0 1)
  (mmap (import "js" "rmem")  1 10)
  (mmap (import "js" "mem")   11 100)
  (mmap (import "js" "rmem")  111 5) // this would panic on embedded
  ...

titzer commented 1 month ago

@somethingelseentirely

Giving up fine-grained r/w-control and memory mapping APIs in return, which have real security, reliability and performance applications seems like a bad deal for everyone except embedded developers.

I have some sympathy for this in that there are a lot of powerful features that can offer real value to applications and are already in wide use. There is a lot more diversity in system APIs and capabilities, comparatively speaking, than hardware ISAs. Constantly falling short of feature parity compared to native platforms or APIs limits Wasm's ability to add value to ecosystems. Limiting everything to the least common denominator will eventually cause the least capable platform to dictate that more capable platforms can't exist. So we'll need to manage ecosystem diversity in some way.

That said though, WebAssembly has threaded this needle by deftly picking MVP features that get the main value-add of a feature without unduly burdening implementations. What @lukewagner mentions, refering to work by @eqrion to make a simplified model that gets effectively MPROTECT_NONE and MPROTECT_READONLY could get enough of that feature in a forward-compatible way so that the Wasm security story is at least at (some simplified level of) parity with native.

BlobTheKat commented 1 month ago

Very little of this thread has been dedicated to possible malloc implementations, after all I assume part of the motivation behind better memory control is to reduce memory fragmentation.

How well would having multiple memories solve this problem compared to, say, an mmap-based approach?