Discussion of alternative: byte-granularity deterministic cap #13

WebAssembly / custom-page-sizes

Open keithw opened 1 month ago

keithw commented 1 month ago

First off, apologies that I didn't see PR #12 until it was mentioned in today's meeting. From the April 23rd discussion, I had been expecting an issue thread on this and just didn't see it -- I'm sorry to keep applying stop energy to this.

My understanding is that this proposal is aimed at letting Wasm run (in a spec-conforming way) in environments with less than 64 KiB of memory (or, more rarely, less than some other integral number of 64-KiB pages). For this use case, my own opinion is that a deterministic "cap" on loads/stores would be simpler and less invasive, end-to-end, vs. plumbing custom page sizes through the tools and consumers. The "cap" would be a static declaration on a memory that requires the consumer to trap on loads and stores that go over some index. Syntactically, it could be part of limits or memtype, or it could be a new orthogonal element in a new section.

To me this seems like it would be a lot less invasive for consumers to implement than the current proposal. Unlike the current proposal:

- It wouldn't (necessarily) change the memory type matching rules.
- It wouldn't change the semantics of memory.size or memory.grow.
- It wouldn't change the tool conventions, meaning:
  - no addition of __builtin_wasm_page_size
  - no new relocation type
  - no new linker behavior (the barrier to entry for writing a new WebAssembly linker seems high already -- I think there is only one implementation of this?)
- It wouldn't add more magic to ".o" files. (Right now it's pretty nice that you can compile something with clang -c and the output in WAT, decoded by standard tools, is comprehensible as a Wasm module. The discussion in https://github.com/WebAssembly/custom-page-sizes/issues/3 suggests that newly generated .o files will implicate this proposal pervasively, and knowledge of the tool conventions and non-standard sections will become more necessary to understand what the .o file is trying to do.)

In exchange for giving up all this, the alternative would:

- change the execution rules for memory loads and stores, requiring a trap if they are over the cap.
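For concreteness, here is a minimal C sketch of that execution rule from an engine's point of view. The `memory_instance` struct and `load_i32` function are hypothetical, and the sketch assumes a little-endian host; the only point is that the bound consulted by loads and stores becomes the smaller of the page-granular length and the static byte cap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical engine-side view of one linear memory. */
typedef struct {
    uint8_t *data;
    uint64_t length_bytes; /* memory.size * 64 KiB; not affected by the cap */
    bool has_cap;
    uint64_t cap_bytes;    /* static byte-granularity cap declared by the module */
} memory_instance;

/* Effective bound for loads/stores: the smaller of the length and the cap. */
static uint64_t access_bound(const memory_instance *mem) {
    if (mem->has_cap && mem->cap_bytes < mem->length_bytes)
        return mem->cap_bytes;
    return mem->length_bytes;
}

/* i32.load with a static offset immediate; returns false to signal a trap. */
static bool load_i32(const memory_instance *mem, uint32_t addr,
                     uint32_t offset, uint32_t *out) {
    uint64_t ea = (uint64_t)addr + offset; /* effective address, computed in 64 bits */
    if (ea + sizeof(uint32_t) > access_bound(mem))
        return false; /* over the cap (or past the end): trap */
    memcpy(out, mem->data + ea, sizeof *out); /* assumes a little-endian host */
    return true;
}
```

Note that memory.size and memory.grow are untouched in this sketch; only the bound checked by memory accesses changes.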

From this point of view, the latter is a much shorter list. :-) I think this alternative would satisfy the "small-memory" use cases that I'm aware of, and I hope it's clear why it feels less invasive and easier for consumers and tools to deal with.

What I expect is that consumers in these small-memory embedded environments would refuse to instantiate a module unless it declares a sufficiently constraining set of "caps" on its memories and memory imports. Other consumers would probably just ignore the cap until an actual load/store is executed against a capped memory.

Downsides:

Custom page sizes is a lot more general and can handle a broader range of use cases. E.g. the "cap" alternative wouldn't help with the other use case given in the overview ("if an audio effects library operates upon blocks of 512 samples at a time, with 16-bit samples, it can use a 1 KiB page size to avoid fragmentation and over-allocation"). Custom page sizes basically subsumes the "cap" alternative.

Response: The view could be that the over-allocation is tolerable everywhere except small-memory environments, which are well-handled by the "cap." Or maybe that the audio effects library should be using 512-byte GC char arrays instead of 512-byte custom pages in a linear memory. But if we want to tackle these other scenarios with full generality, then yeah, custom page sizes are probably the way to go.

Two other downsides are given in the #12 overview:

"It fails to reuse existing Wasm concepts, like memory being composed of pages, increasing the effort required to spec the feature and imposing additional complexity burden on implementations."

Response: Memory would still be composed of pages, and they'd still be 64 KiB. The effort required to spec the feature, and the additional burden on implementations, seems vastly smaller given my two bulleted lists above. It would, however, add a new execution-time concept (the byte-granularity "cap") that doesn't exist today.

"Simultaneously, it provides less flexibility and generality than this proposal: Wasm cannot, for example, rely on memory.size to determine the "real" maximum addressable memory address."

Response: This is true. However, even today Wasm has no mechanism at execution-time to determine the min or max of a limits. Like the "cap", these numbers are static and don't change at runtime. If this is really desired, it seems doable to specify a new memory.limits const instruction that takes an index immediate, and probably a field immediate (min/max/cap/shared/etc.), and pushes the corresponding static value.
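A sketch of what such a hypothetical memory.limits instruction might evaluate to, written as the C function an interpreter could call. The field enum, the all-ones convention for absent optional fields, and returning 64-bit values are all assumptions made for illustration, not part of any proposal:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field immediate for a "memory.limits" const instruction. */
typedef enum { LIMITS_MIN, LIMITS_MAX, LIMITS_CAP, LIMITS_SHARED } limits_field;

/* Static memory type as declared in the module, including a hypothetical cap. */
typedef struct {
    uint64_t min_pages;
    bool has_max;
    uint64_t max_pages;
    bool has_cap;
    uint64_t cap_bytes;
    bool shared;
} memtype;

/* Value the instruction would push; absent optional fields yield all-ones. */
static uint64_t memory_limits(const memtype *types, uint32_t memidx,
                              limits_field field) {
    const memtype *t = &types[memidx];
    switch (field) {
    case LIMITS_MIN:    return t->min_pages;
    case LIMITS_MAX:    return t->has_max ? t->max_pages : UINT64_MAX;
    case LIMITS_CAP:    return t->has_cap ? t->cap_bytes : UINT64_MAX;
    case LIMITS_SHARED: return t->shared ? 1 : 0;
    }
    return 0;
}
```

Because every field is static, a compiler could fold such an instruction to a constant at compile time.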

=====

Bottom line: for the particular goal of supporting small-memory environments, the "cap" feels a lot less invasive and challenging to implement than going to custom page sizes. I'm nervous about putting even more weight on the tools and tool-conventions, especially when lld seems to be the only implementation of these, it's not part of the Wasm spec, and the comprehensibility/accessibility of ".o" files is a nice thing to have. However, if there is a desire for the full flexibility and generality of custom page sizes for their own purposes (independent of the particular use-case of small-memory environments), then that's clearly the way to go.

alexcrichton commented 1 month ago

Personally I feel that in trying to think through this proposal further it ends up being on the same level of complexity as custom page sizes. In that sense I don't feel that your bulleted lists capture the full breadth of the impact of having a "cap" on memory instead.

The first example which jumped to mind was https://github.com/WebAssembly/wasi-libc/pull/500. That PR is adding an emulation of the POSIX mprotect syscall. That works by ensuring that the desired protections are read/write and that the memory is in-bounds. The implementation relies on being able to acquire the end of memory which is done through memory.size multiplied by the page size. This logic would not be possible with a "cap" because there is no way at runtime to acquire the maximum size of memory. Fixing this would require an LLD relocation basically the same as the relocation for a custom page size, meaning that if that PR were to work with the "cap" proposal here, it would similarly need LLD changes. To me this specific issue is a proxy for a more general concern that this proposal would add extra complexity, in that there would now be two different concepts for the maximum size of memory: one is page-aligned and the other might not be. This seems ripe for misinterpretation in many applications/libraries, possibly even creating serious issues because out-of-bounds addresses might accidentally be concluded to be in bounds (for example, if an application relies on mprotect to say all memory is read/write, that wouldn't be correct with a "cap", since mprotect's return value was deduced from a page-aligned value). Put another way, I would say that the "cap" proposal changes the semantics of memory.size, since modules are no longer guaranteed to be able to access memory up to the size it indicates.
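To make the concern concrete, here is a hedged C sketch (not the wasi-libc code) contrasting an in-bounds test derived from memory.size times the page size with the bound a byte cap would actually enforce. Both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* In-bounds test in the style described above: the end of memory is
 * memory.size (in pages) multiplied by the page size. Illustrative only. */
static bool range_in_bounds_by_pages(uint64_t addr, uint64_t len,
                                     uint64_t size_pages, uint64_t page_size) {
    uint64_t end = size_pages * page_size;
    return addr <= end && len <= end - addr;
}

/* What an access actually needs when a byte-granularity cap also applies. */
static bool range_accessible_with_cap(uint64_t addr, uint64_t len,
                                      uint64_t size_pages, uint64_t page_size,
                                      uint64_t cap_bytes) {
    uint64_t end = size_pages * page_size;
    if (cap_bytes < end)
        end = cap_bytes;
    return addr <= end && len <= end - addr;
}
```

With one 64 KiB page and a 1 KiB cap, `range_in_bounds_by_pages(2048, 16, 1, 65536)` accepts a range that `range_accessible_with_cap(2048, 16, 1, 65536, 1024)` reports would trap.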

You also say that memory type matching rules wouldn't need to change, but I would personally find that confusing. To me the type-checking of linear memories would get even more complicated than today because each memory type can list an optional maximum page size in addition to an optional maximum byte size. Is it an error if the byte size is specified and the page size isn't? What if the maximum page size is excessively larger than the maximum byte size? Additionally, maximum page sizes factor into whether two memory types are considered "matching", so I'm not sure why the maximum byte size wouldn't factor into this calculation. Instead, what I'd expect is that the maximum byte size of a linear memory is calculated from the listed maximum byte size and the listed maximum page size (converted to bytes), being the minimum of those two.
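A minimal sketch of the "minimum of the two" rule, assuming a hypothetical memory type that carries both an optional page maximum and an optional byte maximum:

```c
#include <stdbool.h>
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u

/* Hypothetical memory limits with both kinds of maximum. */
typedef struct {
    bool has_max_pages;
    uint64_t max_pages;
    bool has_max_bytes;
    uint64_t max_bytes;
} mem_limits;

/* Effective maximum size in bytes: the smaller of the two declared limits.
 * Returns false if neither limit is declared. */
static bool effective_max_bytes(const mem_limits *l, uint64_t *out) {
    bool have = false;
    uint64_t best = 0;
    if (l->has_max_pages) {
        best = l->max_pages * WASM_PAGE_SIZE;
        have = true;
    }
    if (l->has_max_bytes && (!have || l->max_bytes < best)) {
        best = l->max_bytes;
        have = true;
    }
    if (have)
        *out = best;
    return have;
}
```

In this sketch, declaring only one limit, or a page limit far above the byte limit, is not an error; the smaller limit simply wins.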

Right now it's pretty nice that you can compile something with clang -c and the output in WAT, decoded by standard tools, is comprehensible as a Wasm module

I don't think that #3 would really affect this. The only change would be that if you had a function working with a page size it would look like i32.const 0 instead of i32.const 65536 (or, more likely, folded into some other constant). The relocation/linking custom sections are already incomprehensible in the text format (as there is no text format so the next-best-thing is the @custom printing which just dumps the raw binary contents) so adding another relocation there shouldn't really affect the readability.

It would, however, add a new execution-time concept (the byte-granularity "cap") that doesn't exist today.

One thing I want to be sure to call out here is that this proposal still requires trapping semantics for memory accesses beyond the cap. This means that engines, just like with custom page sizes, can no longer rely on guard pages. My comment here isn't addressing the OP directly but rather additional concerns that the CG has raised on various occasions. The complexity of an engine not being able to rely on guard pages seems inevitable with any proposal to limit the size of memory to less than 64k. I mostly just want to be sure to call out that this proposal does not solve this concern or enable engines to always use guards; engines will still need to implement a bounds-check-every-memory-access mode.

tlively commented 1 month ago

I agree with @alexcrichton that having both a max page size and a max byte size seems complicated and error prone, but I'm also sympathetic to the argument that the units of memory.grow and memory.size changing is complicated and error prone (although thankfully this is mitigated somewhat by the decision to have only two possible page sizes).

What if we went with custom page sizes to avoid having multiple kinds of max size, but then also added memory.size_bytes and memory.grow_bytes instructions? The latter would round the number of requested bytes up to the next multiple of the page size, so their behavior wouldn't be entirely independent of the configured page size, but it would be more independent than the current instructions.
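A hedged sketch of the memory.grow_bytes behavior described here, assuming the instruction returns the previous size in bytes (or -1 on failure) by analogy with memory.grow. The struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical memory state under custom-page-sizes. */
typedef struct {
    uint64_t page_size;  /* 1 or 65536 */
    uint64_t size_pages;
    uint64_t max_pages;
} paged_memory;

/* Number of pages needed to cover `bytes`, rounding up (overflow-safe). */
static uint64_t pages_for_bytes(uint64_t bytes, uint64_t page_size) {
    return bytes / page_size + (bytes % page_size != 0);
}

/* Round the request up to whole pages, then grow; old size in bytes or -1. */
static int64_t memory_grow_bytes(paged_memory *m, uint64_t delta_bytes) {
    uint64_t delta_pages = pages_for_bytes(delta_bytes, m->page_size);
    if (delta_pages > m->max_pages - m->size_pages)
        return -1; /* growth refused, as with memory.grow */
    int64_t old_bytes = (int64_t)(m->size_pages * m->page_size);
    m->size_pages += delta_pages;
    return old_bytes;
}
```

With a 64 KiB page size, a 100-byte request grows the memory by a whole page; with a 1-byte page size it grows by exactly 100 bytes. That is the residual page-size dependence mentioned above.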

That being said, I think there's not much user-facing difference between adding memory.*_bytes instructions and adding a new relocation type. Both suffice to generalize over the page size at compile time, and I agree with @alexcrichton that the new relocation wouldn't be materially different from existing relocations. Given that, I would err on the side of adding less to the core spec and choose the relocation over the new instructions.

keithw commented 1 month ago

@alexcrichton, I certainly respect your view. And I appreciate that wasmtime has already implemented the current version of this proposal and probably isn't eager to change. In terms of comparative implementation burden, responses below.

The first example which jumped to mind was WebAssembly/wasi-libc#500. That PR is adding an emulation of the POSIX mprotect syscall. That works by ensuring that the desired protections are read/write and that the memory is in-bounds. The implementation relies on being able to acquire the end of memory which is done through memory.size multiplied by the page size. This logic would not be possible with a "cap" because there is no way at runtime to acquire the maximum size of memory.

One way to handle this would be a memory.limits const instruction (as outlined above) -- I don't think it necessarily needs more.

To me this specific issue is a proxy for a more general concern that this proposal would add extra complexity, in that there would now be two different concepts for the maximum size of memory: one is page-aligned and the other might not be.

Yes, agreed -- this alternative would add a static byte-granularity limit on the exposed length of the memory data, in addition to the page-granularity limit. It seemed like the minimal thing we could do to satisfy the particular use-case of small embedded environments.

You also say that memory type matching rules wouldn't need to change, but I would personally find that confusing. To me the type-checking of linear memories would get even more complicated than today because each memory type can list an optional maximum page size in addition to an optional maximum byte size. Is it an error if the byte size is specified and the page size isn't?

No.

What if the maximum page size is excessively larger than the maximum byte size?

Also no error.

Additionally, maximum page sizes factor into whether two memory types are considered "matching", so I'm not sure why the maximum byte size wouldn't factor into this calculation. Instead, what I'd expect is that the maximum byte size of a linear memory is calculated from the listed maximum byte size and the listed maximum page size (converted to bytes), being the minimum of those two.

I think it would be workable either way -- it depends if the "cap" would be part of the memtype or a freestanding declaration, which I tried to leave open. If you think it's better to change the matching rules (making it a failure to use a memory with a smaller cap to satisfy an import with a higher cap), fine with me. On the other hand, maybe it would be nice if only the embedded engines have to care about the "cap" at instantiation time (and for everybody else, it's just an issue of execution semantics).

Right now it's pretty nice that you can compile something with clang -c and the output in WAT, decoded by standard tools, is comprehensible as a Wasm module

I don't think that #3 would really affect this. The only change would be that if you had a function working with a page size it would look like i32.const 0 instead of i32.const 65536 (or, more likely, folded into some other constant). The relocation/linking custom sections are already incomprehensible in the text format (as there is no text format so the next-best-thing is the @custom printing which just dumps the raw binary contents) so adding another relocation there shouldn't really affect the readability.

I'm having a hard time thinking of a common situation today where the body of a function in a .o file (parsed as standard Wasm) is expected to be incorrect until relocations are applied from the custom section. My impression was that relocations are required for speedy linking but, if you're willing to parse the code section, not generally required to understand the execution semantics. Maybe I'm not thinking broad enough or unaware of existing dependence on the tool conventions.

It would, however, add a new execution-time concept (the byte-granularity "cap") that doesn't exist today.

One thing I want to be sure to call out here is that this proposal still requires trapping semantics for memory accesses beyond the cap. This means that engines, just like with custom page sizes, can no longer rely on guard pages.

Yes, agreed. I don't know how we'd get rid of that if you also want to support a module with a memory of length 1 KiB (or length 17 bytes).

conrad-watt commented 1 month ago

Just to add a brief "Wasm language-level" perspective, I can echo @tlively in that my bias is to look for solutions that add fewer new concepts to the core spec, which has to be eternally forwards-compatible and therefore grows monotonically in size and complexity. I also like that the custom page sizes proposal sets us up neatly for future memory mapping/protection features that might have to operate per-page - e.g. I could imagine a future memory mapping feature that requires a certain minimum page size, but that would be unwieldy to use with 64k pages.

ppenzin commented 1 month ago

Since we expect that custom page sizes are going to be used in specific resource-constrained environments, it is necessary to understand whether the assumptions we make about this approach hold true in such environments. As an example:

The first example which jumped to mind was WebAssembly/wasi-libc#500. That PR is adding an emulation of the POSIX mprotect syscall. That works by ensuring that the desired protections are read/write and that the memory is in-bounds. The implementation relies on being able to acquire the end of memory which is done through memory.size multiplied by the page size.

How would the aforementioned implementation that currently caps memory handle this? I assume it is a bare-metal device of some sort; is mprotect even supposed to work?

I'd like to second @ajklein's point one more time that the consumers of a feature aimed at supporting memories smaller than 64 KB are embedded users, and it would be good to make sure the direction of this proposal is aligned with the stated use cases. As for a forum that is probably better equipped to review this, an Embedded SIG has been approved in the Bytecode Alliance, though personally I think even a CG subgroup may be needed, as this proposal shows that embedded features do not necessarily fall under WASI and the Component Model.

yamt commented 1 month ago

And I appreciate that wasmtime has already implemented the current version of this proposal and probably isn't eager to change.

Where can I find the implementation?

alexcrichton commented 1 month ago

@keithw oh, to clarify, Wasmtime does not yet have an implementation of this proposal, only the wasm-tools repository, which is basically everything that doesn't include execution of the wasm. The implementation there is additionally small enough that I wouldn't consider its inertia a reason not to change this proposal. Rather, I'd prefer that consensus over how to handle this proposal centers on the technical merits of the proposal, not the inertia of what happens to be implemented today.

I also think that your answers to the questions I raised are reasonable, but I mostly wanted to point out that the simplification of "just add a byte cap to memory" hides intrinsic complexity in even such a seemingly simple proposal. Even within the various questions there's room for debate, for example why would we want to allow both a byte cap and a page cap on memory? Why not require one xor the other? I bring these up again to mostly highlight complexities rather than saying that this should be decided here-and-now. Overall, I'm mostly addressing:

From this point of view, the latter is a much shorter list.

I realize that this is a bit tongue-in-cheek and not meant to be taken literally, but I personally feel that even very small changes in a spec like wasm have lots of complexities to sort through. If we'd need to add memory.limits instructions or memory.{grow,size}_bytes instructions that's already adding more bullets to the 1-bullet-list.

I'm having a hard time thinking of a common situation today where the body of a function in a .o file (parsed as standard Wasm) is expected to be incorrect until relocations are applied from the custom section.

An example of this is:

```c
extern char foo[10];

char *bar() {
  return foo;
}
```

which, when compiled with `$WASI_SDK_PATH/bin/clang foo.c -c -O` and printed as text, looks like:

```wat
(module
  (type (;0;) (func (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func (;0;) (type 0) (result i32)
    i32.const 0
  )
  (@custom "linking" (after code) "\02\08\8e\80\80\80\00\02\00\04\00\03bar\01\10\03foo")
  (@custom "reloc.CODE" (after code) "\03\01\04\04\01\00")
  (@producers
    (processed-by "clang" "18.1.2 (https://github.com/llvm/llvm-project 26a1d6601d727a96f4301d0d8647b5a42760ae0c)")
  )
  (@custom "target_features" (after code) "\02+\0fmutable-globals+\08sign-ext")
)
```

The return value of function 0 here isn't actually 0 at runtime; it's a constant that's filled in by lld at link time.

On the topic of memory.grow_bytes, how would that interact with memory.size? For example if memory is 64k + 100 bytes large, what is the return value of memory.size?

I ask this because one possible extension of this proposal to the spec is that we could consider memories as being specified in terms of byte sizes rather than page sizes at the "AST level". The current binary format would continue to define memories as a multiple of 64k sizes but in the future there could be an option to specify a min/max with a byte size as well. The complexity here to me is the above question, what to do with memory.size. For example memory.grow is relatively easy to handle as it could be interpreted as "grow memory by 64k bytes" which can be rejected at runtime if there's not enough memory.

titzer commented 1 month ago

I generally agree that custom page sizes feels more "wasmy"--it doesn't introduce a new concept, but generalizes an existing one. I think it fits in the same general category of generalizations as multiple memories. I also see it as a road to more fully utilizing extant hardware virtual memory mechanisms. Even with just two page sizes, we will now have encoding space for experimenting with 4 and 8KiB page sizes, which may make a big difference in particular use cases.

tlively commented 1 month ago

I'm having a hard time thinking of a common situation today where the body of a function in a .o file (parsed as standard Wasm) is expected to be incorrect until relocations are applied from the custom section.

An example of this is...

More generally, all relocations (unless I'm forgetting some edge case) are 5-byte LEB-encoded zeroes in object files until they are patched by the linker with 5-byte LEB encodings of the correct value, so it's not possible to discover their "real" values without actually performing linking.
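As an illustration of that fixed-width encoding, here is a C sketch that writes and reads a 5-byte padded signed LEB128, the form an i32.const immediate takes when it is a relocation target. The function names are hypothetical:

```c
#include <stdint.h>

/* Encode v as a 5-byte padded signed LEB128. Every byte but the last has
 * its continuation bit set, so any 32-bit value fits in the same 5 bytes
 * and the linker can patch it in place without resizing the code section. */
static void encode_sleb128_padded5(int32_t v, uint8_t out[5]) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(((u >> (7 * i)) & 0x7f) | 0x80); /* bits 0..27 */
    /* Last byte: bits 28..31 of the value plus sign-extension bits 32..34. */
    out[4] = (uint8_t)(((u >> 28) & 0x0f) | (v < 0 ? 0x70 : 0x00));
}

static int32_t decode_sleb128_padded5(const uint8_t in[5]) {
    uint64_t result = 0;
    for (int i = 0; i < 5; i++)
        result |= (uint64_t)(in[i] & 0x7f) << (7 * i);
    if (in[4] & 0x40) /* sign bit of the final 7-bit group */
        result |= ~(uint64_t)0 << 35;
    return (int32_t)(uint32_t)result;
}
```

The zero placeholder encodes as `80 80 80 80 00`; patching in 65536 gives `80 80 84 80 00`, the same length, so nothing else in the code section has to move.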

On the topic of memory.grow_bytes, how would that interact with memory.size? For example if memory is 64k + 100 bytes large, what is the return value of memory.size?

I guess memory.size would round down to the nearest multiple of 64k and then divide. I can't think of anything else that would make sense.

sunfishcode commented 1 month ago

I guess memory.size would round down to the nearest multiple of 64k and then divide. I can't think of anything else that would make sense.

For memory.grow's return value, silently rounding down would break code that assumes that memory.grow's return value points to previously unallocated memory.
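A small C sketch of that hazard, assuming a hypothetical byte-granular memory whose memory.size and memory.grow keep 64 KiB units and round down as suggested above (no maximum or failure path is modeled):

```c
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u

/* Hypothetical memory whose length need not be a multiple of 64 KiB. */
typedef struct {
    uint64_t len_bytes;
} byte_memory;

/* memory.size in 64 KiB units, rounding down. */
static uint64_t memory_size_pages(const byte_memory *m) {
    return m->len_bytes / WASM_PAGE_SIZE;
}

/* memory.grow by whole 64 KiB pages; returns the old size, rounded down. */
static uint64_t memory_grow_pages(byte_memory *m, uint64_t delta) {
    uint64_t old = memory_size_pages(m);
    m->len_bytes += delta * WASM_PAGE_SIZE;
    return old;
}
```

Starting from 64 KiB + 100 bytes, `memory_grow_pages(&m, 1)` returns 1, so code that treats `1 * 65536` as the start of fresh memory would hand out bytes 65536 through 65635, which were already in use.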

fitzgen commented 1 month ago

@keithw thanks for filing a detailed issue, and no worries about applying stop energy, I think we all just had a genuine misunderstanding.

(I'm going to split my reply into two parts: first a general response, and second with more detailed, focused comments.)

I think this issue's write up overestimates the amount of work necessary to support the custom-page-sizes proposal and underestimates the amount of work necessary to support this alternative proposal. In particular, I think this alternative doesn't fully consider the impact of language feature composition and reusing existing language concepts versus introducing new language concepts.

Many (most?) engines and toolchains aren't only targeting embedded but aim to support Wasm across multiple domains. binaryen and wasi-sdk, for example, are used across the spectrum. While an embedded-specific engine or tool might choose to omit support for Wasm proposals that do not lend themselves to the embedded domain, an engine or tool that is portable across many domains does not have that luxury. If a portable engine or tool can implement all those Wasm features on top of the same few concepts, that requires less effort than implementing N bespoke concepts for N Wasm features.

Additionally, extending and generalizing existing language concepts avoids being forced to answer (and spec and implement) many resolvable-but-annoying questions, like those @alexcrichton is raising. When reusing language concepts, those questions don't even arise in the first place because we already have answers to them via the existing machinery around those existing language concepts. This is important not only in the current moment, but for the future as well since Wasm features are purely additive and once they are standardized and shipped, they must be supported eternally, as @conrad-watt points out.

fitzgen commented 1 month ago

@keithw

It wouldn't (necessarily) change the memory type matching rules.

What happens when module A defines and exports a memory with a max byte size of 10, module B imports a memory with a max byte size of 15, and we try to satisfy B's import with A's export? If this is not an error, then compilers cannot take advantage of statically known max byte size when emitting bounds-checking sequences, because they might actually have a memory with some smaller max byte size, and must always load the max byte size from the vm context dynamically instead of embedding it directly inside an instruction as an immediate. If it is an error, then the memory type matching rules must change to define that error case.
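A sketch of the two bounds-check shapes being contrasted, with hypothetical names: the first bakes module B's declared max byte size in as an immediate, the second loads the actual value from per-instance state (the `vmctx` layout is an assumption):

```c
#include <stdbool.h>
#include <stdint.h>

/* (a) Max byte size trusted from B's declaration and folded in as an
 * immediate; 15 stands in for B's declared max byte size in the example. */
#define B_DECLARED_MAX_BYTES 15u

static bool in_bounds_static(uint32_t addr, uint32_t access_size) {
    return (uint64_t)addr + access_size <= B_DECLARED_MAX_BYTES;
}

/* (b) Max byte size read at run time from hypothetical per-instance state,
 * needed if a memory with a smaller max byte size can satisfy the import. */
typedef struct {
    uint64_t max_bytes;
} vmctx;

static bool in_bounds_dynamic(const vmctx *ctx, uint32_t addr,
                              uint32_t access_size) {
    return (uint64_t)addr + access_size <= ctx->max_bytes;
}
```

If A's 10-byte memory is allowed to satisfy B's 15-byte import, `in_bounds_static(12, 1)` accepts an access that must trap, so only the dynamic form would be correct.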

It wouldn't change the semantics of memory.size or memory.grow.

This exaggerates the magnitude of the change to these instructions in the custom-page-sizes proposal (replacing a language-wide constant with a value defined in the memory's static type) and downplays the costs of introducing a whole new concept to the language.

It wouldn't change the tool conventions, meaning:

- no addition of __builtin_wasm_page_size
- no new relocation type
- no new linker behavior

Instead of linker changes, this proposal would require whole new additions to the core Wasm language: the memory.size_bytes and memory.grow_bytes instructions that @tlively described.

I agree with what @tlively and @conrad-watt expressed, that in general we should prefer solutions with fewer core language changes over solutions with fewer toolchain changes.

It wouldn't add more magic to ".o" files. (Right now it's pretty nice that you can compile something with clang -c and the output in WAT, decoded by standard tools, is comprehensible as a Wasm module. The discussion in #3 suggests that newly generated .o files will implicate this proposal pervasively, and knowledge of the tool conventions and non-standard sections will become more necessary to understand what the .o file is trying to do.)

.o files would be just as comprehensible as they are now, as explained by @alexcrichton.

fitzgen commented 1 month ago

@ppenzin

How would the aforementioned implementation that currently caps memory handle this? I assume it is a bare-metal device of some sort; is mprotect even supposed to work?

One could imagine using an MPU's capabilities for this kind of thing, when available.

As for a forum that is probably better equipped to review this, an Embedded SIG has been approved in the Bytecode Alliance

The Bytecode Alliance is not a standardization venue. We, the BA, organize implementation work under our umbrella, and might make recommendations and proposals (such as this custom-page-sizes proposal!) to bring to standards venues like the W3C, but it is not appropriate to move standards discussions out of their standardization venues and into the BA.

fitzgen commented 1 month ago

@yamt

Where can I find the implementation?

The binary decoding, binary encoding, text parsing, text printing, validation, test case generation for fuzzing, and *.wast tests are all in https://github.com/bytecodealliance/wasm-tools

Support has not landed in Wasmtime itself yet.

woodsmc commented 1 month ago

I'd like to make sure our colleagues who have been shipping devices with smaller memory footprints are included in the discussion, so here comes a list of github handles / tags: @no1wudi, @dongsheng28849455, @xwang98 and @wenyongh. (I apologise if I've missed anyone).

Perhaps we could get some insight into the use cases for smaller page sizes.

From my own perspective, the device limitations restrict the code structure we deploy and reduce complexity; we are also likely to remove dynamic memory actions (grow/shrink), etc. Other perspectives on the limitations and usage of smaller page sizes would be useful too.

dschuff commented 1 month ago

A very minor comment:

From my own perspective, the device limitations restrict the code structure we deploy and reduce complexity; we are also likely to remove dynamic memory actions (grow/shrink), etc.

I just wanted to point out that for this specifically, it's perfectly valid for memory growth to always fail, and that doing this is much preferable to other ways of not supporting growth (e.g. rejecting memory.grow instructions at validation time).

keithw commented 1 month ago

I'm having a hard time thinking of a common situation today where the body of a function in a .o file (parsed as standard Wasm) is expected to be incorrect until relocations are applied from the custom section.

An example of this is:

```c
extern char foo[10]; char *bar() { return foo; }
```

[...]

Hmm, this feels pretty different. The relocation in your example expresses the fact that the linker will (re)locate global variables in memory (it could end up at address 0). The generated .o files are still internally consistent, e.g. if I write:

```c
static char foo_[10], bar_[10];
char *foo() __attribute((export_name("foo"))) { return foo_; }
char *bar() __attribute((export_name("bar"))) { return bar_; }
```

... there are relocations for the addresses of foo_ and bar_, but even without linking, the .o file is internally consistent:

```
$ clang -Os -c -o foo.o foo.c && wasm2wat foo.o

(module
  (type (;0;) (func (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 1))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
  (func $foo (type 0) (result i32)
    i32.const 0)
  (func $bar (type 0) (result i32)
    i32.const 10)
  (export "foo" (func $foo))
  (export "bar" (func $bar))
  (data $foo_ (i32.const 0) "\00\00\00\00\00\00\00\00\00\00")
  (data $bar_ (i32.const 10) "\00\00\00\00\00\00\00\00\00\00"))
```

Whereas the current proposal uses a relocation to drop in a constant that's going to have a value of either 1 or 65,536, but unknown when producing the compilation unit so it's represented as 0 in the .o file (a syntactically valid Wasm module). The function doesn't behave correctly until it passes through wasm-ld, even if it never refers to the address of an extern symbol. That feels like a new step? It doesn't seem great, in terms of safety or comprehensibility, to be producing syntactically-valid-but-internally-incorrect Wasm modules and relying on wasm-ld (which afaik has a single implementation) to fix them up.

It doesn't look like the group is going to go for the "cap" alternative. I think a big part of my perception of complexity in this proposal comes from the added weight placed on the tool conventions. In an effort to find consensus: How about a version of the current proposal that adds a new const instruction (e.g. memory.page_size or something more general) that can be used to express a page-size-agnostic sbrk and mprotect, instead of relying on a new relocation type and lld to drop in a constant? This seems arguably a more Wasmy way to express this uncertainty, with (as a const instruction) similar opportunities for optimization. (Edit: OTOH, expressing a memory import that's agnostic to page size will require a more complicated import spec and matching rules.)
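To illustrate what such an instruction would let libc express, here is a C sketch of a page-size-agnostic sbrk against a simulated memory so that it runs natively. `sim_memory`, `heap`, and `heap_sbrk` are hypothetical; in a real module the page size would come from the proposed const instruction (or, under the current proposal, from __builtin_wasm_page_size):

```c
#include <stdint.h>

/* Simulated linear memory; page_size would be queried, not hard-coded. */
typedef struct {
    uint64_t page_size;  /* 1 or 65536 */
    uint64_t size_pages;
    uint64_t max_pages;
} sim_memory;

/* memory.grow semantics: old size in pages, or -1 on failure. */
static int64_t sim_memory_grow(sim_memory *m, uint64_t delta) {
    if (delta > m->max_pages - m->size_pages)
        return -1;
    int64_t old = (int64_t)m->size_pages;
    m->size_pages += delta;
    return old;
}

/* Hypothetical allocator state: the current break, in bytes. */
typedef struct {
    sim_memory *mem;
    uint64_t brk;
} heap;

/* sbrk that never assumes 64 KiB pages. Returns the old break or -1. */
static int64_t heap_sbrk(heap *h, uint64_t increment) {
    uint64_t ps = h->mem->page_size;
    uint64_t end = h->mem->size_pages * ps;
    uint64_t new_brk = h->brk + increment;
    if (new_brk > end) {
        uint64_t need = new_brk - end;
        uint64_t pages = need / ps + (need % ps != 0); /* round up */
        if (sim_memory_grow(h->mem, pages) < 0)
            return -1;
    }
    int64_t old = (int64_t)h->brk;
    h->brk = new_brk;
    return old;
}
```

Nothing here hard-codes 64 KiB, and a failed growth request is reported as -1 rather than assumed to succeed.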

conrad-watt commented 1 month ago

Whereas the current proposal uses a relocation to drop in a constant that's going to have a value of either 1 or 65,536, but unknown when producing the compilation unit so it's represented as 0 in the .o file (a syntactically valid Wasm module).

I'm not very familiar with wasm-ld, but in the Wasm module example above it looks to me like the imported memory is already committing to a 64k page size. I would naively expect that the choice of page size is made when the module/compilation unit is first produced, and modules with incompatible page size choices simply can't be linked. Are we planning to do something more ambitious where the linker may change the type of the memory?

EDIT: I see this in the overview

By delaying the configuration of page size to link time, we ensure that we cannot get configured-page-size mismatches between objects being linked together (which would then require further design decisions surrounding how that scenario should be handled, e.g. whether it should result in a link error).

I'll defer to the people who'd actually be implementing this, but I'm a little surprised at the ambition here! I understand the argument in https://github.com/WebAssembly/custom-page-sizes/issues/3 about precompiling standard libraries.

I think a big part of my perception of complexity in this proposal comes from the added weight placed on the tool conventions.

If we go forward with this proposal, who would be responsible for the changes to wasm-ld? We should try to hear their opinion firsthand.

tlively commented 1 month ago

@sbc100 has been the primary maintainer of wasm-ld. Sam, can you say how complex you think it would be to add a new relocation type for the page size?

I'll defer to the people who'd actually be implementing this, but I'm a little surprised at the ambition here!

There is precedent for wasm-ld having somewhat sophisticated compatibility rules around used and enabled features. The target features section already has a way to encode "this object file uses target feature X and linking should fail if any other object does not use target feature X," which is exactly what we need in this case AFAICT. (I designed and implemented this feature five years ago and it has never been used until now, so I'm glad I can stop regretting this particular bit of over-engineering.)
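The compatibility rule described above ("this object uses feature X, and linking should fail if any other object does not") can be sketched as a simple check over per-object summaries. The struct and function names below are hypothetical; this models the behavior as stated in the comment, not wasm-ld's internal data structures or the section's binary encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-object summary of one target feature. */
typedef struct {
    bool uses;         /* this object uses the feature */
    bool requires_all; /* ...and linking must fail unless every object uses it */
} feature_entry;

/* Returns true if the objects may be linked together: either no object
 * demands the feature from everyone, or every object uses it. */
static bool features_link_ok(const feature_entry *objs, size_t n) {
    bool any_requires = false;
    bool all_use = true;
    for (size_t i = 0; i < n; i++) {
        /* Demanding the feature from others implies using it yourself. */
        assert(!objs[i].requires_all || objs[i].uses);
        any_requires = any_requires || objs[i].requires_all;
        all_use = all_use && objs[i].uses;
    }
    return !any_requires || all_use;
}
```

Applied to page sizes, an object compiled against the custom-page-sizes conventions would mark the feature as required, so mixing it with an object built under the old assumption would be rejected at link time instead of silently producing a mismatched module.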

rossberg commented 1 month ago

@dschuff:

I just wanted to point out that for this specifically, it's perfectly valid for memory growth to always fail, and that doing this is much preferable to other ways of not supporting growth (e.g. rejecting memory.grow instructions at validation time).

Not sure I subscribe to that. Pretending that some functionality is available when it doesn't actually work may be less useful than rejecting it explicitly and early.

sbc100 commented 1 month ago

@sbc100 has been the primary maintainer of wasm-ld. Sam, can you say how complex you think it would be to add a new relocation type for the page size?

I'll defer to the people who'd actually be implementing this, but I'm a little surprised at the ambition here!

There is precedent for wasm-ld having somewhat sophisticated compatibility rules around used and enabled features. The target features section already has a way to encode "this object file uses target feature X and linking should fail if any other object does not use target feature X," which is exactly what we need in this case AFAICT. (I designed and implemented this feature five years ago and it has never been used until now, so I'm glad I can stop regretting this particular bit of over-engineering.)

I would hope a relocation type would not be needed here, and that instead we could use a linker-synthetic symbol, similar to __tls_base, __data_end, etc. For example, we could call this __wasm_page_size, and its address would be the size of a page. Note that these symbols don't take up any address space; they just have meaningful addresses.
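From C, such a symbol would be consumed the same way existing linker-defined symbols are: declare it `extern` and use its address as the value. The sketch below assumes the proposed name `__wasm_page_size` (which does not exist in wasm-ld today) and hides it behind a hypothetical opt-in macro, `USE_HYPOTHETICAL_WASM_PAGE_SIZE_SYMBOL`, falling back to the default 64 KiB page so it also compiles on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__wasm__) && defined(USE_HYPOTHETICAL_WASM_PAGE_SIZE_SYMBOL)
/* Hypothetical linker-synthetic symbol: it occupies no storage; only its
 * address is meaningful, and the linker would set that address to the page size. */
extern char __wasm_page_size[];
#define WASM_PAGE_SIZE ((size_t)(uintptr_t)__wasm_page_size)
#else
/* Fallback for hosts and toolchains without such a symbol: the default 64 KiB page. */
#define WASM_PAGE_SIZE ((size_t)65536)
#endif

/* Number of whole pages needed to hold `bytes` bytes. */
static size_t pages_for_bytes(size_t bytes) {
    assert(WASM_PAGE_SIZE != 0);
    return bytes / WASM_PAGE_SIZE + (bytes % WASM_PAGE_SIZE != 0);
}
```

Because the linker resolves the address to a constant, the compiler sees only a symbol address at compile time; whether that folds as well as an inline constant is exactly the constant-propagation question raised in the following comments.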

tlively commented 1 month ago

That would work, but the idea was to use a relocation rather than an in-memory value to allow for easier constant propagation. OTOH, accessing this single value seems unlikely to be very performance sensitive. WDYT?

sbc100 commented 1 month ago

That would work, but the idea was to use a relocation rather than an in-memory value to allow for easier constant propagation. OTOH, accessing this single value seems unlikely to be very performance sensitive. WDYT?

It wouldn't be an in-memory value; it would be a linker-generated constant address, just like most of the other linker-generated symbols: https://github.com/llvm/llvm-project/blob/88902147c11f8de5cc7c792fd8c476a821664297/lld/wasm/Symbols.h#L515-L616.

Better still, we could use a wasm global for this purpose, like we do for __tls_size: https://github.com/llvm/llvm-project/blob/88902147c11f8de5cc7c792fd8c476a821664297/lld/wasm/Symbols.h#L534-L536

fitzgen commented 1 month ago

So if I understand correctly, it seems like wasm-ld already supports something very similar to the get-the-custom-page-size use case, and extending that existing support to this new use case should be relatively straightforward.

Is that high-level summary correct, @sbc100?

ppenzin commented 1 month ago

The Bytecode Alliance is not a standardization venue. We, the BA, organize implementation work under our umbrella, and might make recommendations and proposals (such as this custom-page-sizes proposal!) to bring to standards venues like the W3C, but it is not appropriate to move standards discussions out of their standardization venues and into the BA.

If standardization cannot be discussed in that SIG (also @woodsmc take note, as you are the chair of that), then we really need to have representation of the embedded space in W3C; I don't think having the game of telephone between the implementers and the standard is acceptable. I don't know what this representation should look like, whether an embedded subgroup or some other form, but we really need it, because we are currently at risk of adding something we are not sure would work for the intended use cases while being an implementation burden on everyone else.

How would the aforementioned implementation that currently caps memory handle this? I assume it is a bare-metal device of some sort; is mprotect even supposed to work?

One could imagine using an MPU's capabilities for this kind of thing, when available.

Apologies for maybe not being clear: this isn't about what one can imagine, but rather about what is realistically supported by such implementations today. There is an inherent tension between mprotect and memories of less than 4KB, which is why it would be great not to base the standard on the former if the latter environment is not expected to support it.

sbc100 commented 1 month ago

So if I understand correctly, it seems like wasm-ld already supports something very similar to the get-the-custom-page-size use case, and extending that existing support to this new use case should be relatively straightforward.

Is that high-level summary correct, @sbc100?

Yup, correct.

tschneidereit commented 1 month ago

If standardization cannot be discussed in that SIG (also @woodsmc take note, as you are the chair of that) [..]

@ppenzin, speaking as a TSC member of the BA, I want to clarify again that it's perfectly fine to discuss these kinds of things, and in Nick's words "make recommendations and proposals [..] to bring to standards venues" as part of BA-hosted activities. What we want to avoid is to create a situation in which it becomes required to participate in the BA to be able to participate in WebAssembly standardization, as moving the review of this proposal to a BA SIG would do.

For context for others: the BA is currently in the process of establishing a SIG-Embedded. While I can only speculate, it seems highly likely to me that Nick would've brought up the topic of how best to handle memory sizes that aren't multiples of 64KB there first if the SIG was already operational. After a discussion there, a proposal to the Wasm CG would then have happened, leading to the same process we have here now.

[..] then we really need to have representation of the embedded space in W3C, I don't think having the game of telephone between the implementers and the standard is acceptable.

Can you say more about what "game of telephone" you mean? It seems like after a bit of a slow start on the engagement, this proposal is now getting a lot of input from different implementers. And it seems like this proposal repository as an async forum and the regular CG meetings as a sync one work well for discussing the proposal.

Making use of these forums requires active engagement by all interested parties, but that would be the case in whichever forum—and this very issue seems like an example of that engagement.

woodsmc commented 1 month ago

@ppenzin

How would the aforementioned implementation that currently caps memory handle this? I assume it is a bare-metal device of some sort; is mprotect even supposed to work?

One could imagine using an MPU's capabilities for this kind of thing, when available.

For a forum that is probably better equipped to review this: an Embedded SIG has been approved in the Bytecode Alliance.

The Bytecode Alliance is not a standardization venue. We, the BA, organize implementation work under our umbrella, and might make recommendations and proposals (such as this custom-page-sizes proposal!) to bring to standards venues like the W3C, but it is not appropriate to move standards discussions out of their standardization venues and into the BA.

Totally agree, but I also acknowledge @ppenzin's point in calling out me and the other folks in the embedded space for not contributing more actively here. cc: @no1wudi, @dongsheng28849455, @xwang98 and @wenyongh.

woodsmc commented 4 weeks ago

I can share what we are doing at the moment for smaller page sizes.

Our use case (I'm not sure about @no1wudi's or others') is compiling a simple function, usually some user-supplied transformation. It typically doesn't use the heap at all; we simply need Wasm to safely encapsulate, in a platform-portable way, the function(s) we need to invoke.

In this case a full 64 KB is overkill, as we basically only need enough space for the stack; ~4 KB to 16 KB is a good approximation. cc @tacdom?

What we've been doing is compiling Zig, Rust and C to wasm, then converting to .wat and manually reducing the requested page count: initially, C requests 2 pages (128 KB) and Rust and Zig request 16 pages (1 MB), and we reduce this to one page. Then we convert back to .wasm.

Then, as discussed with @keithw, we change the runtime's definition of what a page size is, since we embed the runtime in the host.

Typically the code we're compiling is pretty limited.

We're never going to do memory.grow / shrink, etc. In general, memory is considered more or less static.

no1wudi commented 4 weeks ago

@woodsmc For us, we have several different usage forms:

- In the early stage, or for small apps, we use bare-metal wasm with a customized system interface, to avoid introducing memory.grow into the .wasm module and to reduce the potential memory usage from wasi-libc.
- Now, or for more complex apps, we customize the heap management of wasi-libc to avoid calls to memory.grow.

Generally, we need to specify the size of the linear memory at compile time in our usage. We avoid using the memory.grow instruction because the default page size (64K) is too large for us. If the page size could be configured, for example to 4K or 16K as needed, we could then use a more standardized approach to handle the heap (e.g. wasi-sdk).
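As a rough illustration of heap management that never calls memory.grow, here is a minimal bump allocator over a buffer whose size is fixed at compile time. It is a generic sketch, not the customized wasi-libc heap described above; `ARENA_SIZE`, `arena`, and `arena_alloc` are hypothetical names, the 4 KiB budget is an arbitrary example, and the allocator never frees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096 /* hypothetical compile-time heap budget, in bytes */

/* The whole heap lives in a static buffer, so memory.grow is never needed. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Bump-allocate `size` bytes aligned to `align`; returns NULL when the
 * fixed budget is exhausted instead of trying to grow memory. */
static void *arena_alloc(size_t size, size_t align) {
    /* Alignment must be a power of two no larger than the arena's own alignment. */
    assert(align != 0 && (align & (align - 1)) == 0 && align <= 16);
    size_t start = (arena_used + align - 1) & ~(align - 1);
    if (start > ARENA_SIZE || size > ARENA_SIZE - start) {
        return NULL; /* Out of budget: fail rather than call memory.grow. */
    }
    arena_used = start + size;
    return &arena[start];
}
```

The trade-off is that the heap budget must be chosen up front, which is the "specify the size of the linear memory at compile time" constraint; a configurable page size would let a standard sbrk-based allocator grow in smaller steps instead.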

tacdom commented 4 weeks ago

Hi all,

in general I would say the smaller the better 😉

So 4 kB or less would be the general target space, imho. The same goes for page count. It looks like the numbers 2 for C and 16 for other languages are more or less constants.

These numbers make sense, but there are cases where these will be overkill. For example if I use rust w/o the standard library, there is no reason to use more pages than C does.

Maybe as a first step, compiler flags would be a nice way to give a developer control over the underlying stuff. If you do not care about that you just do not touch it.

Much better would be if the page size and page count were optimized for the application during compilation. But this is a rather tricky task.

Dominik

fitzgen commented 3 weeks ago

@woodsmc, @no1wudi, and @tacdom: thanks for the feedback and details on your use cases! It sounds like the custom-page-sizes proposal will indeed provide a standards-based solution for your use cases, allowing you to define memories smaller than 64KiB (and down to even just a single byte, if you wanted that for some reason).

What we've been doing is compiling Zig, Rust and C to wasm, then converting to .wat and manually reducing the requested page count: initially, C requests 2 pages (128 KB) and Rust and Zig request 16 pages (1 MB), and we reduce this to one page. Then we convert back to .wasm.

FWIW, you don't need to disassemble to .wat, edit the text format manually, and reassemble to .wasm to decrease the initial memory required for the main thread's stack. You can just pass -z stack-size=12345 to wasm-ld instead. For example, with Rust you could set the RUSTFLAGS="-C link-args=-Wl,-zstack-size=12345" environment variable during cargo build.

If the page size can be configured, for example, to 4K or 16K as needed, we can then use a more standardized approach to handle the heap (e.g. wasi-sdk).

Great! Configuring a memory's page size is exactly what the custom-page-sizes proposal introduces.

I should note, though, that initially the only valid page sizes will conservatively be 1 byte and 64KiB. But with a 1-byte page size, you can create a memory of any 32-bit size, including, for example, 4KiB and 16KiB memories.
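The arithmetic behind that point can be spelled out in a few lines of C. This is plain arithmetic, not tied to any engine or to the proposal's text format, and the function names are illustrative only: a memory's declared minimum must be a whole number of pages, so the bytes actually reserved are the requested size rounded up to the page size.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest page count whose total size covers `bytes`, for a given page size. */
static uint64_t min_pages(uint64_t bytes, uint64_t page_size) {
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

/* Bytes the memory actually reserves once rounded up to whole pages. */
static uint64_t reserved_bytes(uint64_t bytes, uint64_t page_size) {
    return min_pages(bytes, page_size) * page_size;
}
```

For a 4 KiB memory, 1-byte pages give a minimum of 4096 pages and exactly 4096 reserved bytes, while 64 KiB pages force a minimum of 1 page and 65536 reserved bytes, sixteen times what is needed. A 16 KiB memory likewise reserves 16384 bytes with 1-byte pages but still 65536 with 64 KiB pages.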

fitzgen commented 3 weeks ago

This issue's original topic of discussion (the per-memory byte-limit alternative to the custom-page-sizes proposal) has been quiet for a couple weeks now. It seems to me like the general consensus is that folks would rather continue pursuing the custom-page-sizes proposal over the per-memory byte-limit alternative.

@keithw do you (or anyone else!) have any final comments you wanted to add about the byte-limit alternative?

Unless anything new comes up in the next few days, I will schedule a time slot in an upcoming CG meeting to give another update (primarily on the discussion that's happened here and on the implementation that has landed in Wasmtime) and hold a vote to advance this proposal to phase 2. If anyone has any outstanding concerns, please file an issue for discussing them before then, thanks!

keithw commented 3 weeks ago

Thanks, @fitzgen, I think the "cap" alternative had a fair airing and the consensus was clear. A substantial part of my concern about the complexity of the original proposal (and reliance on relocations and wasm-ld) was addressed by #22, assuming that technique prevails, and we can continue that discussion in the appropriate places.

fitzgen commented 1 week ago

Heads up, I'm adding an agenda item to the 2024-07-30 CG meeting to give an update on this discussion and a vote for phase 2: https://github.com/WebAssembly/meetings/pull/1619