WebAssembly / WASI

WebAssembly System Interface

document endianness #449

Closed vapier closed 2 years ago

vapier commented 2 years ago

i get that WASI is built on top of WASM and thus it can be easy to "just know" that WASI is obviously little endian, but for the sake of clarity, it might help to explicitly state this in the API docs. especially when one considers that there have been cases of interop using a specific endianness all the time (e.g. "network byte order" is always big endian).

sunfishcode commented 2 years ago

Is there a particular place or particular functions where it would be helpful to document this? WASI follows the endianness conventions that all little-endian platforms follow, so it's not immediately clear where we should document this.

Also, it's worth noting that interface types are endian-independent, so as WASI transitions to those, the API specifications will be endian-independent, and any endianness sensitivity will be a result of a specific binding layer.

vapier commented 2 years ago

the word "endian" does not appear anywhere in the WASI spec. the first section is "types", so seems like putting a section on endian first would work.

unless WASI is aspiring to grow beyond WASM, it's always going to be little endian.

sunfishcode commented 2 years ago

Interface types don't expose the storage of values, so they don't expose endianness. This is the direction that WASI is evolving, and as such, it's convenient to avoid having documentation talk about endianness unless there's a specific need for it.

The x86_64 psabi document, for example, doesn't say the word "endian" anywhere either, except in the layout of __int128 which isn't a register type. Is there something in WASI's documentation that gives the impression that something might not be little-endian right now, that would be helpful to clarify?

vapier commented 2 years ago

i'm not sure why you're resisting writing clear specifications. i filed this bug because i had people ask about it. they read the spec while reviewing code and couldn't find the answer.

referring to other specs that are ambiguous isn't really a good argument. i'll note that the AMD64 psABI states that it only uses ELFDATA2LSB for ELF objects, which is little endian encoding, and it says "These values use the same byte order as other word values in the AMD64 architecture" while failing to define that byte order in the data representation section.

anyone implementing WASI needs to know what endianness these interfaces are using. for values passed as immediate values (i.e. function arguments), it's not terribly relevant as it's probably reasonable to assume one doesn't have to do byte swapping on registers (or equiv), but WASI also defines pointers to data structures & multi-byte words in memory. anyone working on either side of the boundary needs to know what endianness those are supposed to be. a naive memcpy(memory_buffer, &integer, 4) isn't portable.
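To make the memcpy point concrete, here is a minimal sketch in C. It assumes the in-memory format is little endian, which is what the rest of this thread converges on. The helper names `store_u32_le` and `store_u32_host` are hypothetical and do not come from any WASI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value as little-endian bytes.
 * The result is the same on little-endian and big-endian hosts,
 * because the byte order is spelled out explicitly. */
static inline void store_u32_le(uint8_t *buf, uint32_t value)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(value & 0xffu);
    buf[1] = (uint8_t)((value >> 8) & 0xffu);
    buf[2] = (uint8_t)((value >> 16) & 0xffu);
    buf[3] = (uint8_t)((value >> 24) & 0xffu);
}

/* The naive variant: copies the value in whatever byte order the
 * host CPU uses, so the bytes differ between hosts. */
static inline void store_u32_host(uint8_t *buf, uint32_t value)
{
    assert(buf != NULL);
    memcpy(buf, &value, sizeof value);
}
```

On a little-endian host both functions happen to produce the same bytes, which is why the problem tends to go unnoticed until the code runs on a big-endian host.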

> something in WASI's documentation that gives the impression that something might not be little-endian right now

where in the documentation is there any clue that it's little endian and not big endian ? or XOR endian or network endian or host endian or PDP endian or some other endian ?

i'll point out that network interfaces have a long history of always being big endian (i.e. "network endian") precisely so that peers don't have to negotiate if their CPUs are using different endianness.
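For contrast, a small sketch of what "network endian" means in practice: the most significant byte goes first, independent of the CPU. The helpers `store_u16_be` and `load_u16_be` are hypothetical names used only for illustration; real code often uses `htons`/`ntohs` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a 16-bit value in network byte order
 * (big endian), most significant byte first. */
static inline void store_u16_be(uint8_t *buf, uint16_t value)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(value >> 8);
    buf[1] = (uint8_t)(value & 0xffu);
}

/* Hypothetical helper: decode a 16-bit value from network byte order. */
static inline uint16_t load_u16_be(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint16_t)(((uint16_t)buf[0] << 8) | (uint16_t)buf[1]);
}
```

Because both peers agree on the order up front, neither side has to know or negotiate the other's CPU endianness.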

sunfishcode commented 2 years ago

My thought was to try to uncover a possible root cause for confusion, rather than focus on what might turn out to be a symptom.

Also, as I mentioned above, a high-level direction for us is to move away from raw pointers and endianness, at the specification level. I'm happy to mention endianness if there are specific things that are confusing. And of course we'll mention endianness if we add APIs that expose network byte order (as other little-endian platforms do). However in absence of specific needs, it's convenient to treat endianness as a property of the bindings we're currently using, rather than something that the WASI APIs themselves need to document, so that we can more easily migrate to different kinds of bindings, including bindings that don't expose endianness at all.

vapier commented 2 years ago

who do you see as the target audience of the WASI spec ? is it application programmers (i.e. people writing "hello world"), or language bindings implementers, or runtime implementers ?

if application programmers need to read this spec, then we have failed them. they should never need to peek under the hood here. the only thing they need is a POSIX compiler & environment. which is what wasi-sdk does now fairly well.

people working on language bindings & runtimes very much need to know these details. no level of abstraction at the API level changes that. the whole point of WASI is to connect completely unrelated runtimes and still have things Just Work. we're never going to get away from raw memory access (like we have with pointers now) which means these details need to be defined precisely.

if we do somehow manage to make details like endianness irrelevant years in the future, it's pretty trivial to just delete such sections & discussions from the spec. but i don't see how that aspiration is relevant now. the WASI API is steeped up to its eyes in multibyte integers with no explanation as to its encoding, and it's doing a disservice leaving things ambiguous. i still don't see why you think it's reasonable that everyone should naturally assume everything is little endian. there is nothing in the spec to suggest that. assuming host cpu endianness seems like a more natural default assumption.

sbc100 commented 2 years ago

I tend to agree that we should not avoid documenting how things work today (as in wasi_snapshot_preview1) because we have aspirations to side step certain issues in the future.

We do, after all, document the requirement to export the wasm memory, even though we hope to avoid that one day too.

linclark commented 2 years ago

> I tend to agree that we should not avoid documenting how things work today (as in wasi_snapshot_preview1) because we have aspirations to side step certain issues in the future.

We're very close to making the switch to using Interface Types based on Canonical ABI, as Alex has demonstrated in recent meetings.

With that, it feels like endianness should be documented at the Canonical ABI level, rather than in WASI itself. That could happen in the Interface Types repo as soon as https://github.com/WebAssembly/interface-types/pull/132 lands

sunfishcode commented 2 years ago

The endianness of the Canonical ABI is now documented as "little".
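As an illustration of what a little-endian ABI implies for a host-side implementer, here is a minimal sketch in C that reads a 32-bit value out of a guest's linear memory. The function name `load_u32_le` and its parameters are assumptions made for this example and are not taken from the Canonical ABI documentation or from any runtime's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian u32 at guest offset `ptr`
 * from a linear memory of `mem_len` bytes. Returns false when the
 * 4-byte read would fall outside the memory. */
static inline bool load_u32_le(const uint8_t *mem, size_t mem_len,
                               uint32_t ptr, uint32_t *out)
{
    assert(mem != NULL && out != NULL);
    if (mem_len < 4 || (size_t)ptr > mem_len - 4)
        return false;
    const uint8_t *p = mem + ptr;
    /* Assemble the value byte by byte so the host's own byte order
     * never affects the result. */
    *out = (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
    return true;
}
```

The bounds check is written as `ptr > mem_len - 4` rather than `ptr + 4 > mem_len` so that a large guest pointer cannot wrap around.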

As seen in these links, the ABI documentation is already greatly improved for Preview2. In addition to endianness, it has full ABI documentation. Preview1's documentation isn't anywhere near this complete, and wouldn't be enough for someone to build an implementation on, even if we added endianness. So at this point, I think it makes sense to focus on Preview2 as the direction of the platform going forward.