Deprecate wasm-c-api on big-endian hosts

SoniEx2 commented 1 year ago

See also #156, #164, WebAssembly/wabt#1972 and WebAssembly/wabt#2161.

Yes, linear memory as seen from wasm is always little-endian. But how should the host see it? wasm-c-api currently assumes the host is little-endian too.

rossberg commented 1 year ago

The host sees Wasm's raw array of bytes, through the byte* pointer. If it reads or writes it after casting that to larger native pointer types, it will naturally apply its native endianness. Is there any wiggle room at all here? How would you even specify or implement a different behaviour than that?

We could add a family of read/write instructions that adjust for endianness, as suggested in #164. On the other hand, such functionality is mostly independent of this API. For example, you could readily use the endianness-aware types from the boost library to handle this.
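Here is what that kind of endianness-aware access looks like in plain C. This is a minimal sketch that assumes the host has taken the byte pointer from `wasm_memory_data()` and cast it to `uint8_t *`. The helper names `load_le_u32` and `store_le_u32` are hypothetical and are not part of wasm-c-api. Because the value is assembled from individual bytes, the host always sees Wasm's little-endian interpretation, whatever its native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a u32 that Wasm stored little-endian at
 * `offset`. Assembling it byte by byte gives the same result on LE and
 * BE hosts. */
static uint32_t load_le_u32(const uint8_t *mem, size_t offset) {
    return (uint32_t)mem[offset]
         | ((uint32_t)mem[offset + 1] << 8)
         | ((uint32_t)mem[offset + 2] << 16)
         | ((uint32_t)mem[offset + 3] << 24);
}

/* Hypothetical helper: write a u32 the way Wasm's i32.store would. */
static void store_le_u32(uint8_t *mem, size_t offset, uint32_t value) {
    mem[offset]     = (uint8_t)(value);
    mem[offset + 1] = (uint8_t)(value >> 8);
    mem[offset + 2] = (uint8_t)(value >> 16);
    mem[offset + 3] = (uint8_t)(value >> 24);
}
```

The sketch leaves out bounds checks, so the caller has to keep `offset + 4` within `wasm_memory_data_size()`. The shifts are exactly the per-access work that the later comments want to avoid on big-endian hosts.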

SoniEx2 commented 1 year ago

The thing is, wabt currently uses "big-endian memory" to improve performance on some older platforms.

Instead of storing memory as, say,

04 03 02 01 00 00 00 00 ...

as it would on a little-endian host, it stores it as

... 00 00 00 00 01 02 03 04

In other words, the entire memory array is byte-swapped. This avoids individual byteswaps in loads and stores, at the cost of making memory.grow slightly more expensive. Aside from this, memory addressing is as efficient as it's always been: it's just relative addressing, after all. The only difference is that instead of reading a u32 from memory_start[0..4], it reads it from memory_end[-4..0].
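The following host-independent sketch models the flipped layout described above. The helper names are hypothetical and this is not wabt's actual code. Wasm address `addr` lives at host index `memsize - 1 - addr`, so the four bytes of a u32 end up at `memsize - addr - 4 .. memsize - addr - 1` in big-endian order. A big-endian host could load those bytes natively. The explicit decode here only serves to keep the sketch portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Host index of Wasm byte address `addr` when the whole memory is stored
 * reversed: Wasm address 0 lives at the last host byte. */
static size_t flipped_byte_index(size_t memsize, size_t addr) {
    return memsize - 1 - addr;
}

/* Store a u32 with Wasm's little-endian semantics (byte i of the value
 * belongs at Wasm address addr + i) into reversed memory. */
static void flipped_store_u32(uint8_t *mem, size_t memsize, size_t addr, uint32_t v) {
    for (size_t i = 0; i < 4; i++)
        mem[flipped_byte_index(memsize, addr + i)] = (uint8_t)(v >> (8 * i));
}

/* Read forwards from mem[memsize - addr - 4], the same four bytes are the
 * value in big-endian order; a BE host could memcpy them directly. */
static uint32_t flipped_load_u32(const uint8_t *mem, size_t memsize, size_t addr) {
    const uint8_t *p = mem + (memsize - addr - 4);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Storing 0x01020304 at Wasm address 0 of an 8-byte memory produces `00 00 00 00 01 02 03 04`, which matches the layout shown above.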

The cost of byte-swapping a u32 on something like a Motorola 68000 is... a lot, both in code size and in performance. And the worst part is that it cannot be optimized away: the compiler must always emit the byteswaps when reading and writing memory. Meanwhile, with the relative-but-backwards addressing trick, we can even use DBcc, the "test condition, decrement, and branch" instruction, turning forward (incrementing) iterators into DBcc iterators, for a possibly significant improvement.

It's worth noting that host code would have to be adapted for the big-endian host either way, since wasm itself is little-endian. There just happen to be 2 ways of doing said adapting.

rossberg commented 1 year ago

So it stores the entire memory in reverse. Interesting. I can see why, but that clearly breaks the memory API, regardless of whether we add auxiliary functions. So I am not really sure what could be done about that?

SoniEx2 commented 1 year ago

The current wasm-c-api is designed with LE hosts in mind. What could be done is to specify the behaviour on BE hosts. Specifically, one of:

1. Accessing linear memory, particularly when it comes to larger-than-byte integer values, must byte-swap the values on BE platforms, but the addresses count up from the start of linear memory.
2. Accessing linear memory, particularly when it comes to structs and arrays, must order-swap the fields on BE platforms, and the addresses count down from the end of linear memory, corrected for size.

(It's also possible to spec both, with a #define. Big-endian hosts are somewhat uncommon, so having 3 wasm-c-apis: native LE, byteswap-values BE and byteswap-memory BE, is probably not as big of a deal as trying to introduce BE wasm.)
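The second option above also changes how aggregates appear to the host. This sketch assumes a hypothetical Wasm-side struct with a u32 `a` at offset 0 and a u32 `b` at offset 4, stored in fully reversed memory. If you read forwards from `mem[memsize - addr - 8]`, the bytes come out as `b` then `a`, each big-endian. That is the "order-swap the fields" behaviour. On a big-endian host, a native struct declared with its fields in reverse order could overlay those bytes directly. The explicit `be32` decode here only keeps the sketch portable.

```c
#include <stddef.h>
#include <stdint.h>

#define PAIR_WASM_SIZE 8 /* size of the hypothetical struct in Wasm memory */

/* Hypothetical Wasm-side layout: a at offset 0, b at offset 4. */
struct pair { uint32_t a; uint32_t b; };

/* Decode four bytes as a big-endian u32 (a native load on a BE host). */
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store the struct as Wasm code would (each field little-endian at its own
 * Wasm address) into memory that is kept fully reversed. */
static void flipped_store_pair(uint8_t *mem, size_t memsize, size_t addr, struct pair s) {
    const uint32_t fields[2] = { s.a, s.b };
    for (size_t f = 0; f < 2; f++)
        for (size_t i = 0; i < 4; i++)
            mem[memsize - 1 - (addr + 4 * f + i)] = (uint8_t)(fields[f] >> (8 * i));
}

/* Option 2 host view: start at memsize - addr - size; the fields come out
 * in reverse order, each big-endian. */
static struct pair flipped_load_pair(const uint8_t *mem, size_t memsize, size_t addr) {
    const uint8_t *p = mem + (memsize - addr - PAIR_WASM_SIZE);
    struct pair s;
    s.b = be32(p);     /* last field comes first */
    s.a = be32(p + 4);
    return s;
}
```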

rossberg commented 1 year ago

As far as I'm concerned, it's 1 right now. The API merely gives you a pointer to the memory. That contains a sequence of random bytes. There is no particular interpretation inherent in these bytes. The Wasm code is likely to make certain assumptions, and the host code better matches those. But that's purely a contract between the two, unrelated to the API (and endianness is only a small part of it). If the host reads/writes multiple bytes with the intention of them representing numbers, it is its responsibility to do so in a manner that matches that contract (which is likely based on LE). Something like the boost library can help with making respective accesses agnostic to native host endianness.

As for 2, reinterpreting the memory pointer in a way that requires negative indexing would technically be possible, I suppose, but it doesn't seem very natural or desirable to me. It's like leaking an implementation detail of the engine. Also note that it breaks the use of array indexing into the memory, at least on a 32-bit architecture, where offsets need to be unsigned ints, but the use of p[-i] would involve a lossy conversion to signed, and so produces undefined behaviour for offsets larger than INT_MAX. (I wouldn't be surprised if wabt had this problem already?)

SoniEx2 commented 1 year ago

Nah, we use p[memsize-offset-readsize] (e.g. p[memsize-offset-1] for bytes, p[memsize-offset-4] for u32, etc.) and let the compiler figure out how to optimize it. It's not "true" negative indexing, just simple pointer math.
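Here is a sketch of that index computation done entirely in unsigned `size_t` arithmetic. No negative subscript is involved, and there is no signed conversion of the kind raised above. The function name `flipped_index` and its bounds check are assumptions made for illustration. An engine might instead rely on guard pages or its own trap logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Host index for a `readsize`-byte access at Wasm address `addr` in
 * reversed memory: memsize - addr - readsize. Returns false when the
 * access would fall outside the memory (where Wasm would trap). */
static bool flipped_index(size_t memsize, size_t addr, size_t readsize, size_t *out) {
    if (readsize > memsize || addr > memsize - readsize)
        return false;
    *out = memsize - addr - readsize; /* cannot wrap after the check above */
    return true;
}
```

Checking `addr > memsize - readsize` rather than `addr + readsize > memsize` avoids overflow when `addr` is near `SIZE_MAX`.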

Either way, the thing is that there is a technical benefit to doing it this way, and it'd be nice if the API could spec it. While at it, a "high-level" memory API, where the consumer asks for bytes/u32s/etc. at given offsets and the engine handles the reads and writes (e.g. read_u32(memory, offset) -> u32 as in #164, with the engine doing memcpy(&u32, p+memsize-offset-4, 4) internally), would allow abstracting away the raw memory layout, essentially exposing the wasm load/store opcodes as standalone C functions.
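Below is a minimal sketch of such a high-level accessor. The names `memory_view`, `read_u32`, and `write_u32` are all hypothetical: none of them exist in wasm-c-api today, and the actual proposal in #164 may differ. The consumer passes only Wasm addresses and gets back the value that `i32.load` would see. Whether the engine stores memory forwards or flipped stays internal. Byte-wise access is used so that the sketch runs on any host. As described above, an engine on a big-endian host would do a single `memcpy` from `p + memsize - offset - 4` in the flipped case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine-side handle; not a wasm-c-api type. */
typedef struct {
    uint8_t *data;
    size_t size;
    bool flipped; /* true if the engine stores memory reversed */
} memory_view;

/* Hypothetical read_u32: returns what Wasm's i32.load would see at `offset`. */
static uint32_t read_u32(const memory_view *m, size_t offset) {
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++) {
        size_t idx = m->flipped ? m->size - 1 - (offset + i) : offset + i;
        v |= (uint32_t)m->data[idx] << (8 * i);
    }
    return v;
}

/* Hypothetical write_u32: the counterpart of Wasm's i32.store. */
static void write_u32(memory_view *m, size_t offset, uint32_t v) {
    for (size_t i = 0; i < 4; i++) {
        size_t idx = m->flipped ? m->size - 1 - (offset + i) : offset + i;
        m->data[idx] = (uint8_t)(v >> (8 * i));
    }
}
```

The runtime `flipped` flag is only there to show both layouts side by side. An engine would normally fix its layout at build time. Bounds checks are omitted, so callers must keep `offset + 4 <= size`.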

SoniEx2 commented 1 year ago

Can we have a standardized WASM_FLIP_MEMORY for wasm-c-api?

rossberg commented 1 year ago

You'll have to be a bit more precise. ;)

SoniEx2 commented 1 year ago

Defined like so: on big-endian platforms, when compiled with WASM_FLIP_MEMORY, the host (wasm-c-api consumer) must access memory at an offset based on memsize, specifically memory[memsize-wasmptr-readsize], where memory is the memory to be accessed, wasmptr is the pointer value in the wasm universe (i.e. the offset within the wasm memory) and readsize is the size of the value to be read (e.g. 4 for a u32, 1 for a u8, etc.).

For example, wabt would require this mode to be used, because this is how wabt implements linear memory for performance.
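On the consumer side, the definition above could translate into something like the following sketch. `WASM_FLIP_MEMORY` is only the macro proposed in this thread. No released wasm-c-api header defines it. `WASM_HOST_ADDR`, `host_read_u8`, and `host_read_u32` are hypothetical helpers. The `#error` guard relies on the GCC/Clang predefined `__BYTE_ORDER__` macros to reject flipped mode on hosts that are not big-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(WASM_FLIP_MEMORY) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ != __ORDER_BIG_ENDIAN__
#error "WASM_FLIP_MEMORY is only meant for big-endian hosts"
#endif

#ifdef WASM_FLIP_MEMORY
/* Flipped mode: memory[memsize - wasmptr - readsize]. */
#define WASM_HOST_ADDR(memory, memsize, wasmptr, readsize) \
    ((memory) + ((memsize) - (wasmptr) - (readsize)))
#else
/* Default mode: memory[wasmptr], values stored little-endian. */
#define WASM_HOST_ADDR(memory, memsize, wasmptr, readsize) \
    ((memory) + (wasmptr))
#endif

static uint8_t host_read_u8(const uint8_t *memory, size_t memsize, size_t wasmptr) {
    (void)memsize; /* unused in the default mode */
    return *WASM_HOST_ADDR(memory, memsize, wasmptr, 1);
}

static uint32_t host_read_u32(const uint8_t *memory, size_t memsize, size_t wasmptr) {
    const uint8_t *p = WASM_HOST_ADDR(memory, memsize, wasmptr, 4);
    (void)memsize;
#ifdef WASM_FLIP_MEMORY
    uint32_t v;
    memcpy(&v, p, sizeof v); /* bytes are already in native (big-endian) order */
    return v;
#else
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
#endif
}
```

Consumer code written against these helpers compiles for both modes. In line with the later comment, an engine would then need to support only one of them.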

We personally don't feel like wasm-c-api should hide platform-specific issues (like endianness); instead, it should figure out the optimal way to handle them within the API. wasm-c-api's current approach to linear memory (linear memory must be represented in little-endian order at all times) is in some cases detrimental to performance.

rossberg commented 1 year ago

If I understand your suggestion correctly, then it appears to be outside the scope of the current C API. It's not a flag we can simply introduce in the API; it implies a different mode of operation and code generation for engines. Nor can users just toggle it in the API; it may require recompiling the engine itself in a different configuration. And I'd expect that implementing all the new codegen in JITs would be a substantial undertaking for most engines, with non-trivial implications. If you want to ask all engines to invest in that, then I encourage you to present this to the CG as an actual proposal.

SoniEx2 commented 1 year ago

We want the engine to require a specific compile-time configuration, but an engine isn't required to support both modes; in fact, most engines should only support one mode. wasm-c-api consumers should support both modes if they care about big-endian support.

This is the only thing blocking WebAssembly/wabt#2161.

(But anyway, how would we present this to the CG as an actual proposal?)

rossberg commented 1 year ago

See https://github.com/WebAssembly/meetings/blob/main/process/proposal.md

WebAssembly / wasm-c-api

Deprecate wasm-c-api on big-endian hosts #180