It would allow way less portability of wasm files and data layouts for serialization IMHO, and today the major modern platforms are all little-endian. I can't think of anything other than PPC where big-endianness is still relevant.
I find the choice of having only one endianness really great, and we rely on it strongly for the wasm product we are currently building.
PPC is what I want to support :) For VxWorks and the rest of the airplane industry, big-endian CPUs are the most widely used :) And a lot of existing code assumes big-endian as the default. So wasm support for big-endian would be a big win for that.
For example, if I want to use qemu to simulate PPC on a little-endian machine, that would be faster :) because we have good WASM JITs.
The endian ship has sailed. Welcome to the monoculture!
Lucky for you, the PowerPC is designed to swap bytes really efficiently. ;)
It can also shift and mask very fast, since it was designed to efficiently emulate other instruction sets.
https://devblogs.microsoft.com/oldnewthing/20180810-00/?p=99465
I suppose endianness could in some sense be an attribute of the memory object, in the same way as shareability is -- one would only be allowed to use a memory in a module if the endianness attributes of the memory and the module's imported memory are equal.
Non-native-endian access is not always simple. It's true some architectures have load/store-with-reverse-endianness instructions, but do they also have atomic variants of those accesses? I'm inclined to doubt it. There would have to be a significant amount (not just now but over time) of big-endian-only software that could be compiled to wasm before requiring a big-endian mode would pay off. Like sticking to IEEE-conforming floating point, sticking to little-endian simplifies a lot of things for most users and for the implementations.
I suppose endianness could in some sense be an attribute of the memory object
Endianness isn't limited to memory instructions, though. Another example is the reinterpret instructions.
And then there is hardware that is neither little nor big endian, IIRC.
I suppose endianness could in some sense be an attribute of the memory object
Endianness isn't limited to memory instructions, though. Another example is the reinterpret instructions.
I assume that you're referring to some floating point layouts being mixed-endian (e.g., little-endian within the word but the words in big-endian order)? I haven't seen those in a while; I know I encountered them on some ARM systems, but that's over a decade ago.
And then there is hardware that is neither little nor big endian, IIRC.
I don't doubt it, though to my knowledge I've never worked on such a system myself and I don't know any concrete examples.
Even granting both of your objections: solving 99% of use cases instead of just 95% of use cases might be a worthwhile improvement. I'm not exactly advocating doing so, I'm mostly interested in probing the design space.
I think instead of adding an explicit big-endian mode to WebAssembly and introducing incompatibility, it would be better to add instructions that make it fast to deal with big-endian data. One possibility is big-endian load and store operations, but even just something like a single-byte i32.bswap instruction could allow big-endian processing with very little overhead. JITs wouldn't actually have to perform a swap on a big-endian architecture: they could fuse it with the adjacent memory access instruction and compile it into a native big-endian access.
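As a rough illustration (my sketch, not from the proposal), this is the shift-and-or pattern a portable big-endian load compiles from today; an optimizing compiler, or a Wasm JIT seeing an i32.load immediately followed by a hypothetical i32.bswap, can collapse it into a single byte-reversed load, or into a plain load on a big-endian host. The helper name load_be32 is made up for the example.

```c
#include <stdint.h>

/* Illustrative helper: read a 32-bit big-endian value from memory with
 * plain shifts and ors. Optimizing compilers typically recognize this
 * pattern and emit a single swapped load -- the same fusion a Wasm JIT
 * could perform for i32.load followed by i32.bswap. */
static inline uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
           ((uint32_t)p[3]);
}
```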
Yep, adding an i32.bswap was discussed a while back (see FutureFeatures.md) and could be a nice small proposal.
In the interest of probing the design space, i32.bswap would work for MVP Wasm, but if the intention is to support some of the in-progress proposals like Threads and/or SIMD, that might be somewhat more challenging. For example, implementing atomic operations with only a separate byte-swap operation could render the combined access non-atomic. In the SIMD proposal, we assume that only one 128-bit type is introduced and that the representations are interchangeable, so while doable, adding a big-endian mode would at minimum require additional byte-swap operations; more may be needed to detect different representations and only swap when necessary.
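To make the atomics concern concrete, here is a hedged C11 sketch of what a big-endian atomic read-modify-write would have to look like if all you had were ordinary (little-endian) atomics plus a byte swap: a single atomic add no longer suffices, and a compare-and-swap loop is needed instead. The names bswap32 and be_atomic_fetch_add are illustrative, not real APIs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative byte swap via shifts; a bswap instruction or compiler
 * builtin would do the same job. */
static inline uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Atomically add 'delta' to a counter stored big-endian in memory.
 * A plain atomic_fetch_add cannot be used, because the arithmetic must
 * happen on the swapped value, so a CAS loop is required. */
static uint32_t be_atomic_fetch_add(_Atomic uint32_t *p, uint32_t delta) {
    uint32_t old = atomic_load(p);
    for (;;) {
        uint32_t desired = bswap32(bswap32(old) + delta);
        if (atomic_compare_exchange_weak(p, &old, desired))
            return bswap32(old);   /* previous value, in native order */
    }
}
```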
as we mentioned in #1374, it might be interesting to create an LLVM backend that force-swaps every emitted store/load and it'd solve this problem while retaining backwards compatibility. this would be easily detectable by a hypothetical wasm->PPC compiler and retains all guarantees currently made by wasm, at the only expense of being slower on LE platforms. no need to change the binary format or create a new one, just need to change the compiler/LLVM.
It wouldn't just be a change to LLVM; it'd be a new C ABI. The existence of such a wasm C ABI might benefit a small number of people, but it would bubble up into many places throughout the ecosystem, creating extra work and confusion for a lot of people.
it would just be a change to LLVM. the new ABI would be a side-effect of the change more than anything.
it's not like wasm comes with a stdlib...
let it "bubble up", tbh. if you're using that switch, it's on you to make it work. and it should/will work if you compile everything you need with it.
There are libc implementations for wasm. They're widely used.
As an LLVM maintainer, I'm opposed to such a change landing upstream.
what if it just wasn't exposed and you had to go out of your way to enable it?
One of the main goals of wasm is to enable modules that run well on many platforms, however what you're describing is building different modules for different platforms. Also, even with a hidden option, there's a risk that it will grow in scope over time, a risk that people will misinterpret and/or misuse it, a risk that people will point to it as a precedent for adding more such features, and a risk that it could become a maintenance or development burden. Adding a new endianness to an LLVM target involves, among other things, adding a new target triple, and target triples end up getting a fair amount of visibility.
Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.
keep wasm little-endian, add stuff to LLVM to make up for it.
That would create a big-endian ABI, which risks creating a lot of extra work and confusion.
I think that rather than creating a big-endian mode/ABI in Wasm, adding enough instructions so that wasm running on big-endian machines doesn't lose performance is the better option.
that's basically #1374 tho and there are things that make it impossible (mainly arrays and unions)
Most portable serialization formats use network byte order (big-endian). I'm expecting significant overhead in my serialization code - I need to swap every size prefix, not just the ints themselves...
The idea of having an i64.bswap looks really nice to me...
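To sketch the overhead being described (illustrative code, not from the thread): every length prefix in a network-byte-order stream needs a byte reassembly or swap on today's little-endian-only Wasm, which a bswap instruction or a fused big-endian load would turn into a single cheap operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a buffer of records, each prefixed by a big-endian u32 length.
 * On little-endian Wasm every prefix read is a four-byte reassembly;
 * with an i32/i64.bswap (or a fused big-endian load) each prefix would
 * compile to one instruction. */
static size_t count_records(const uint8_t *buf, size_t len) {
    size_t count = 0, pos = 0;
    while (pos + 4 <= len) {
        uint32_t n = ((uint32_t)buf[pos]     << 24) |
                     ((uint32_t)buf[pos + 1] << 16) |
                     ((uint32_t)buf[pos + 2] <<  8) |
                     ((uint32_t)buf[pos + 3]);
        pos += 4;
        if (n > len - pos)
            break;               /* truncated record */
        pos += n;
        ++count;
    }
    return count;
}
```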
Wasm is little-endian, by design. There are now multiple wasm engines implementing wasm's little-endian semantics on big-endian hosts, and they appear to work well. See #1426 for discussion of a bswap instruction.
we feel like interface types should support be/le translation. (interface types are shipped with the binary, right? so it's basically "free" (aka slow on the "wrong" platform) as far as libc's and whatnot are concerned?)
Interface types (now the component model) does encapsulate endianness. That said, adding a big-endian mode to Wasm would still impose enormous costs and create confusion for the ecosystem as a whole.
so we can have an ABI defined entirely by interface types and not have to worry about performance tuning for weird wasm VMs? ^^
You can have an ABI defined entirely by the component model. This doesn't mean you'll never have to worry about performance tuning though.
we feel like that should enable LLVM to use big-endian calling conventions when generating wasm, while the libc implementation itself is still LE, and then it just generates component model stuff to adapt between them?
then a special BE VM can detect that and use a big endian libc and get better performance that way!
The LLVM backend will not be adding big-endian support. Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.
Interface types (now the component model) does encapsulate endianness.
@sunfishcode, do you mean it is endianness-agnostic or can it actually express endianness? I could not find a good reference to that in the repo.
The LLVM backend doesn't yet support producing Component Model definitions; also, switching the ABI isn't a very simple task.
It's endianness-agnostic. When a component model API passes or returns a value with a type like u32, it's just an integer value in a particular range, and not a sequence of bytes you can observe. If you store it in linear memory and observe the bytes there, at that point, it's you writing those bytes, and not the component model.
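In other words (a small illustrative C example, assuming a little-endian Wasm target): the byte order only comes into existence when your own code stores the value into linear memory.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    /* A u32 received across a component-model boundary is just a value. */
    uint32_t value = 0x11223344;

    /* The byte order only appears once *we* write it into linear memory;
     * on Wasm (little-endian by design) the first byte is 0x44. */
    uint8_t bytes[4];
    memcpy(bytes, &value, sizeof value);
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}
```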
I think I misunderstood the question above. The component model can define ABIs, however that's different from the C ABI that compilers expect to talk to their libc with. The component model does not automatically make it possible to make a big-endian C application on top of a little-endian libc.
well it should, that'd be amazing for using wasm as a weird IR for weird platforms
Is it possible to add a big-endian mode? At least don't stop implementors from implementing one. Give them the option.