WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Currently, wasm only has a little-endian (LE) mode. #1212

Closed: lygstate closed this issue 2 years ago

lygstate commented 6 years ago

Is it possible to add a big-endian mode? At least, don't prevent implementers from implementing one; give them the option.

daminetreg commented 6 years ago

It would allow far less portability of wasm files and of data layouts for serialization, IMHO, and today's major modern platforms are all little-endian. I can't think of anything other than PPC where big-endianness is still relevant.

I find the choice of having only one endianness really great, and we rely on it heavily for the wasm product we are currently building.

lygstate commented 6 years ago

PPC is what I want to support :) For VxWorks and the wider aircraft industry, big-endian CPUs are the most widely used :) And a lot of existing code has settled on big-endian as the default, so big-endian support in wasm would be a big win there.

lygstate commented 6 years ago

For example, if I want to use QEMU to emulate PPC on a little-endian machine, that would be faster :) because we have good Wasm JITs.

SimHacker commented 5 years ago

The endian ship has sailed. Welcome to the monoculture!

Lucky for you the PowerPC is designed to swap bytes really efficiently. ;)

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_lwbrx_lbx_lwbri_instrs.htm

It can also shift and mask very fast, since it was designed to efficiently emulate other instruction sets.

https://devblogs.microsoft.com/oldnewthing/20180810-00/?p=99465

lars-t-hansen commented 5 years ago

I suppose endianness could in some sense be an attribute of the memory object, in the same way as shareability is -- one would only be allowed to use a memory in a module if the endianness attributes of the memory and the module's imported memory are equal.
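To make the import rule described above concrete, here is a minimal Python sketch of how a validator might compare memory types if endianness were such an attribute. The `endianness` field, the `MemoryType` class, and `import_is_compatible` are all hypothetical; real Wasm memory types have no endianness field, and only the `shared` flag and a simplified minimum-size check are modeled here.

```python
from dataclasses import dataclass
from enum import Enum


class Endianness(Enum):
    LITTLE = "little"
    BIG = "big"


@dataclass(frozen=True)
class MemoryType:
    """Hypothetical memory type: `endianness` is NOT part of real Wasm."""
    min_pages: int
    shared: bool = False
    endianness: Endianness = Endianness.LITTLE


def import_is_compatible(provided: MemoryType, required: MemoryType) -> bool:
    """Check whether `provided` may satisfy a module's memory import `required`."""
    # Shareability must match exactly, as in the threads proposal.
    if provided.shared != required.shared:
        return False
    # Under this sketch, endianness would have to match exactly as well.
    if provided.endianness != required.endianness:
        return False
    # Simplified limits check: the provided memory must be at least as large.
    return provided.min_pages >= required.min_pages
```

A module compiled for a big-endian memory would then simply fail to link against a little-endian one, rather than silently misreading its bytes.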

Non-native-endian access is not always simple. It's true some architectures have load/store-with-reverse-endianness instructions, but do they have ditto atomic accesses? I'm inclined to doubt it. There would have to be a significant amount (not just now but over time) of big-endian-only software that could be compiled to wasm before requiring a big-endian mode would pay off. Like sticking to IEEE-conforming floating point, sticking to little-endian simplifies a lot of things for most users and for the implementations.
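The atomics concern can be illustrated with a small Python model, assuming a machine whose only atomic primitives are native little-endian load and compare-exchange. All names (`LinearMemory`, `atomic_add_u32_be`, `bswap32`) are hypothetical, and the lock only stands in for hardware atomicity. The point of the sketch is that a native atomic add cannot be reused on a big-endian word, because the carry would propagate toward the wrong end, so the operation degrades into a compare-exchange retry loop.

```python
import threading


def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


class LinearMemory:
    """Toy memory whose only atomics are native little-endian ones."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def atomic_load_u32(self, addr: int) -> int:
        with self._lock:
            return int.from_bytes(self.data[addr:addr + 4], "little")

    def cmpxchg_u32(self, addr: int, expected: int, new: int) -> int:
        """Native 32-bit compare-exchange; returns the value seen in memory."""
        with self._lock:
            old = int.from_bytes(self.data[addr:addr + 4], "little")
            if old == expected:
                self.data[addr:addr + 4] = new.to_bytes(4, "little")
            return old


def atomic_add_u32_be(mem: LinearMemory, addr: int, delta: int) -> int:
    """Atomically add `delta` to a big-endian word; returns the old value."""
    while True:
        raw = mem.atomic_load_u32(addr)
        old_value = bswap32(raw)            # interpret the bytes as big-endian
        new_raw = bswap32(old_value + delta)
        # Retry if another thread changed the word in between.
        if mem.cmpxchg_u32(addr, raw, new_raw) == raw:
            return old_value
```

For a big-endian word holding 0xFF, a native little-endian add of 1 would change the lowest-addressed byte instead of carrying into the third byte, which is why the swap-and-retry loop is needed.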

rossberg commented 5 years ago

> I suppose endianness could in some sense be an attribute of the memory object

Endianness isn't limited to memory instructions, though. The reinterpret instructions are another example.

And then there is hardware that is neither little nor big endian, IIRC.

lars-t-hansen commented 5 years ago

> I suppose endianness could in some sense be an attribute of the memory object

> Endianness isn't limited to memory instructions, though. The reinterpret instructions are another example.

I assume that you're referring to some floating-point layouts being mixed-endian (e.g. little-endian within the word but the words in big-endian order)? I haven't seen those in a while; I know I encountered them on some ARM systems, but that was over a decade ago.

> And then there is hardware that is neither little nor big endian, IIRC.

I don't doubt it, though to my knowledge I've never worked on such a system myself and I don't know any concrete examples.

Even granting both of your objections: solving 99% of use cases instead of just 95% of use cases might be a worthwhile improvement. I'm not exactly advocating doing so, I'm mostly interested in probing the design space.

Serentty commented 4 years ago

I think instead of adding an explicit big endian mode to WebAssembly and introducing incompatibility, it would be better to add instructions which make it fast to deal with big endian data. One possibility is big endian load and store operations, but even just something like a single-byte i32.bswap instruction could allow big endian processing with very little overhead. JITs wouldn't actually have to perform a swap on a big endian architecture: they could fuse it with the memory access instruction next to it and compile it into a native big endian access.
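As a rough illustration of the bswap idea, the Python sketch below models what wasm code must do today (a shift-and-mask sequence) and the "little-endian load followed by a swap" pattern that an engine on a big-endian host could, in principle, fuse into one native big-endian load. `i32.bswap` is only a proposed instruction; `i32_bswap`, `i32_load_le`, and `i32_load_be` are hypothetical helper names.

```python
import struct


def i32_bswap(x: int) -> int:
    """Reverse the four bytes of a 32-bit value using shifts and masks."""
    x &= 0xFFFFFFFF
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )


def i32_load_le(mem: bytes, addr: int) -> int:
    """Model of Wasm's i32.load, which is always little-endian."""
    return struct.unpack_from("<I", mem, addr)[0]


def i32_load_be(mem: bytes, addr: int) -> int:
    """Big-endian load as load+swap: the pattern a JIT could fuse."""
    return i32_bswap(i32_load_le(mem, addr))


mem = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(i32_load_le(mem, 0)))  # 0x78563412
print(hex(i32_load_be(mem, 0)))  # 0x12345678
```

On a little-endian host the swap is a real instruction; on a big-endian host the two operations cancel out, which is the fusion opportunity described above.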

binji commented 4 years ago

Yep, adding an i32.bswap was discussed a while back (see FutureFeatures.md) and could be a nice small proposal.

dtig commented 4 years ago

In the interest of probing the design space, i32.bswap would work for MVP Wasm, but if the intention was to support some of the in-progress proposals like Threads and/or SIMD, that might be more challenging. For example, implementing atomic operations with only a byte-swap operation might render the accesses non-atomic. In the SIMD proposal, we assume that only one 128-bit type is introduced and that its representations are interchangeable, so while doable, adding a big-endian mode would at minimum require additional byte-swap operations, and more may be needed to detect the different representations and swap only when necessary.
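One way to see why the SIMD case is awkward: the correct byte swap for a v128 depends on how it is being interpreted. The Python sketch below (hypothetical helpers `reverse_v128` and `bswap_lanes`, modeling a v128 as 16 raw bytes) shows that reversing all 16 bytes also reverses the lane order, whereas converting i32x4 data needs a per-lane swap whose shape differs from the i16x8 one.

```python
import struct


def reverse_v128(v: bytes) -> bytes:
    """Reverse all 16 bytes of a 128-bit vector."""
    assert len(v) == 16
    return v[::-1]


def bswap_lanes(v: bytes, lane_bytes: int) -> bytes:
    """Byte-swap each lane independently, keeping lane order intact."""
    assert len(v) == 16 and 16 % lane_bytes == 0
    out = bytearray()
    for i in range(0, 16, lane_bytes):
        out += v[i:i + lane_bytes][::-1]
    return bytes(out)


v = bytes(range(16))
big_endian_lanes = struct.unpack(">4I", v)

# Per-lane swap: little-endian lanes now match the big-endian view, same order.
print(struct.unpack("<4I", bswap_lanes(v, 4)) == big_endian_lanes)  # True
# Whole-register reversal: values match, but the lane order is reversed.
print(struct.unpack("<4I", reverse_v128(v)) == big_endian_lanes[::-1])  # True
# The shuffle depends on lane width, so one fixed swap cannot serve all types.
print(bswap_lanes(v, 4) == bswap_lanes(v, 2))  # False
```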

SoniEx2 commented 4 years ago

as we mentioned in #1374, it might be interesting to create an LLVM backend that force-swaps every emitted store/load; it'd solve this problem while retaining backwards compatibility. this would be easily detectable by a hypothetical wasm->PPC compiler and retains all guarantees currently made by wasm, at the sole expense of being slower on LE platforms. no need to change the binary format or create a new one, just need to change the compiler/LLVM.
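A minimal Python model of the "force-swap every load/store" idea, assuming a hypothetical compiler mode (no such LLVM option exists): every multi-byte access writes and reads big-endian bytes, so linear memory ends up with the layout a big-endian machine would produce. On a little-endian host each access costs an extra swap; `SwappedMemory` is an illustrative name only.

```python
class SwappedMemory:
    """Hypothetical model: every multi-byte load/store is byte-swapped, so
    linear memory holds big-endian data even though Wasm itself is LE."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def store(self, addr: int, value: int, width: int) -> None:
        # A plain Wasm store would use "little"; the forced swap makes it "big".
        value &= (1 << (8 * width)) - 1
        self.data[addr:addr + width] = value.to_bytes(width, "big")

    def load(self, addr: int, width: int) -> int:
        return int.from_bytes(self.data[addr:addr + width], "big")


m = SwappedMemory(8)
m.store(0, 0x11223344, 4)
print(bytes(m.data[:4]).hex())  # 11223344
print(hex(m.load(0, 1)))        # 0x11, the most significant byte comes first
```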

sunfishcode commented 4 years ago

It wouldn't just be a change to LLVM; it'd be a new C ABI. The existence of such a wasm C ABI might benefit a small number of people, but it would bubble up into many places throughout the ecosystem, creating extra work and confusion for a lot of people.

SoniEx2 commented 4 years ago

it would just be a change to LLVM. the new ABI would be a side-effect of the change more than anything.

it's not like wasm comes with a stdlib...

let it "bubble up", tbh. if you're using that switch, it's on you to make it work. and it should/will work if you compile everything you need with it.

sunfishcode commented 4 years ago

There are libc implementations for wasm. They're widely used.

As an LLVM maintainer, I'm opposed to such a change landing upstream.

SoniEx2 commented 4 years ago

what if it just wasn't exposed and you had to go out of your way to enable it?

sunfishcode commented 4 years ago

One of the main goals of wasm is to enable modules that run well on many platforms, however what you're describing is building different modules for different platforms. Also, even with a hidden option, there's a risk that it will grow in scope over time, a risk that people will misinterpret and/or misuse it, a risk that people will point to it as a precedent for adding more such features, and a risk that it could become a maintenance or development burden. Adding a new endianness to an LLVM target involves, among other things, adding a new target triple, and target triples end up getting a fair amount of visibility.

Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.

SoniEx2 commented 4 years ago

keep wasm little-endian, add stuff to LLVM to make up for it.

sunfishcode commented 4 years ago

That would create a big-endian ABI, which risks creating a lot of extra work and confusion.

lygstate commented 4 years ago

I think that, rather than creating a big-endian mode/ABI in Wasm, adding enough instructions so that wasm running on big-endian machines doesn't lose performance is a better option.

SoniEx2 commented 4 years ago

that's basically #1374 tho and there are things that make it impossible (mainly arrays and unions)

ppmag commented 3 years ago

Most portable serialization formats use network byte order (big-endian). I'm expecting significant overhead in my serialization code: I need to swap every size prefix, not just the integers themselves...

The idea of having i64.bswap looks really nice to me...
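To show where that overhead comes from, here is a Python sketch of parsing frames that each carry a 4-byte big-endian (network order) length prefix. It models the Wasm side as a little-endian load followed by a shift-and-mask byte swap, which is the sequence a dedicated bswap instruction would replace. The frame format and the names `bswap32` and `read_frames` are illustrative assumptions.

```python
import struct


def bswap32(x: int) -> int:
    """Shift-and-mask byte swap, as wasm code must emit without a bswap op."""
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )


def read_frames(buf: bytes) -> list:
    """Split `buf` into payloads, each preceded by a big-endian u32 length."""
    frames = []
    pos = 0
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated length prefix")
        raw = struct.unpack_from("<I", buf, pos)[0]  # what i32.load yields
        length = bswap32(raw)                         # network -> host order
        pos += 4
        if pos + length > len(buf):
            raise ValueError("truncated frame")
        frames.append(buf[pos:pos + length])
        pos += length
    return frames


print(read_frames(b"\x00\x00\x00\x03abc" + b"\x00\x00\x00\x02hi"))
```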

sunfishcode commented 2 years ago

Wasm is little-endian, by design. There are now multiple wasm engines implementing wasm's little-endian semantics on big-endian hosts, and they appear to work well. See #1426 for discussion of a bswap instruction.

SoniEx2 commented 2 years ago

we feel like interface types should support be/le translation. (interface types are shipped with the binary, right? so it's basically "free" (aka slow on the "wrong" platform) as far as libcs and whatnot are concerned?)

sunfishcode commented 2 years ago

Interface types (now the component model) does encapsulate endianness. That said, adding a big-endian mode to Wasm would still have enormous costs and confusion for the ecosystem as a whole.

SoniEx2 commented 2 years ago

so we can have an ABI defined entirely by interface types and not have to worry about performance tuning for weird wasm VMs? ^^

sunfishcode commented 2 years ago

You can have an ABI defined entirely by the component model. This doesn't mean you'll never have to worry about performance tuning though.

SoniEx2 commented 2 years ago

we feel like that should enable LLVM to use big-endian calling conventions when generating wasm, while the libc implementation itself is still LE, and then it just generates component model stuff to adapt between them?

then a special BE VM can detect that and use a big endian libc and get better performance that way!

sunfishcode commented 2 years ago

The LLVM backend will not be adding big-endian support. Wasm is a little-endian platform, by design. The LLVM Wasm backend is focused on that.

penzn commented 2 years ago

> Interface types (now the component model) does encapsulate endianness.

@sunfishcode, do you mean it is endianness-agnostic or can it actually express endianness? I could not find a good reference to that in the repo.

The LLVM backend doesn't yet support producing Component Model definitions, and switching the ABI isn't a simple task either.

sunfishcode commented 2 years ago

It's endianness-agnostic. When a component model API passes or returns a value with a type like u32, it's just an integer value in a particular range, and not a sequence of bytes you can observe. If you store it in linear memory and observe the bytes there, at that point, it's you writing those bytes, and not the component model.
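A small Python sketch of the distinction drawn here, under stated assumptions: `api_get_u32` is a hypothetical component-model import that hands back a plain integer, and `store_u32` models the guest's own `i32.store`. The byte layout only comes into existence when the guest writes the value into its linear memory, and Wasm defines that store as little-endian regardless of the host CPU.

```python
def api_get_u32() -> int:
    # Hypothetical component-model import: the result is an integer in the
    # u32 range, not an observable sequence of bytes.
    return 0xCAFEBABE


def store_u32(memory: bytearray, addr: int, value: int) -> None:
    # Model of the guest's own i32.store: little-endian on every host.
    memory[addr:addr + 4] = value.to_bytes(4, "little")


memory = bytearray(16)
store_u32(memory, 0, api_get_u32())
print(bytes(memory[:4]).hex())  # bebafeca
```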

I think I misunderstood the question above. The component model can define ABIs, however that's different from the C ABI that compilers expect to talk to their libc with. The component model does not automatically make it possible to make a big-endian C application on top of a little-endian libc.

SoniEx2 commented 2 years ago

well it should, that'd be amazing for using wasm as a weird IR for weird platforms