Closed lygstate closed 2 years ago
There's some discussion of bswap as a future feature, but there hasn't been anything proposed officially.
That's disappointing but not surprising. Thanks for the heads' up.
Do you have an example where not having a bswap instruction has a large performance or code size impact? If so, that would be a compelling reason to look into adding it.
For example: emulating PPC in WebAssembly, running WebAssembly on PPC, or the socket functions htons(), htonl(), ntohs(), and ntohl(). @binji
Implementing code on "retro" platforms like the Commodore Amiga and Atari ST models is one such case. Both use CPUs from the 68000 through the 68060 or better, which, rare as they are, are often incorporated as softcores in FPGAs due to the expense of developing dedicated ASIC designs. On a 68040, for example, the simplest little-endian to big-endian swap takes 3 opcodes: ROR.W #8,D0 SWAP D0 ROR.W #8,D0. The cost of these endian swaps to support little-endian code on this 40 MHz CPU is immense. This example doesn't take advantage of pipelining at all, and the cost penalty of non-pipelined rotate and word-swap operations is even higher on the superscalar 68060, thus requiring endian-swap sequences of 5 opcodes or more. While the 68080 softcore (not available as an ASIC) has a move-with-swap opcode, the memory accesses of its addressing modes are still big-endian.
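To make the three-opcode sequence above concrete, here is a minimal C sketch that models each 68k step on a 32-bit register value. The function names (ror_w8, swap_words, m68k_bswap32) are made up for illustration and are not from any toolchain; the sketch only shows that rotate, word-swap, rotate adds up to a full byte reversal.

```c
#include <stdint.h>

/* Models ROR.W #8,Dn: rotate only the low 16 bits right by 8,
   leaving the upper word of the register untouched. */
uint32_t ror_w8(uint32_t d)
{
    uint32_t lo = d & 0xFFFFu;
    lo = ((lo >> 8) | (lo << 8)) & 0xFFFFu;
    return (d & 0xFFFF0000u) | lo;
}

/* Models SWAP Dn: exchange the upper and lower 16-bit words. */
uint32_t swap_words(uint32_t d)
{
    return (d << 16) | (d >> 16);
}

/* Hypothetical helper: the rotate / swap / rotate sequence as a whole. */
uint32_t m68k_bswap32(uint32_t d)
{
    d = ror_w8(d);      /* AABBCCDD -> AABBDDCC */
    d = swap_words(d);  /* AABBDDCC -> DDCCAABB */
    return ror_w8(d);   /* DDCCAABB -> DDCCBBAA */
}
```

Every little-endian 32-bit access on such a CPU would pay for all three steps, which is the overhead being described.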
@SamuraiCrow I'm confused. If the underlying platform doesn't have a bswap instruction, how does adding one to webassembly help?
@taralx lots of platforms have a bswap instruction: https://c9x.me/x86/html/file_module_x86_id_21.html
Using bswap as a prefix to a store and a suffix to a load produces an equivalent to big-endian load and store operations. Personally, a big endian load and store for 16, 32 and 64 bits would actually be preferred but having a way to do endian-swaps is necessary for many old platforms that are still supported.
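As a rough illustration of the prefix/suffix idea, the C sketch below stands in for Wasm's little-endian loads and stores with byte-wise helpers and wraps them with a byte swap. All names here (bswap32, load_le32, store_le32, load_be32, store_be32) are assumptions for the example, not proposed instructions.

```c
#include <stdint.h>

/* Plain shift-and-mask byte reversal of a 32-bit value. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Stand-in for a little-endian 32-bit load (as in Wasm linear memory). */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Stand-in for a little-endian 32-bit store. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Big-endian load: little-endian load followed by a swap (suffix). */
uint32_t load_be32(const uint8_t *p)
{
    return bswap32(load_le32(p));
}

/* Big-endian store: swap first (prefix), then little-endian store. */
void store_be32(uint8_t *p, uint32_t v)
{
    store_le32(p, bswap32(v));
}
```

On a host with a swapping load/store instruction, an engine could in principle fuse each pair into one native instruction, but that is a lowering choice and not something this sketch demonstrates.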
Sorry, I think my comment above was a bit confusing. Since bswap-like functionality can currently be implemented in WebAssembly already, we should approach adding a bswap instruction to Wasm like we do with adding a new Wasm SIMD instruction.
In particular, it would be useful to know how best to lower these instructions for various architectures (at least x64 and ARM), and what the performance difference would be in a real-world benchmark. See for example https://github.com/WebAssembly/simd/pull/128. Many of these do not include a benchmark, but the case for SIMD is a bit different, since it could be shown that these instructions were being used in relevant applications, and not including them would require downshifting to scalar.
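For reference, one way bswap-like functionality can be expressed today uses only rotates, ANDs, and ORs, all of which have direct Wasm counterparts (i32.rotl, i32.rotr, i32.and, i32.or). The C sketch below is an assumption about what such a sequence could look like, with made-up names, and says nothing about how any engine actually lowers it.

```c
#include <stdint.h>

/* Rotate helpers; n must be in the range 1..31. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* 32-bit byte reversal from two rotates, two masks and one OR. */
uint32_t bswap32_rot(uint32_t x)
{
    return (rotl32(x, 8) & 0x00FF00FFu) | (rotr32(x, 8) & 0xFF00FF00u);
}

/* 64-bit byte reversal: swap each half, then exchange the halves. */
uint64_t bswap64_rot(uint64_t x)
{
    uint32_t hi = bswap32_rot((uint32_t)(x >> 32));
    uint32_t lo = bswap32_rot((uint32_t)x);
    return ((uint64_t)lo << 32) | hi;
}
```

A benchmark would compare a sequence like this, as emitted by a Wasm engine, against a single native byte-swap instruction on x64 and ARM.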
For starters, https://github.com/michalsc/Emu68 runs in AArch64eb instruction set (eb for big endian mode) due to the fact that it would take additional instructions to implement endian-swaps with vector units in a register-tight environment. Of course it's a JIT to run 68020 code but that shouldn't matter.
https://www.felixcloutier.com/x86/movbe documents MOVBE, the x64 instruction that swaps bytes while loading or storing, which gives big-endian loads and stores.
running webassembly on PPC
https://github.com/michalsc/Emu68 runs in AArch64eb instruction set (eb for big endian mode)
Note that even if we add a bswap instruction, most WebAssembly code won't use it, and will still expect little-endian behavior, and these use cases won't be improved.
It would be theoretically possible to build, e.g., a C compiler that automatically inserts bswap before every store and after every load, to produce a kind of big-endian WebAssembly which runs more efficiently on big-endian hosts. However, this would effectively create a new C ABI, which, if properly supported, would bubble up through a lot of tools, libraries, and ecosystems, creating a lot of extra work for a lot of people who don't otherwise need this ability. I myself would be opposed to adding a bswap to WebAssembly if these use cases are part of the motivation for it.
I am getting confused. There is no need for toolchain support; all that is needed is for WebAssembly engines to lower bswap into a native CPU instruction. Think of WebAssembly as an IR.
I'm a little confused here too. WebAssembly is little-endian, by design. I thought we were talking about adding bswap to make it faster to run an emulator for a big-endian machine. I think that's OK, and potentially a good reason to add the instruction.
If instead we're talking about making a new big-endian WebAssembly (with a new ABI), I'm also opposed to that idea.
I agree with lygstate and binji. If WebAssembly were only going to support 3 operating systems and 2 processor architectures, there wouldn't be any point in making it cross-platform. Emulation is a thing too.
If somebody wants to make their own OS or processor architecture, WebAssembly should allow it to happen. That's why it's a standard, not a product. If the native code of that OS is big-endian, of course a little extra custom-lowering will be necessary but that falls on the OS and browser developers to implement it in that case. That doesn't mean that the practice should be disallowed when using WebAssembly outside the browser either. All software will be predominately little-endian and adding bswap is not going to change that.
In addition, I've got a few more use cases for you. Old file formats and packet formats sometimes use "network endian" (aka big-endian) byte order. All the little-endian usage in the world is not going to make AIFF audio into a little-endian format. Of course you could use Wave files in their place, but batch conversion takes time too.
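To illustrate the file-format case, here is a small C sketch that reads the big-endian size field from the 12-byte header at the start of an AIFF file ("FORM", a 32-bit big-endian chunk size, then "AIFF"). The helper names read_be32 and aiff_form_size are hypothetical, and the sketch checks only those three fields, nothing else in the file.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian field byte by byte. This gives the same
   result on any host, and on a little-endian target it is exactly the
   pattern a byte-swap instruction would speed up. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: returns 1 and stores the FORM chunk size if the
   buffer starts with an AIFF header, otherwise returns 0. */
int aiff_form_size(const uint8_t *hdr, size_t len, uint32_t *size_out)
{
    if (len < 12 ||
        memcmp(hdr, "FORM", 4) != 0 ||
        memcmp(hdr + 8, "AIFF", 4) != 0) {
        return 0;
    }
    *size_out = read_be32(hdr + 4);
    return 1;
}
```

Every multi-byte field in such a file needs the same treatment, so a decoder on a little-endian target performs one swap per field read.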
In my post above, I quoted two use cases from earlier posts which seem to want wasm producer toolchain support and a new big-endian ABI. I don't want a new big-endian ABI for WebAssembly, and it's not clear to me so far that this isn't one of the goals here.
I am sorry for confusing you. I am not talking about toolchain support; I was just giving an example to show that big-endian machines exist. I am not requesting toolchain support.
I'm talking about old file formats. Certainly not breaking compatibility with the current ABI. That would defeat the purpose of having a bytecode.
Ok, cool. So to be sure, a wasm bswap instruction wouldn't help with running WebAssembly on a ppc or a 68000-series CPU, and wouldn't help porting code written with the assumption it's running on aarch64be.
Yes, you are right. bswap is something like SIMD: it is there to improve performance.
After careful consideration, I've decided to make my own bytecode rather than using an off-the-shelf bytecode that claims to be cross-platform but isn't.
Issue #1212 would solve this.
fwiw, we don't actually need big endian. or bswap.
Half of the computers I own use big endianness. I seldom use up-to-date machines so I'll not be using WebAssembly in its current form.
That's fine. We'll make you use it. :)
Most of my computers don't have an up-to-date web browser. How will you "make" me use it?
with a compiler :p
Not in its current form. I'll have to fix it up first. ;-)
how?
Obi Wan voice: Use the source. Let it guide your actions.
well, regardless, we'll make it work.
Closing this in favor of https://github.com/WebAssembly/design/issues/1426, which also tracks adding a bswap and has more detail.
I didn't even see i32 or i16 bswap supported. Is there a newer set of specs and proposed specs to look at?