WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Looking for i64.bswap in future impl #1334

Closed lygstate closed 2 years ago

SamuraiCrow commented 4 years ago

I didn't even see i32 or i16 bswap supported. Is there a newer set of specs and proposed specs to look at?

binji commented 4 years ago

There's some discussion of bswap as a future feature, but there hasn't been anything proposed officially.

SamuraiCrow commented 4 years ago

That's disappointing but not surprising. Thanks for the heads-up.

binji commented 4 years ago

Do you have an example where not having a bswap instruction has a large performance or code size impact? If so, that would be a compelling reason to look into adding it.

lygstate commented 4 years ago

For example, emulating PPC in WebAssembly, running WebAssembly on PPC, or the socket functions htons(), htonl(), ntohs(), and ntohl(). @binji
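As a rough illustration of the socket use case, the helpers below mirror the C byte-order functions. This is a minimal sketch, not an existing API: the Rust functions reuse the C names purely for illustration, and the standard `to_be`/`from_be` methods do the work. On a little-endian target such as WebAssembly each call amounts to a byte swap; on a big-endian target it is a no-op.

```rust
// Hypothetical stand-ins for the C helpers htons/ntohs/htonl/ntohl.
// `to_be` and `from_be` swap bytes on little-endian targets and do
// nothing on big-endian targets.

/// Host to network (big-endian) order, 16 bits.
pub fn htons(host: u16) -> u16 {
    host.to_be()
}

/// Network (big-endian) to host order, 16 bits.
pub fn ntohs(net: u16) -> u16 {
    u16::from_be(net)
}

/// Host to network (big-endian) order, 32 bits.
pub fn htonl(host: u32) -> u32 {
    host.to_be()
}

/// Network (big-endian) to host order, 32 bits.
pub fn ntohl(net: u32) -> u32 {
    u32::from_be(net)
}
```

Whether a dedicated Wasm instruction would make these measurably faster than the shift-and-mask code a compiler emits today is exactly the open question in this thread.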

SamuraiCrow commented 4 years ago

Implementing code on "retro" platforms like the Commodore Amiga and Atari ST models is one such case. Both of these use 68000 through 68060 or better CPUs which, rare as they are, are often incorporated as softcores in FPGAs due to the expense of developing dedicated ASIC designs. On a 68040, for example, the simplest little-endian to big-endian swap takes 3 opcodes: ROR.W #8,D0; SWAP D0; ROR.W #8,D0. The cost of the endian swaps needed to support little-endian code on this 40 MHz CPU is immense. This example doesn't take advantage of pipelining at all, and the penalty for non-pipelined rotate and word-swap operations is even higher on the superscalar 68060, which requires endian-swap sequences of 5 opcodes or more. While the 68080 softcore (not available as an ASIC) has a move-with-swap opcode, the memory accesses of its addressing modes are still big-endian.
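To make the three-opcode sequence concrete, here is a small model of it, assuming each ROR.W rotates the low word of D0 by 8 bits and SWAP exchanges the two 16-bit halves. The function names are made up for this sketch; it only models register contents and says nothing about cycle counts.

```rust
/// Model of `ROR.W #8,D0`: rotate the low 16 bits by 8, leave the high word alone.
fn ror_w_8(d0: u32) -> u32 {
    let low = (d0 as u16).rotate_right(8);
    (d0 & 0xFFFF_0000) | u32::from(low)
}

/// Model of `SWAP D0`: exchange the high and low 16-bit words.
fn swap_words(d0: u32) -> u32 {
    d0.rotate_left(16)
}

/// The full sequence: ROR.W, SWAP, ROR.W reverses all four bytes.
pub fn bswap_68k(d0: u32) -> u32 {
    ror_w_8(swap_words(ror_w_8(d0)))
}
```

Tracing 0xAABBCCDD through the model gives 0xAABBDDCC, then 0xDDCCAABB, then 0xDDCCBBAA, which is the byte-reversed value.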

taralx commented 4 years ago

@SamuraiCrow I'm confused. If the underlying platform doesn't have a bswap instruction, how does adding one to webassembly help?

lygstate commented 4 years ago

@taralx Lots of platforms have a bswap instruction, for example x86: https://c9x.me/x86/html/file_module_x86_id_21.html

SamuraiCrow commented 4 years ago

Using bswap as a prefix to a store and a suffix to a load produces the equivalent of big-endian load and store operations. Personally, I would prefer big-endian loads and stores for 16, 32, and 64 bits, but having a way to do endian swaps is necessary for many old platforms that are still supported.
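A minimal sketch of that equivalence, assuming a byte slice stands in for Wasm linear memory (the function names and the slice-based memory model are illustrative only): a little-endian load followed by a swap reads a big-endian value, and a swap followed by a little-endian store writes one.

```rust
/// Big-endian 32-bit load built from a little-endian load plus a byte swap.
pub fn load_u32_be(mem: &[u8], addr: usize) -> u32 {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&mem[addr..addr + 4]);
    // Little-endian load (what i32.load does), then the swap as a "suffix".
    u32::from_le_bytes(bytes).swap_bytes()
}

/// Big-endian 32-bit store built from a byte swap plus a little-endian store.
pub fn store_u32_be(mem: &mut [u8], addr: usize, value: u32) {
    // Swap as a "prefix", then a little-endian store (what i32.store does).
    let bytes = value.swap_bytes().to_le_bytes();
    mem[addr..addr + 4].copy_from_slice(&bytes);
}
```

Both functions panic on an out-of-range address, which loosely mirrors a Wasm out-of-bounds trap.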

binji commented 4 years ago

Sorry, I think my comment above was a bit confusing. Since bswap-like functionality can already be implemented in WebAssembly, we should approach adding a bswap instruction to Wasm the same way we approach adding a new Wasm SIMD instruction.

In particular, it would be useful to know how best to lower these instructions for various architectures (at least x64 and ARM), and what the performance difference would be in a real-world benchmark. See for example https://github.com/WebAssembly/simd/pull/128. Many of these do not include a benchmark, but the case for SIMD is a bit different, since it could be shown that these instructions were being used in relevant applications, and not including them would require downshifting to scalar.
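For reference, this is one way the swap can be written today using only operations that already have Wasm equivalents (rotl, rotr, and, or, shl, shr_u). It is a sketch of a common rotate-and-mask idiom, not a claim about what any particular toolchain emits, and the function names are hypothetical.

```rust
/// 32-bit byte swap from two rotates, two masks, and an or.
pub fn bswap32(x: u32) -> u32 {
    // rotl 8 puts bytes 0 and 2 into their final positions,
    // rotr 8 puts bytes 1 and 3 into theirs.
    (x.rotate_left(8) & 0x00FF_00FF) | (x.rotate_right(8) & 0xFF00_FF00)
}

/// 64-bit byte swap: swap each 32-bit half, then exchange the halves.
pub fn bswap64(x: u64) -> u64 {
    let lo = u64::from(bswap32(x as u32));
    let hi = u64::from(bswap32((x >> 32) as u32));
    (lo << 32) | hi
}
```

An engine could in principle pattern-match such a sequence and lower it to a native byte-swap instruction, but a benchmark would be needed to show whether that happens in practice and what the difference is.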

SamuraiCrow commented 4 years ago

For starters, https://github.com/michalsc/Emu68 runs in the AArch64eb instruction set (eb for big-endian mode), because implementing endian swaps with the vector units would take additional instructions in a register-tight environment. Of course it's a JIT that runs 68020 code, but that shouldn't matter.

SamuraiCrow commented 4 years ago

https://www.felixcloutier.com/x86/movbe documents MOVBE, the x64 instruction that swaps bytes as part of a load or store, which gives big-endian loads and stores.

sunfishcode commented 4 years ago

> running webassembly on PPC

> https://github.com/michalsc/Emu68 runs in AArch64eb instruction set (eb for big endian mode)

Note that even if we add a bswap instruction, most WebAssembly code won't use it, and will still expect little-endian behavior, and these use cases won't be improved.

It would be theoretically possible to build eg. a C compiler that automatically inserts bswap before every store and after every load, to produce a kind of big-endian WebAssembly which runs more efficiently on big-endian hosts. However, this would effectively create a new C ABI, which, if properly supported, would bubble up through a lot of tools, libraries, and ecosystems, creating a lot of extra work for a lot of people who don't otherwise need this ability. I myself would be opposed to adding a bswap to WebAssembly if these use cases are part of the motivation for it.

lygstate commented 4 years ago

I am getting confused. There is no need for toolchain support; all that is needed is for WebAssembly to lower bswap to a native CPU instruction. Think of WebAssembly as an IR.

binji commented 4 years ago

I'm a little confused here too. WebAssembly is little-endian, by design. I thought we were talking about adding bswap to make it faster to run an emulator for a big-endian machine. I think that's OK, and potentially a good reason to add the instruction.

If instead we're talking about making a new big-endian WebAssembly (with a new ABI), I'm also opposed to that idea.

SamuraiCrow commented 4 years ago

I agree with lygstate and binji. If WebAssembly were only going to support 3 operating systems and 2 processor architectures, there wouldn't be any point in making it cross-platform. Emulation is a thing too.

If somebody wants to make their own OS or processor architecture, WebAssembly should allow it to happen. That's why it's a standard, not a product. If the native code of that OS is big-endian, of course a little extra custom-lowering will be necessary but that falls on the OS and browser developers to implement it in that case. That doesn't mean that the practice should be disallowed when using WebAssembly outside the browser either. All software will be predominately little-endian and adding bswap is not going to change that.

SamuraiCrow commented 4 years ago

In addition, I've got a few more use cases for you. Old file formats and packet formats sometimes used "network endian" (aka big-endian) byte order. All the little-endian usage in the world is not going to make AIFF audio into a little-endian format. Of course you could use Wave files in their place, but batch conversion takes time too.
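As an example of the file-format case, the sketch below reads an IFF-style chunk header of the kind AIFF uses: a 4-byte ID followed by a 32-bit big-endian size. The function name is hypothetical, and the size is treated as unsigned for simplicity, which is an assumption of this sketch.

```rust
/// Parse an 8-byte chunk header: 4-byte ID, then a big-endian 32-bit size.
/// Returns `None` if fewer than 8 bytes are available.
pub fn parse_chunk_header(data: &[u8]) -> Option<([u8; 4], u32)> {
    if data.len() < 8 {
        return None;
    }
    let mut id = [0u8; 4];
    id.copy_from_slice(&data[0..4]);
    let mut size = [0u8; 4];
    size.copy_from_slice(&data[4..8]);
    // On a little-endian target this conversion is where the byte swap happens.
    Some((id, u32::from_be_bytes(size)))
}
```

Every multi-byte field in such a file goes through a conversion like this, so a decoder running on a little-endian target performs one swap per field it reads.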

sunfishcode commented 4 years ago

In my post above, I quoted two use cases from earlier posts which seem to want wasm producer toolchain support and a new big-endian ABI. I don't want a new big-endian ABI for WebAssembly, and it's not clear to me so far that this isn't one of the goals here.

lygstate commented 4 years ago

I am sorry for confusing you. I am not talking about toolchain support; I was just giving an example to show that big-endian machines exist. I am not requesting toolchain support.

SamuraiCrow commented 4 years ago

I'm talking about old file formats. Certainly not breaking compatibility with the current ABI. That would defeat the purpose of having a bytecode.

sunfishcode commented 4 years ago

Ok, cool. So to be sure, a wasm bswap instruction wouldn't help with running WebAssembly on a ppc or a 68000-series CPU, and wouldn't help porting code written with the assumption it's running on aarch64be.

lygstate commented 4 years ago

Yes, you are right. bswap is something like SIMD: it is there to improve performance.

SamuraiCrow commented 4 years ago

After careful consideration, I've decided to make my own bytecode rather than using an off-the-shelf bytecode that claims to be cross-platform but isn't.

SamuraiCrow commented 4 years ago

Issue #1212 would solve this.

SoniEx2 commented 4 years ago

fwiw, we don't actually need big endian. or bswap.

SamuraiCrow commented 4 years ago

Half of the computers I own use big endianness. I seldom use up-to-date machines so I'll not be using WebAssembly in its current form.

SoniEx2 commented 4 years ago

That's fine. We'll make you use it. :)

SamuraiCrow commented 4 years ago

Most of my computers don't have an up-to-date web browser. How will you "make" me use it?

SoniEx2 commented 4 years ago

with a compiler :p

SamuraiCrow commented 4 years ago

Not in its current form. I'll have to fix it up first. ;-)

SoniEx2 commented 4 years ago

how?

SamuraiCrow commented 4 years ago

Obi Wan voice: Use the source. Let it guide your actions.

SoniEx2 commented 4 years ago

well, regardless, we'll make it work.

sunfishcode commented 2 years ago

Closing this in favor of https://github.com/WebAssembly/design/issues/1426, which also tracks adding a bswap and has more detail.