Inefficient x64 codegen for fabs/fneg

WebAssembly / simd

Branch of the spec repo scoped to discussion of SIMD in WebAssembly

Other

531 stars 43 forks source link

Inefficient x64 codegen for fabs/fneg #187

Open abrown opened 4 years ago

abrown commented 4 years ago

In both cranelift (ignore the bitcasts) and v8, floating-point absolute value and floating-point negation are 3-instruction lowerings. I don't believe there is any better lowering than these (is there?) for this type of instruction, but I opened this issue to discuss the inefficiency: I would guess @dtig would say that these are in the class of useful instructions to have (and I agree) but predicting performance across all platforms might be more transparent if Emscripten, e.g., generated Wasm without these instructions. In other words, if Emscripten generated a sequence like const + shift + and instead of fabs it would be more obvious what the cost would be and Wasm compilers could still use a peephole optimization to convert const + shift + and to a single instruction on platforms that support this.

tlively commented 4 years ago

I'm not sure the benefit of having wasm instruction counts match native instruction counts is worth all the the extra work of making engines pattern match instruction sequences. In general, the higher-level the operation, the more information is available to the engine to enable it to make the best codegen choices possible. It is better to have an operation that takes three instructions to implement than to replace that operation everywhere with three operations if that would make codegen harder for some platform.

jan-wassenberg commented 4 years ago

FYI AVX-512 and NEON have fabs() instructions.

zeux commented 4 years ago

(sorry, I know you're all tired of me and my "constants are awesome" comments)

fabsf/fneg are single instruction (two uops) with a memory load. You only need a single constant for all of these, which has a high bit set and low bits unset (sign mask).

fneg(v) = xor(v, -0.f)
fabs(v) = andnot(v, -0.f)

So this adds at most one 16b constant to the function that needs to perform this math. -0.f is amazingly useful in other cases, often it can be present in the user code as well to do operations like copysign.

zeux commented 4 years ago

If you don’t like constants, alternative lowering options:

fneg: xorps (to produce zero) + subps; 2 insn if you can’t find a ready to use zero register, 1 if you can fabs: fneg + maxps = 3 insn (2 if you already have 0 reg).

I am not sure of the exact NaN and negative zero semantics that the spec mandates for these so this may or may not be compliant.

ngzhian commented 4 years ago

Unfortunately, the alternative lowering options won't work since fneg and fabs are specced as purely binary operations, see https://webassembly.github.io/spec/core/exec/numerics.html#op-fneg so the nans need to be preserved.