Open Quuxplusone opened 3 years ago
The problem is in SROA, #47271.
There is nothing in libc++ to fix, likely
libcxx/libstdc++ developers can maybe use __builtin_assume to fix lack of compiler knowledge to optimize this optimally...
(In reply to David Bolvansky from comment #2)
> libcxx/libstdc++ developers can maybe use __builtin_assume to fix lack of
> compiler knowledge to optimize this optimally...
Could you please expand on that?
The standard says it's undefined behaviour if hi < lo but the compiler doesn't know that.
I've fixed the weird ugliness of the needless control flow in
077bff39d46364035a5dcfa32fc69910ad0975d0.
We now end up with:
.text
.file "test.cpp"
.globl _Z8stdclampfff # -- Begin function _Z8stdclampfff
.p2align 4, 0x90
.type _Z8stdclampfff,@function
_Z8stdclampfff: # @_Z8stdclampfff
.cfi_startproc
# %bb.0: # %entry
vminss %xmm0, %xmm2, %xmm2
vcmpltss %xmm1, %xmm0, %xmm0
vblendvps %xmm0, %xmm1, %xmm2, %xmm0
retq
.Lfunc_end0:
.size _Z8stdclampfff, .Lfunc_end0-_Z8stdclampfff
.cfi_endproc
# -- End function
.globl _Z7myclampfff # -- Begin function _Z7myclampfff
.p2align 4, 0x90
.type _Z7myclampfff,@function
_Z7myclampfff: # @_Z7myclampfff
.cfi_startproc
# %bb.0: # %entry
vmaxss %xmm0, %xmm1, %xmm0
vminss %xmm0, %xmm2, %xmm0
retq
.Lfunc_end1:
.size _Z7myclampfff, .Lfunc_end1-_Z7myclampfff
.cfi_endproc
# -- End function
.section ".linker-options","e",@llvm_linker_options
.ident "clang version 13.0.0 (git@github.com:LebedevRI/llvm-project.git fc530b5e5056976c6d029e67e70fd22191afbf98)"
.section ".note.GNU-stack","",@progbits
.addrsig
Not really sure if we can get to vmaxss from vcmpltss+vblendvps.
There are probably multiple bugs here:
https://godbolt.org/z/asdavqGEM
We don't/can't propagate 'fast' to the 'select' instructions. See bug 35607.
On the 'myclamp' example, we do manage to get 'fast' on the selects, but we
fail to canonicalize to llvm.maxnum/minnum because the operands are reversed
from what we match with "m_OrdFMax/Min".
On the std::clamp example, the max op is split by the min op, so that requires
more pattern-matching to detect.
std::clamp
generates poor assembly compared to hand-written counterpart for primitive types likefloat
, even with-Ofast -ffast-math
:stdclamp(float, float, float): # @stdclamp(float, float, float) movss dword ptr [rsp - 4], xmm0 movss dword ptr [rsp - 8], xmm1 movss dword ptr [rsp - 12], xmm2 ucomiss xmm2, xmm0 lea rax, [rsp - 12] lea rcx, [rsp - 4] cmovb rcx, rax ucomiss xmm0, xmm1 lea rax, [rsp - 8] cmovae rax, rcx movss xmm0, dword ptr [rax] # xmm0 = mem[0],zero,zero,zero ret
myclamp(float, float, float): # @myclamp(float, float, float) maxss xmm0, xmm1 minss xmm0, xmm2 ret
Live example: https://godbolt.org/z/5oxvocevK
More information on: https://secret.club/2021/04/09/std-clamp.html