Quuxplusone / LLVMBugzillaTest


vzeroupper elimination #41508

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42538
Status NEW
Importance P enhancement
Reported by David Bolvansky (david.bolvansky@gmail.com)
Reported on 2019-07-08 07:41:53 -0700
Last modified on 2019-07-08 09:13:18 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also PR27823
Since GCC 9, GCC can eliminate vzeroupper in various cases, e.g.:

#include <immintrin.h>

long long get_elem2(__m256i v) {
    return v[2];
}

Clang -O3 -march=skylake-avx512
get_elem2:
        vextracti128    xmm0, ymm0, 1
        vmovq   rax, xmm0
        vzeroupper
        ret

GCC -O3 -march=skylake-avx512
get_elem2:
        vextracti64x2   xmm0, ymm0, 0x1
        vmovq   rax, xmm0
        ret

Quuxplusone commented 5 years ago

But what if the caller of get_elem2 used ymm registers and then made a tail call to get_elem2? Those other ymm registers would still need their upper bits zeroed, but the caller can't do that before the call without also erasing ymm0's upper bits, because vzeroupper clears the upper halves of all ymm registers at once. And if it's a tail call, the caller gets no chance to do it after the call either.
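A minimal sketch of the situation described in this comment, for illustration only. The caller `sum_then_get` and the `v4di` typedef are hypothetical and not part of the report; `v4di` is a GCC/Clang vector-extension type standing in for `__m256i` so that the snippet builds without `<immintrin.h>` or AVX flags. Whether a compiler actually emits a tail call here, and which registers it picks, depends on the compiler and options.

```c
/* Hypothetical 256-bit vector type standing in for __m256i
   (GCC/Clang vector extension; an assumption for this sketch). */
typedef long long v4di __attribute__((vector_size(32)));

/* The function from the report: returns element 2 of its argument.
   With AVX enabled, v would arrive in ymm0 under the SysV x86-64 ABI. */
__attribute__((noinline))
long long get_elem2(v4di v) {
    return v[2];
}

/* Hypothetical caller that does other 256-bit work and then ends with a
   call in tail position. With AVX enabled, the sum would typically live
   in a second ymm register whose upper half is then "dirty". */
long long sum_then_get(v4di a, v4di b, long long *out) {
    v4di s = a + b;                      /* extra 256-bit work */
    *out = s[0] + s[1] + s[2] + s[3];    /* consume the sum */
    /* Candidate tail call: a must stay intact in ymm0, so the caller
       cannot issue vzeroupper before jumping to get_elem2. */
    return get_elem2(a);
}
```

If `get_elem2` omitted its own vzeroupper and the call above were compiled as a jump, nothing on this path would clear the upper half of the register that held `s` before control returns to the original caller.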