Open apbryan opened 1 month ago
Hello, your unittest seems to pass here, what's your LDC version? (EDIT: and OS?)
1.11.20 now implements `_mm256_srli_epi64`, please `dub upgrade`.
I think you got trapped by an old LDC promoting your int `32` into an `int4` equal to `[32, 32, 32, 32]`. That instruction is a trap because of this: it would then ask for a shift of `(32 << 32) + 32` bits. That's why we advise using `_mm256_srli_epi64` instead of `_mm256_srl_epi64`.
THAT SAID, it seems newer LDC versions prevent such implicit conversions. As said, I can't repro your unittest failure, which is odd.
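To make the arithmetic above concrete, here is a small scalar model in plain C. It is only a sketch and does not use the intrinsics themselves. It assumes, as described in the comment above, that the count vector ends up as four 32-bit lanes each holding 32, that the shift reads only the low 64 bits of the count, and that a count above 63 zeroes the lane. The helper names `srl_lane`, `low64_of_broadcast` and `demo` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of ONE 64-bit lane of a logical right shift whose count
   comes from a vector: only the low 64 bits of the count are read, and
   any count above 63 produces 0 (assumption stated in the lead-in). */
uint64_t srl_lane(uint64_t x, uint64_t count) {
    return count > 63 ? 0 : x >> count;
}

/* Low 64 bits of a 128-bit vector whose four 32-bit lanes all hold v.
   Lanes 0 and 1 together form the low 64 bits. */
uint64_t low64_of_broadcast(uint32_t v) {
    return ((uint64_t)v << 32) | (uint64_t)v;
}

/* Compares the intended count (32) with the accidentally broadcast one. */
void demo(void) {
    uint64_t x = 0xFFFFFFFF00000000ULL;

    /* Intended: shift right by 32 bits. */
    assert(srl_lane(x, 32) == 0x00000000FFFFFFFFULL);

    /* Broadcast count: (32 << 32) + 32, far above 63. */
    uint64_t bad = low64_of_broadcast(32);
    assert(bad == (32ULL << 32) + 32);
    assert(srl_lane(x, bad) == 0);
}
```

Calling `demo()` from a `main` checks both cases: with a count of 32 the upper half moves into the lower half, while with the broadcast count every lane becomes zero. An immediate-count shift such as `_mm256_srli_epi64` takes a plain integer, so there is no count vector to be filled by accident.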
Thanks for responding and implementing _mm256_srli_epi64()!
Looking at the output more, I realized that while my application was being compiled with -mattr=+avx2, intel-intrinsics was not. After changing:
dependency "intel-intrinsics" version="~>1.0"
to
dependency "intel-intrinsics" version="~>1.0" {
dflags "-mattr=+avx2" "-O3"
}
and also replacing _mm256_srl_epi64() with _mm256_srli_epi64(), my unittest now passes, though it still fails when intel-intrinsics is not built with -mattr=+avx2.
This is reproducible by me on Debian 12 (bookworm) with both the repo version of LDC, v1.30.0, and a freshly compiled v1.39.0.
I still can't repro on Windows or godbolt, so I'm going to leave it here.
You should probably be able to comment out `__builtin_ia32_psrlq128` in `_mm_srl_epi64` to get it to work, let me know.
I'd probably need your LLVM version too, from `ldc2 --version`.
The following code fails on my machine when compiling (ldc) with -mattr=+avx2
When NOT building with -mattr=+avx2:
When building with -mattr=+avx2:
CPU information:
If _mm256_srli_epi64() were implemented I would just use that instead :)