Open p0nce opened 1 day ago
I couldn't agree less with AVX512 being effectively useless, and I fail to see why having more intrinsics and constants is a big deal. I would agree that 512-bit vectors are a poor thing to implement, as they generally aren't particularly useful, but the intrinsics implemented are written to be efficient regardless of hardware. Would it not be better to support a fuller range of Intel intrinsics rather than a narrower one? As time passes, more and more software is able to use AVX512, and I implemented those functions because I needed them, so standardizing on the most efficient path across devices seems reasonable.
Would you be OK with it being a separate package?
My main concern is the number of symbols and the amount of code to build into the obj file. Expecting every AVX-512 intrinsic to be blessed implies a lot of future work, considering how many AVX-512 intrinsics there are (though of course some of them have interesting semantics).
I'd rather make them a separate subpackage, and make it so that you can use enough of the intel-intrinsics internals to get there. But personally I won't really have a need for that, and I'm surprised anyone uses AVX-512 at all.
Debug build times are affected very much by intel-intrinsics if one is not careful.
Proposal:
- an `intel-intrinsics:avx512` subpackage
- `intel-intrinsics` types and internals still know about 512-bit vectors in the normal package

What do you think?
AVX2 => 192 intrinsics, while AVX-512 => ~5500 intrinsics. Considering that the AVX2 effort in intel-intrinsics began in Feb 2022, it's safe to say the AVX-512 set will never get completed; by that point x86 will be a thing of the past.
x86 is certainly not going to be replaced by ARM or other architectures for quite a while, and the number of processors with AVX512 only continues to grow. I would say modularizing the tests, to avoid having to build everything on every change, would be a good idea. Personally I would just exclude different files under different build configurations, but I don't think the exact implementation matters a whole lot. Outside of internal testing, build time shouldn't be a big problem, and for internal testing either modularizing the configurations or making independent packages would work.
I might implement modular unittesting.
    __m128i _mm_srlv_epi32(__m128i a, __m128i b) pure @trusted
    {
        static if (GDC_with_AVX2 || LDC_with_AVX2)
            return cast(__m128i)__builtin_ia32_psrlv4si(cast(byte16)a, cast(byte16)b);
        else
        {
            return _mm_setr_epi32(
                a[0] >> b[0],
                a[1] >> b[1],
                a[2] >> b[2],
                a[3] >> b[3]
            );
        }
    }
`>>` makes a signed (arithmetic) shift here; you can use the `>>>` operator for an unsigned shift (or cast the left operand).
As written, that is `_mm_srav_epi32` instead.
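To make the difference concrete, here is a minimal scalar sketch of what one lane of `_mm_srlv_epi32` is expected to compute. The helper name `srlvLane` is hypothetical and not part of intel-intrinsics. It also guards shift counts above 31, which the Intel definition maps to zero and which D does not guarantee a result for when shifting a 32-bit value.

```d
/// Hypothetical scalar reference for a single 32-bit lane of _mm_srlv_epi32.
/// Not taken from intel-intrinsics; for illustration only.
int srlvLane(int a, int count) pure nothrow @nogc @safe
{
    // The count is treated as unsigned: anything outside 0..31 yields 0.
    if (cast(uint) count > 31)
        return 0;
    // Cast the left operand so the shift is logical, then use >>>.
    return cast(int)(cast(uint) a >>> count);
}

unittest
{
    assert((-16 >> 2) == -4);                 // arithmetic: sign bit replicated
    assert(srlvLane(-16, 2) == 0x3FFF_FFFC);  // logical: zeros shifted in
    assert(srlvLane(-16, 32) == 0);           // out-of-range count gives 0
}
```

Whether the fallback should branch like this or use SIMD masking is a performance question the sketch does not try to answer.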
New proposal: a configuration option that enables AVX512 under a version. You would select that with `subConfigurations`.
(It can create a diamond problem for people using AVX512 through two different dependencies, but I believe one day DUB or redub will have to handle this sort of thing.)
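A rough sketch of how the D side of such a version gate might look. The identifier `IntelIntrinsics_AVX512` and the names below are hypothetical; the assumption is that a dub configuration would define the identifier through its `versions` build setting, and that a consumer would pick that configuration with `subConfigurations`.

```d
// Hypothetical version identifier, assumed to be set by a dub configuration.
version (IntelIntrinsics_AVX512)
    enum bool hasAVX512Api = true;
else
    enum bool hasAVX512Api = false;

static if (hasAVX512Api)
{
    // Placeholder standing in for the AVX-512 intrinsics and constants.
    // Declarations in this block are only compiled when the version
    // identifier is set, so default builds do not pay for them.
    int avx512Placeholder() pure nothrow @nogc @safe
    {
        return 512;
    }
}
```

With this layout, builds that never select the configuration skip the AVX-512 symbols entirely, which is the point of the build-time and obj-size concern.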
avx2, from cet `~master` branch:
- `_mm256_blendv_epi8` => took the GDC path, but not the unittest or other paths, which are the same as the existing ones
- `_mm_srlv_epi64`
- `_mm_srlv_epi32` => for the sake of semantics, must find the fastest of just relying on the compiler vs SIMD masking
- `_mm_sllv_epi64`
- `_mm_sllv_epi32`
- `_mm256_shuffle_epi8`
- `_mm256_shuffle_epi32`
- `_mm256_shufflelo_epi16`
- `_mm256_shufflehi_epi16`
- `_mm256_permute4x64_epi64`
- `_mm256_bslli_epi128` => no builtin works indeed, keep the asm; if it creates issues it would need shufflevector
- `_mm256_bsrli_epi128` => ditto
- `_mm256_srli_si256`, just an alias
- `_mm256_slli_si256`, just an alias

In some cases builtins (end of the file) aren't used but inline asm is, which is less portable.

Then there is the whole AVX-512 shenanigans, and I don't like that: it adds tons of constants and a metric ton of intrinsics, and AVX-512 is useless for consumer software. It's a real question, as AVX2 is already taking years of effort in `intel-intrinsics` to get into shape. However, like with AVX2, perhaps just the semantics are a win even without the instructions.