AuburnSounds / intel-intrinsics

The Dlang SIMD library
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1
Boost Software License 1.0

big merge tasks remaining #145

Open p0nce opened 1 day ago

p0nce commented 1 day ago

avx2 From cet ~master branch

Then there is the whole AVX-512 shenanigans, and I don't like that: it adds tons of constants and a metric ton of intrinsics, and AVX-512 is useless for consumer software. It's a real question, as AVX2 is already taking years of effort in `intel-intrinsics` to get into shape. However, like with AVX2, perhaps just the semantics are a win even without the instructions.

cetio commented 1 day ago

I couldn't agree less with AVX512 being effectively useless, and I fail to see why having more intrinsics and constants is a big deal. I would agree that 512-bit vectors are poor candidates to implement, as they generally aren't particularly useful, but the intrinsics I implemented are optimized to be efficient broadly regardless of hardware. Would it not be better to support a fuller range of Intel intrinsics rather than a narrower one? As time passes, more and more software is able to use AVX512, and I implemented those functions because I needed them, so standardizing the most efficient path across devices seems reasonable.

p0nce commented 1 day ago

Would it be ok with it being a separate package?

My main concern is the amount of symbols and code to build in the obj file. Expecting every AVX-512 intrinsic to be blessed means a bit too much future work, considering there are so many AVX-512 intrinsics (though of course some of them have interesting semantics).

I'd rather make them a separate subpackage and make it so that you can use sufficient intel-intrinsics internals to get there. But personally I won't really have a need for that, and I'm surprised anyone uses AVX-512 at all.

p0nce commented 1 day ago

Debug build times are affected very much by intel-intrinsics if we're not careful.

p0nce commented 1 day ago

Proposal: intel-intrinsics:avx512 subpackage

What do you think?

AVX2 => 192 intrinsics, while AVX-512 => ~5500 intrinsics. Considering that the AVX2 effort in intel-intrinsics began in Feb 2022, it's safe to say AVX-512 will never get completed; at that point x86 will be a thing of the past.

cetio commented 1 day ago

x86 is certainly not going to be replaced by ARM or other architectures for quite a while, and the number of processors with AVX512 only continues to grow. I would say modularizing the tests, to avoid having to build everything on every change, would be a good idea. Personally I would just exclude different files with different build configurations, but I don't think the exact implementation matters a whole lot. Logically, outside of internal testing, build time shouldn't be a big problem, and for internal testing either modularizing the configurations or making independent packages would work.

I might implement modular unittesting.

p0nce commented 15 hours ago

__m128i _mm_srlv_epi32(__m128i a, __m128i b) pure @trusted
{
    static if (GDC_with_AVX2 || LDC_with_AVX2)
        return cast(__m128i)__builtin_ia32_psrlv4si(cast(byte16)a, cast(byte16)b);
    else
    {
        return _mm_setr_epi32(
            a[0] >> b[0],
            a[1] >> b[1],
            a[2] >> b[2],
            a[3] >> b[3]
        );
    }
}

`>>` performs a sign-extending (arithmetic) shift here; you can use the `>>>` operator for an unsigned shift (or cast the left operand). As written, this fallback instead implements `_mm_srav_epi32`.
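To illustrate the difference between the two operators, here is a minimal scalar sketch for a single 32-bit lane. The helper names `srlvLane` and `sravLane` are hypothetical and not part of intel-intrinsics. The handling of shift counts above 31 follows my reading of the Intel Intrinsics Guide for the variable-shift instructions, and is needed in any case because shifting an `int` by 32 or more is not valid in D.

```d
// Scalar reference sketch. srlvLane and sravLane are hypothetical helper
// names used for illustration only, not part of intel-intrinsics.

/// Logical (zero-filling) right shift of one 32-bit lane.
int srlvLane(int a, int count) pure nothrow @nogc @safe
{
    // Shifting an int by 32 or more is not valid in D, so large (or
    // negative) counts are handled explicitly. Assumption: as with the
    // hardware instruction, such counts produce 0.
    if (cast(uint) count > 31)
        return 0;
    return a >>> count; // >>> shifts in zeros, even for a signed int
}

/// Arithmetic (sign-filling) right shift of one 32-bit lane.
int sravLane(int a, int count) pure nothrow @nogc @safe
{
    // Assumption: large counts fill the whole lane with the sign bit,
    // which is the same as shifting by 31.
    if (cast(uint) count > 31)
        count = 31;
    return a >> count; // >> replicates the sign bit
}

unittest
{
    assert(sravLane(-8, 1) == -4);          // sign bit is kept
    assert(srlvLane(-8, 1) == 0x7FFF_FFFC); // zero is shifted in
    assert(srlvLane(-8, 32) == 0);          // out-of-range count
}
```

With a negative left operand the two results differ, which is why the `>>` fallback quoted above behaves like `_mm_srav_epi32` rather than `_mm_srlv_epi32`.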

p0nce commented 15 hours ago

New proposal: a configuration option that enables AVX512 under a version. You would select that with subConfigurations.

(It can create a diamond problem for people using AVX512 in two different dependencies, but I believe one day DUB or redub will have to handle this sort of thing.)
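As a rough sketch of what such a setup could look like in DUB: the configuration name `avx512` and the version identifier `IntelIntrinsics_AVX512` below are assumptions made up for illustration, not names the project has settled on. On the library side, the `dub.json` would declare an extra configuration that sets a version identifier:

```json
{
    "name": "intel-intrinsics",
    "configurations": [
        { "name": "library" },
        { "name": "avx512", "versions": ["IntelIntrinsics_AVX512"] }
    ]
}
```

A dependent package (here a hypothetical `my-app`) would then opt in through `subConfigurations`:

```json
{
    "name": "my-app",
    "dependencies": { "intel-intrinsics": "*" },
    "subConfigurations": { "intel-intrinsics": "avx512" }
}
```

Since the first listed configuration is the default, users who do not opt in would not build the AVX-512 code, assuming it is guarded by `version (IntelIntrinsics_AVX512)` in the source.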