AuburnSounds / intel-intrinsics

The Dlang SIMD library
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1
Boost Software License 1.0

Faster set/setr #102

p0nce closed this issue 2 years ago

p0nce commented 2 years ago

Very common pattern.

Fastest on LDC (only makes a difference at -O0):

    {
        __m128 r = void;
        r.ptr[0] = e3; // .ptr doesn't matter here
        r.ptr[1] = e2;
        r.ptr[2] = e1;
        r.ptr[3] = e0;
        return r;
    }

This saves instructions in debug mode. Using .ptr changes nothing on either LDC or GDC, at any optimization level.

The fun thing is that the same code is also the fastest on GDC, where the = void gives a small win at all optimization levels.
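For comparison, here is a minimal self-contained sketch of the two variants. It is not the library's code: the function names `setrVoidInit` and `setrDefaultInit` are made up for illustration, and it assumes a compiler and target where `core.simd.float4` is available (for example LDC or GDC on x86-64). Both functions return the same vector; the only difference is whether `r` is default-initialized before every lane is overwritten.

```d
import core.simd : float4;

// Hypothetical setr-style helper: arguments are stored in the order given,
// so the first argument ends up in lane 0.
float4 setrVoidInit(float e3, float e2, float e1, float e0)
{
    float4 r = void; // skip default initialization; all four lanes are written below
    r.ptr[0] = e3;
    r.ptr[1] = e2;
    r.ptr[2] = e1;
    r.ptr[3] = e0;
    return r;
}

// Same result, but `r` is default-initialized first. That store is redundant,
// and an unoptimized build may not remove it.
float4 setrDefaultInit(float e3, float e2, float e1, float e0)
{
    float4 r;
    r.ptr[0] = e3;
    r.ptr[1] = e2;
    r.ptr[2] = e1;
    r.ptr[3] = e0;
    return r;
}

void main()
{
    float4 a = setrVoidInit(1.0f, 2.0f, 3.0f, 4.0f);
    float4 b = setrDefaultInit(1.0f, 2.0f, 3.0f, 4.0f);

    // Both variants must produce identical lanes.
    assert(a.array == [1.0f, 2.0f, 3.0f, 4.0f]);
    assert(a.array == b.array);
}
```

Compiling both functions at -O0 and diffing the generated assembly is one way to check whether the void initializer saves instructions on a given compiler.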

p0nce commented 2 years ago

Marked many places with a PERF tag where an = void could be beneficial at -O0.