AuburnSounds / intel-intrinsics

The Dlang SIMD library
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1
Boost Software License 1.0

Different speed results on macOS(M1) and Linux(X86_64) #121

Closed: cyrusmsk closed this issue 1 year ago

cyrusmsk commented 1 year ago

I wrote several implementations of the same problem (they are actually ports from other languages).

zero - simple code without SIMD
first - an implementation with SIMD
second - another implementation with SIMD

I build with a plain 'dub build --build=release --compiler=ldc2'. My dub.json has the intel-intrinsics dependency, and for x86_64 I've added "dflags-ldc": ["-mattr=+sse3,+ssse3,+sse4.1,+sse4.2,+avx", "-mcpu=native"]. For M1 (ARM) I use no extra flags. For now I'm only interested in the LDC compiler.
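For reference, the configuration described above would look roughly like the dub.json below. This is only a sketch: the package name "fannkuch-simd" and the version constraint "~>1.0" are placeholders that do not come from the original post, while the "dflags-ldc" value is copied from it.

```json
{
    "name": "fannkuch-simd",
    "dependencies": {
        "intel-intrinsics": "~>1.0"
    },
    "dflags-ldc": ["-mattr=+sse3,+ssse3,+sse4.1,+sse4.2,+avx", "-mcpu=native"]
}
```

As written, "dflags-ldc" applies to every LDC build, including ARM ones, so an x86_64-only setup would need the flags restricted to that platform.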

Results on my MBP (M1 Pro):

/usr/bin/time ./zero 10
73196
Pfannkuchen(10) = 38
0,31 real 0,31 user 0,00 sys

/usr/bin/time ./first 10
73196
Pfannkuchen(10) = 38
0,13 real 0,13 user 0,00 sys

/usr/bin/time ./second 10
73196
Pfannkuchen(10) = 38
0,37 real 0,37 user 0,00 sys

The second version is a little slower than zero, but the first is much faster. However, when I run them on a GitHub Actions x86_64 Linux machine, the zero version is much faster than even the first version.

GitHub Actions job details

Is something wrong with my dub configuration?

p0nce commented 1 year ago

Most likely issues:

Remember that you can profile your code in AMD uProf or Intel VTune with a -b release-debug build. If you find an actual performance issue, it would help to have a reproducible example; in the current case I have nothing to work on...

p0nce commented 1 year ago

Oh I've found your repo, let's see... zero slower than 2nd slower than 1st

cyrusmsk commented 1 year ago

Thank you for your answer. It seems the situation is good now. Maybe it would be good to mention the '--combined' flag somewhere on dub's package page or in the GitHub README.
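For anyone landing here later, the build command from the first post with the flag added would presumably look like this. It is a sketch assembled from the two commands mentioned in this thread, not a quote of the exact command that was run. dub documents --combined as trying to build the whole project in a single compiler run.

```sh
# Original build command from this thread, plus --combined
dub build --build=release --compiler=ldc2 --combined
```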

p0nce commented 1 year ago

Absolutely, I will add this.