Closed (cyrusmsk closed this issue 1 year ago)
Most likely issues:
- `dub --combined`, else most of the intrinsics won't get inlined.
- `-b release-nobounds`? More speed.
- `-a arm64-apple-macos`, else it's JITed. But the timings will be good under Rosetta too.

Remember that you can profile your code in AMD uProf or Intel VTune using a `-b release-debug` build.
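As a rough sketch, the flags suggested above could be combined as follows. It assumes `ldc2` is on the PATH and that the script is run from the package root (where dub.json lives). The variable names `BENCH_CMD` and `PROFILE_CMD` are only illustrative.

```shell
#!/bin/sh
# Sketch only: combines the flags suggested above into two dub invocations.
# Assumptions: ldc2 is installed and on PATH; dub.json is in the current directory.

# Timing build: a single compiler invocation (--combined) so the intrinsics
# can be inlined, with optimizations on and bounds checks off.
BENCH_CMD="dub build --combined --compiler=ldc2 -b release-nobounds"

# Profiling build: optimized, but with debug info so a profiler can show symbols.
PROFILE_CMD="dub build --combined --compiler=ldc2 -b release-debug"

# On Apple Silicon, append "-a arm64-apple-macos" to either command
# to request a native arm64 binary.

echo "$BENCH_CMD"
echo "$PROFILE_CMD"

# Only run the timing build when dub and a dub.json are actually present.
if command -v dub >/dev/null 2>&1 && [ -f dub.json ]; then
    $BENCH_CMD
fi
```

Keeping the two builds separate means timings always come from the `release-nobounds` binary, while the `release-debug` binary is only used under the profiler.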
If you find an actual performance issue, it would help to have a reproducible example; in the current case I have nothing to work on...
Oh, I've found your repo, let's see... zero is slower than 2nd, which is slower than 1st.
Thank you for your answer. It seems the situation is good now. Maybe it would be better to add information about the '--combined' flag somewhere on dub's package page or in the GitHub README file.
Absolutely, I will add this.
I wrote several implementations of the same problem (actually, they were ports from other languages).
- zero: simple code without SIMD
- first: an implementation with SIMD
- second: another implementation with SIMD
For building I'm using a simple 'dub build --build=release --compiler=ldc2'. In dub.json I have the intel-intrinsics dependency, and for x86_64 I've added "dflags-ldc": ["-mattr=+sse3,+ssse3,+sse4.1,+sse4.2,+avx", "-mcpu=native"]. For M1 (ARM) no extra flags are used. Currently I'm interested only in the LDC compiler.
So results on my MBP (M1-Pro):
/usr/bin/time ./zero 10
73196
Pfannkuchen(10) = 38
0,31 real 0,31 user 0,00 sys

/usr/bin/time ./first 10
73196
Pfannkuchen(10) = 38
0,13 real 0,13 user 0,00 sys

/usr/bin/time ./second 10
73196
Pfannkuchen(10) = 38
0,37 real 0,37 user 0,00 sys
The second version is a little slower than zero, but the first is much faster. However, when I run them on a GitHub Actions x86_64 Linux machine, the zero version is much faster, even compared to the first version (see the GitHub Actions job details). Is something wrong with my dub configuration?