AVX with intrinsic functions

etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson Clover and Wilson twisted mass fermions, plus inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.

http://www.itkp.uni-bonn.de/~urbach/software.html

GNU General Public License v3.0


AVX with intrinsic functions #137

Open urbach opened 12 years ago

urbach commented 12 years ago

it would be great to have the Dirac operator implemented with AVX intrinsic functions...

do both gcc and icc understand the same syntax?

kostrzewa commented 12 years ago

They do; they use the same include file:

#include <immintrin.h>

Also, the "Introduction to Intel AVX" document on the Intel website explicitly states that all three compilers (icc, gcc and Visual Studio) use the same syntax.
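To illustrate the shared syntax, here is a minimal sketch (the function `add_arrays` is a hypothetical example, not tmLQCD code) that builds unchanged with gcc or icc when AVX is enabled (for gcc, `-mavx`). The `__AVX__` guard keeps it compilable without AVX flags, in which case only the scalar loop runs.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>   /* same header for gcc, icc and Visual Studio */
#endif

/* Hypothetical helper: out[i] = x[i] + y[i] for i < n.
   Processes 4 doubles per AVX iteration, then a scalar loop for the rest. */
void add_arrays(const double *x, const double *y, double *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);   /* unaligned load of 4 doubles */
        __m256d vy = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(vx, vy));
    }
#endif
    for (; i < n; ++i)   /* remainder, or everything when built without AVX */
        out[i] = x[i] + y[i];
}
```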

kostrzewa commented 12 years ago

Relevant data types and operations:

__m256d	4 packed doubles (256 bits), held in one of the 16 YMM registers
_mm256_<op>_[ps|pd](...)	most intrinsic functions follow this format; ps = 8 packed singles, pd = 4 packed doubles
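A large part of the Dirac operator's work is complex arithmetic (SU(3) link matrices times color vectors), so a natural first use of `__m256d` is to hold two complex doubles. The sketch below assumes an interleaved {re, im, re, im} layout; that layout is an assumption for illustration, not necessarily tmLQCD's spinor layout, and `cmul2` is a hypothetical name. It uses only AVX instructions (no FMA), with a scalar fallback when `__AVX__` is not defined.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: multiply two pairs of complex numbers stored
   interleaved as {re0, im0, re1, im1}: out[k] = x[k] * y[k], k = 0, 1. */
void cmul2(const double x[4], const double y[4], double out[4])
{
#if defined(__AVX__)
    __m256d vx  = _mm256_loadu_pd(x);
    __m256d vy  = _mm256_loadu_pd(y);
    __m256d yre = _mm256_movedup_pd(vy);        /* {yr0, yr0, yr1, yr1} */
    __m256d yim = _mm256_permute_pd(vy, 0xF);   /* {yi0, yi0, yi1, yi1} */
    __m256d xsw = _mm256_permute_pd(vx, 0x5);   /* {xi0, xr0, xi1, xr1} */
    __m256d t1  = _mm256_mul_pd(vx, yre);       /* {xr*yr, xi*yr, ...} */
    __m256d t2  = _mm256_mul_pd(xsw, yim);      /* {xi*yi, xr*yi, ...} */
    /* addsub: even slots t1 - t2 (real part), odd slots t1 + t2 (imag part) */
    _mm256_storeu_pd(out, _mm256_addsub_pd(t1, t2));
#else
    for (int k = 0; k < 2; ++k) {
        double xr = x[2 * k], xi = x[2 * k + 1];
        double yr = y[2 * k], yi = y[2 * k + 1];
        out[2 * k]     = xr * yr - xi * yi;
        out[2 * k + 1] = xr * yi + xi * yr;
    }
#endif
}
```

For example, x = {1, 2, 3, 4} and y = {5, 6, 7, 8} encode (1+2i)(5+6i) and (3+4i)(7+8i), giving {-7, 16, -11, 52}.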

urbach commented 12 years ago

so, are we talking about the same intrinsics, e.g.

__m256d _mm256_fmadd_pd(__m256d a, __m256d b, __m256d c)

? Well, I guess I need to try. In the Intel documentation (which is really, really crappy, I have to say) I understood it differently.

kostrzewa commented 12 years ago

so, are we talking about the same intrinsics, e.g.?

Yes, absolutely. Although I'm not sure gcc implements all of them...

If you download the Intel Intrinsics Guide application (Linux, Windows, Mac), it has a comprehensive reference of all available intrinsics. It might have been old documentation that you happened upon.

http://software.intel.com/sites/default/files/11MIC15_Intrinsics-Guide-for-Intel-AVX2-Linux__2_.zip

I've read from multiple sources now that both gcc and icc use exactly the same syntax for all SIMD intrinsics.

kostrzewa commented 12 years ago

See: /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/include/immintrin.h, which includes /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/include/avxintrin.h

urbach commented 12 years ago

I searched a very long time for a downloadable version, but never found one. For the link you sent, one needs to install something; I need a PDF, not some Java or whatever crap. I found that before and didn't look at it further (okay, so much for that ;-) ). Maybe I have to...

I don't have gcc-4.5 available here...

kostrzewa commented 12 years ago

Well, yeah, it's a Java app with a search function... not completely useless, but I agree.

How about this: http://software.intel.com/sites/default/files/m/d/4/1/d/8/319433-011.pdf? It's not the intrinsics reference, but ...

urbach commented 12 years ago

okay, so this Java thingy is at least something.

I've already seen the PDF too... I'll certainly read it in my next life... ;-)

urbach commented 12 years ago

what I found most useful (but it's badly written, I think) is

http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/index.htm

urbach commented 12 years ago

so, on my laptop I have gcc-4.6... I cannot find, for instance,

_mm256_fmadd

in any of the headers in x86_64-linux-gnu/4.6/include. There are only the gcc intrinsics, which are

__m256d __builtin_ia32_vfmaddpd256

in the header fma4intrin.h.

urbach commented 12 years ago

Hmm, it seems that the FMA intrinsics are not yet supported Intel-style by gcc. Otherwise gcc supports the Intel intrinsics, but icc does not support the gcc intrinsics, of course... The processors on SuperMUC do not support more than AVX, I think not even AVX2. The fat nodes support only SSE4.

So, I'm not sure what to do; the fused multiply-add operations would be quite important...
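One way to cope with compilers or CPUs that lack FMA is to select the fused path at compile time and fall back to a separate multiply and add otherwise. The sketch below assumes the compiler defines `__FMA__` when targeting FMA3 (gcc does so with `-mfma`) and `__AVX__` when targeting AVX (`-mavx`); `axpy` is a hypothetical helper, not tmLQCD code. Built without either flag, it reduces to the plain scalar loop.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: y[i] = a * x[i] + y[i] for i < n.
   Uses _mm256_fmadd_pd when compiling for FMA3, otherwise mul + add. */
void axpy(double a, const double *x, double *y, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    __m256d va = _mm256_set1_pd(a);                   /* broadcast a to 4 slots */
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
#if defined(__FMA__)
        vy = _mm256_fmadd_pd(va, vx, vy);             /* a*x + y, one rounding */
#else
        vy = _mm256_add_pd(_mm256_mul_pd(va, vx), vy); /* a*x, then + y */
#endif
        _mm256_storeu_pd(y + i, vy);
    }
#endif
    for (; i < n; ++i)   /* remainder, or everything when built without AVX */
        y[i] = a * x[i] + y[i];
}
```

Because `_mm256_fmadd_pd` rounds once instead of twice, the two paths can differ in the last bit for general inputs.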

urbach commented 12 years ago

ah, see

https://github.com/etmc/tmLQCD/wiki/AVX-Intrinsics

for more on intrinsics...

kostrzewa commented 12 years ago

In the Intel Java app, FMA is a special category, and it is not supported on my Core Duo desktop, for instance. I haven't checked on my Core 2 laptop, but before using the fmadd intrinsics one should really check whether they're supported at all.
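Such a check can be done at run time. A minimal sketch using the x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports` offered by GCC and Clang (the `"fma"` feature string needs a reasonably recent compiler; support in other compilers is not assumed). The helper names are hypothetical. These functions report what the CPU running the program supports, independent of the flags the program was compiled with.

```c
/* Hypothetical helpers: return 1 if the running CPU reports the feature,
   0 otherwise (including on non-x86 or non-GCC-compatible compilers). */
static int cpu_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* initialise the CPU feature data */
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;               /* unknown platform: assume no AVX */
#endif
}

static int cpu_has_fma(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") ? 1 : 0;
#else
    return 0;               /* unknown platform: assume no FMA */
#endif
}
```

A program could then call an FMA kernel (compiled separately with FMA enabled, e.g. `-mfma`) only when `cpu_has_fma()` returns 1, and use a multiply-plus-add kernel otherwise.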

urbach commented 12 years ago

yes, I think an Intel processor supporting FMA does not exist yet... :(