dcsale / ppml

compiler flags for Xeon CPUs #1

Open dcsale opened 9 years ago

dcsale commented 9 years ago

Supposedly an additional compiler flag can give a nice speedup on Xeon E5-2630 v3 processors. CPUs from "Sandy Bridge" onward have an instruction set extension called AVX (the v3 parts are actually "Haswell", which adds AVX2 and FMA on top of it). Try the GNU compiler flag "-march=corei7-avx".

More info here: http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/i386-and-x86_002d64-Options.html and https://www.microway.com/hpc-tech-tips/achieve-the-best-performance-intel-xeon-e5-2600-sandy-bridge/

dcsale commented 9 years ago

Intel(R) Xeon(R) CPU E5-2680 v3

Check which macros a flag enables using the command:

g++ -march=corei7-avx -dM -E -x c /dev/null | grep -i -e avx -e fma

and the output is a new preprocessor macro:

    #define __AVX__ 1

but letting the compiler choose the native architecture defines additional macros:

g++ -march=native -dM -E -x c /dev/null | grep -i -e avx -e fma

    #define __core_avx2__ 1
    #define __AVX__ 1
    #define __FP_FAST_FMAF 1
    #define __FMA__ 1
    #define __AVX2__ 1
    #define __core_avx2 1
    #define __FP_FAST_FMA 1

dcsale commented 9 years ago

The mpirun process-binding parameters can also give a potentially significant speedup, for example:

mpirun -np 16 --bind-to-core --bysocket ./Naga CTRLstl-NACA4415-aoa0-Mesh\=finest 2>&1 | tee log.Naga-airfoil

more info here: http://www.pugetsystems.com/blog/2014/08/05/OpenFOAM-performance-on-Quad-socket-Xeon-and-Opteron-587/

dcsale commented 9 years ago

Suggest using only --bind-to-socket.

Further reading explains that binding to sockets, rather than cores, can increase performance: each process stays on one socket, so it does not bounce between sockets and lose its cache contents, while the scheduler can still move it between the cores of that socket. Further reading here: http://blogs.cisco.com/performance/sockets-cores-and-hyperthreads-oh-my