dcsale / SOWFA

NREL's Simulator for Offshore Wind Farm Applications
http://wind.nrel.gov/designcodes/simulators/sowfa/

compiler flags for Xeon cpus #2

Open dcsale opened 9 years ago

dcsale commented 9 years ago

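Supposedly an additional compiler flag can give a nice speedup on Xeon E5-2630 v3 processors. These CPUs support the AVX instruction set, which was introduced with "Sandy Bridge" (the v3 parts are actually the later Haswell generation and also support AVX2). Try the GNU compiler flag "-march=corei7-avx".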

More info here: http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/i386-and-x86_002d64-Options.html and https://www.microway.com/hpc-tech-tips/achieve-the-best-performance-intel-xeon-e5-2600-sandy-bridge/

dcsale commented 9 years ago

Try adding the compiler flags for OpenFOAM in /wmake/rules/linux64Gcc/c++Opt and /wmake/rules/linux64Gcc/cOpt (paths relative to the OpenFOAM installation directory).
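A minimal sketch of how the flag could be appended to those rule files. The helper name `append_opt_flag` is made up for illustration, and the sketch assumes the rule file defines its optimization flags on a line of the form `c++OPT = -O3` (or `cOPT = -O3`); check your own copy before relying on it. The demo runs against a scratch file, not a real OpenFOAM tree.

```shell
# Append extra flag(s) to the "<VAR> = ..." line of a wmake rule file.
# Hypothetical helper; assumes the file has a line such as "c++OPT = -O3".
append_opt_flag() {
    # $1 = rule file, $2 = variable name (c++OPT or cOPT), $3 = flag(s) to append
    tmp=$(mktemp) || return 1
    awk -v var="$2" -v flag="$3" '
        $1 == var && $2 == "=" { print $0 " " flag; next }   # extend the matching line
        { print }                                            # copy everything else
    ' "$1" > "$tmp" && cat "$tmp" > "$1"   # write back in place, keeping file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file that mimics the assumed layout of c++Opt.
demo=$(mktemp)
printf 'c++DBUG     =\nc++OPT      = -O3\n' > "$demo"
append_opt_flag "$demo" 'c++OPT' '-march=core-avx2'
cat "$demo"
rm -f "$demo"
```

Against a real installation the call would look like `append_opt_flag "$WM_PROJECT_DIR/wmake/rules/linux64Gcc/c++Opt" 'c++OPT' '-march=core-avx2'`, followed by a full rebuild so that every library picks up the new flag.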

dcsale commented 9 years ago

More info about how to enable AVX here: http://stackoverflow.com/questions/943755/gcc-optimization-flags-for-xeon

To enable AVX, the short of it is:

-march=corei7-avx for GCC < 4.9.0 or -march=sandybridge for GCC >= 4.9.0
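The version rule above can be scripted so a build picks the right spelling automatically. This is only a sketch: the function name `avx_march_flag` is invented here, and it assumes `gcc` on the PATH is really GNU GCC and that `gcc -dumpversion` prints either a full version like `4.8.5` or just a major number.

```shell
# Print the AVX -march flag suited to a given GCC version string.
# Hypothetical helper: 4.9 and newer get "sandybridge", older get "corei7-avx".
avx_march_flag() {
    # $1 = version string such as "4.8.5" or "9"
    major=${1%%.*}
    rest=${1#*.}
    if [ "$rest" = "$1" ]; then
        minor=0                 # no dot in the string: only a major number was given
    else
        minor=${rest%%.*}
    fi
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 9 ]; }; then
        echo "-march=sandybridge"
    else
        echo "-march=corei7-avx"
    fi
}

# Use the installed compiler if there is one.
if command -v gcc >/dev/null 2>&1; then
    avx_march_flag "$(gcc -dumpversion)"
fi
```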

The following will show you all the flags your processor supports:

cat /proc/cpuinfo | grep flags | head -1

or

gcc -march=native -mfpmath=sse -O2 -Q --help=target -v
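To check for specific extensions rather than reading the whole flag list by eye, something like the following could work. It is a sketch that assumes a Linux system exposing an x86-style `flags` line in /proc/cpuinfo; `has_flag` is a helper name made up for this example.

```shell
# Succeed if a flag appears as a whole word in a cpuinfo "flags" line.
has_flag() {
    # $1 = flag name (e.g. avx2), $2 = flags text (space separated)
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Take the first "flags" line, if this machine has one.
flags_line=""
if [ -r /proc/cpuinfo ]; then
    flags_line=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
fi

for f in avx avx2 fma; do
    if has_flag "$f" "$flags_line"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

Matching on whole words matters here, because a plain substring search for `avx` would also match `avx2`.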

GCC suppresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions, or AVX equivalents of the SSEx instructions, as needed. So try these flags in combination:

gcc -march=corei7-avx -mtune

dcsale commented 9 years ago

Further explanation at: http://stackoverflow.com/questions/10559275/gcc-how-is-march-different-from-mtune

It seems to me that this combination should give the best compatibility across different CPUs (at least Intel ones) that support AVX:

-march=corei7-avx -mtune=generic

dcsale commented 9 years ago

I tested OpenFOAM v2.4.x compiled with and without the AVX instructions on an Intel Xeon E5-2630 v3 (a Haswell-generation CPU). I ran a few trials of the pisoFoamTurbine solver for 80 iterations. The compiler flags were passed to OpenFOAM and also to the FAST Fortran code. Here are the elapsed wall times:

compiler flags                      time [s]                          avg. time [s]
none                                [244, 281, 333, 308, 261, 316]    291
-march=corei7-avx -mtune=generic    [372, 270, 279, 241]              291
-march=core-avx2                    [244, 283, 271]                   266

The flag -march=core-avx2 (note that -march=X implies -mtune=X) seems to provide the most aggressive optimization, with a potential speedup of roughly 8% on average (266 s vs. 291 s). Nothing else was running on the computer, so I wonder why there is such large variability between runs. Anyway, perhaps there is something beneficial about AVX2, so I will leave the flag enabled.
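Given the run-to-run variability, it may help to look at the spread alongside the averages. The sketch below recomputes the mean and the sample standard deviation from the wall times listed above; `run_stats` is a helper name invented for this example.

```shell
# Print count, mean and sample standard deviation of a list of wall times [s].
run_stats() {
    printf '%s\n' "$@" | awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            mean = sum / n
            # sample standard deviation (n - 1 in the denominator)
            sd = (n > 1) ? sqrt((sumsq - n * mean * mean) / (n - 1)) : 0
            printf "n=%d mean=%.1f sd=%.1f\n", n, mean, sd
        }'
}

run_stats 244 281 333 308 261 316    # no extra flags
run_stats 372 270 279 241            # -march=corei7-avx -mtune=generic
run_stats 244 283 271                # -march=core-avx2
```

The means come out as 290.5 s, 290.5 s and 266.0 s, with standard deviations of about 34 s, 57 s and 20 s. The 24.5 s gain from -march=core-avx2 is therefore smaller than the scatter of the baseline runs, so more trials would be needed before treating the speedup as real.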

dcsale commented 9 years ago

Next, I should experiment with the process-binding flags for mpirun, namely the bind-to-socket and bind-to-core options.
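As a starting point for that experiment, the following sketch only prints candidate command lines instead of running them. It assumes Open MPI 1.8 or newer, where the options are spelled `--bind-to <policy>` and `--report-bindings` (older releases used `--bind-to-core` and `--bind-to-socket`). The rank count of 8 and the helper name `binding_commands` are placeholders, not values from the tests above.

```shell
# Print (do not run) one mpirun command line per binding policy to compare.
binding_commands() {
    # $1 = number of MPI ranks (placeholder), $2 = solver executable
    for policy in none socket core; do
        echo "mpirun -np $1 --bind-to $policy --report-bindings $2 -parallel"
    done
}

binding_commands 8 pisoFoamTurbine
```

Timing each variant over several runs, with `--report-bindings` confirming where the ranks actually landed, should show whether pinning also reduces the run-to-run scatter.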