FFTW with optarch=march=westmere not working

Flamefire commented 6 years ago

We have a cluster with westmere and haswell CPUs, so we need software that runs on westmere as the minimum. Hence we add --optarch=march=westmere -mtune=haswell to our EasyBuild options.

This works for pretty much all modules but fails when building FFTW as part of the toolchain:
../../../simd-support/simd-avx2.h:43:2: error: #error "compiling simd-avx2.h without avx2 support"

The reason for this is that optarch gets added too late, resulting in this:
gcc -DHAVE_CONFIG_H -I. -I../../.. -I ../../.. -march=core-avx2 -mfma -O3 -march=westmere -mtune=haswell -fno-math-errno -fPIC -MT n1fv_25.lo -MD -MP -MF .deps/n1fv_25.Tpo -c n1fv_25.c -fPIC -DPIC -o .libs/n1fv_25.o

So the latter march from optarch overrides the intended march=core-avx2.
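
The override can be illustrated with a small Python sketch. It relies on GCC's behaviour that, when -march is given more than once, the last occurrence takes effect. The helper name effective_march is hypothetical and not part of EasyBuild, and the command is a shortened version of the one from the build log above.

```python
import shlex


def effective_march(command):
    """Return the value of the last -march=... option in a compile command.

    GCC honours the last -march it sees, so this is the architecture
    the file is actually compiled for. Returns None if no -march is given.
    """
    march = None
    for token in shlex.split(command):
        if token.startswith("-march="):
            march = token[len("-march="):]
    return march


# Shortened version of the failing compile command.
cmd = (
    "gcc -DHAVE_CONFIG_H -march=core-avx2 -mfma -O3 "
    "-march=westmere -mtune=haswell -c n1fv_25.c"
)
print(effective_march(cmd))
```

Here the flag FFTW adds for its AVX2 codelets (-march=core-avx2) comes first and the optarch value comes second, so the file is compiled for westmere and simd-avx2.h hits its #error.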

This was already mentioned before in the mailing list: https://www.mail-archive.com/easybuild@lists.ugent.be/msg03020.html But it seems the cause was not found and hence the issue was not solved.

Is there anything you can do to fix this?

boegel commented 6 years ago

@Flamefire How was FFTW configured here? In particular, is the --enable-avx2 configure option included by the FFTW easyblock? It auto-detects which CPU features are available, and thus will automatically use --enable-avx2 unless you include use_avx2 = False in the FFTW easyconfig file. See also the other related custom easyconfig parameters listed in the output of eb -a -e EB_FFTW.

As it stands, the FFTW easyblock pretty much ignores the --optarch setting, unless --optarch=GENERIC is used, so you're responsible for making sure that FFTW is configured correctly.

Do let us know whether this is helpful.

Flamefire commented 6 years ago

I'm using EB 3.6, so that easyblock does add --enable-avx2. The command is something like eb --software=FFTW,3.3.7 --toolchain=gompi,2018a -r --optarch=march=westmere -mtune=haswell (default easyblock and easyconfig)

This detection based on the current CPU is IMO wrong. First, because you could actually enable everything a compiler supports (See http://www.fftw.org/fftw3_doc/Installation-on-Unix.html#Installation-on-Unix : Enable various SIMD instruction sets. You need compiler that supports the given SIMD extensions, but FFTW will try to detect at runtime whether the CPU supports these extensions. That is, you can compile with --enable-avx and the code will still run on a CPU without AVX support.) So the detection should be based on the compiler instead.
And second: As in our case, the CPU used to compile and the CPU used to run FFTW may not be the same. We have westmere and haswell CPUs mixed in the cluster. So it can be compiled and run on either. Not using AVX2 would lose a lot of speed on the haswells while only saving a bit of space in the compiled library.

I get the feeling that optarch should not be passed to FFTW at all, as it already compiles different code for different CPU architectures and uses the appropriate code at runtime. So everything you could set in optarch is most likely wrong for FFTW.

So I propose the following changes:

1. Auto-detect from the compiler which SIMD instructions can be used and enable them by default.
   1. Make the use_xxx parameters an opt-out (require the user to set them to False)
   2. Make the auto-detect from the current CPU an opt-in (require it to be explicitly set to True) and only disable unsupported SIMD instructions (for the case where a user knows that FFTW will only be run on the CPU used to compile the code)
2. Disable optarch for FFTW to avoid overriding the architecture-specific compilation flags used by FFTW itself.
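
A minimal sketch of how the proposed defaults could be resolved, not the actual easyblock code. The function name resolve_simd_flags and its arguments are hypothetical; only the --enable-<simd> configure options and the use_xxx naming come from FFTW and the easyblock. How an explicit True should interact with CPU detection is not settled above, so this sketch simply lets the CPU filter apply whenever detection is opted in.

```python
def resolve_simd_flags(compiler_supported, use_params, cpu_supported=None):
    """Compute FFTW configure flags under the proposed opt-out scheme.

    compiler_supported: SIMD names the compiler can generate code for
    use_params: dict of SIMD name -> True/False/None (hypothetical use_xxx values)
    cpu_supported: SIMD names of the build CPU, or None when CPU-based
                   detection has not been opted into
    """
    flags = []
    for simd in sorted(compiler_supported):
        if use_params.get(simd) is False:
            continue  # explicit opt-out via use_xxx = False
        if cpu_supported is not None and simd not in cpu_supported:
            continue  # opt-in CPU detection removes unsupported sets
        flags.append("--enable-" + simd)
    return flags


compiler = {"sse2", "avx", "avx2"}

# Default: enable everything the compiler supports.
print(resolve_simd_flags(compiler, {}))

# User opts out of AVX2.
print(resolve_simd_flags(compiler, {"avx2": False}))

# User opts in to CPU detection on a build host that only has SSE2.
print(resolve_simd_flags(compiler, {}, cpu_supported={"sse2"}))
```

With this scheme the build host's CPU no longer decides what gets enabled unless the user asks for that explicitly.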

Workaround: You can build FFTW separately from the toolchain and set optarch to empty. This way you currently get the auto-detect behavior, so you also have to make sure you run the build on your "best" CPUs. Also, don't use "GENERIC", as this will cripple the speed of FFTW.
Afterwards build the rest of the toolchain or your program.

Problem with the workaround: You cannot simply build a module and have its dependencies built automatically if they include FFTW.

boegel commented 6 years ago

@Flamefire Thanks a lot for the suggestions. I think you're right, we're probably being too careful here w.r.t. what we enable, since recent FFTW versions have runtime detection built into the library...

Also, blatantly ignoring --optarch makes sense for FFTW specifically, as far as I can tell.

Thoughts on this @geimer & @bartoldeman?

bartoldeman commented 6 years ago

I think using optarch for FFTW is ok, as it affects non-hand-optimized parts of FFTW. But I agree that an opt-out for the CPU instructions is appropriate, e.g. for AVX512.

I suppose compiler auto-detect could be optional, as the compiler is known in the easyconfig, unless you use --try-toolchain of course.

Flamefire commented 6 years ago

Then what about the use case I described: Use generic (cluster-wide) optarch for compiling a whole bunch of software which includes (due to the toolchain used) FFTW.

Passing optarch to FFTW breaks building and one has to build FFTW (and only FFTW) with a different value of optarch.

casparvl commented 6 years ago

I've encountered the same problem both on our super and cluster. Both systems are heterogeneous. For foss, we compile for the smallest supported instruction set (set in our optarch).

If I build on a build node with a larger instruction set, I run into this issue for FFTW because the optarch asks for a smaller instruction set than what is autodetected based on the capabilities of the CPU.

I see two solutions:

1. Ignore optarch and simply build FFTW for the full instruction set supported by the CPU in the build node. I don't know enough about FFTW to know whether this code will then run on nodes with a smaller instruction set, but Flamefire's quote from the FFTW web page

"Enable various SIMD instruction sets. You need compiler that supports the given SIMD extensions, but FFTW will try to detect at runtime whether the CPU supports these extensions. That is, you can compile with --enable-avx and the code will still run on a CPU without AVX support."

suggests it does. The advantage is that in a heterogeneous system, if you build on a node with the largest instruction set, you get support for all of it. Not sure how bartoldeman's remark

"as it affects non-hand-optimized parts of FFTW."

fits in though...

2. Take the largest common denominator of optarch and the autodetected CPU instruction set. I.e. if the CPU instruction set supports AVX and AVX2, but optarch asks for an architecture that only supports AVX, only build AVX. I.e. make sure that the EasyBlock doesn't set -march=core-avx2. The disadvantage is that in a heterogeneous system, you don't get AVX2 even on those nodes that are capable of it.

The current behaviour, where the march set by optarch (e.g. march=sandybridge) may conflict with the march=core-avx2 set by the easyblock is in any case worse than both of the above solutions, because it simply won't build this way :)
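
The second option (intersecting what optarch allows with what the build CPU offers) could be sketched roughly as follows. This is only an illustration: the MARCH_SIMD table is a small hand-written, incomplete mapping, and simd_for_build is a hypothetical helper, not something that exists in the easyblock.

```python
# Illustrative, incomplete mapping from -march values to the SIMD
# instruction sets FFTW cares about. Hand-written for this sketch.
MARCH_SIMD = {
    "westmere": {"sse2"},
    "sandybridge": {"sse2", "avx"},
    "haswell": {"sse2", "avx", "avx2"},
}


def simd_for_build(optarch_march, cpu_simd):
    """Return the SIMD sets allowed by both optarch and the build CPU."""
    return MARCH_SIMD[optarch_march] & set(cpu_simd)


# Build node is a Haswell, optarch asks for sandybridge:
# AVX2 is dropped, so the easyblock would not add -march=core-avx2.
print(sorted(simd_for_build("sandybridge", {"sse2", "avx", "avx2"})))
```

This keeps the build consistent with optarch, at the cost described above: nodes that do support AVX2 would not get AVX2 code.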

boegel commented 4 years ago

@Flamefire Is this issue still relevant? I've lost track of this a bit...

Flamefire commented 4 years ago

I don't see any changes to the FFTW EB, so I expect this to still be an issue.

Edit: Just executed eb FFTW-3.3.7-gompi-2018a.eb -r --optarch='march=westmere -mtune=haswell' --rebuild and got the same error.

easybuilders / easybuild-easyblocks

FFTW with optarch=march=westmere not working #1418