Open Flamefire opened 6 years ago
@Flamefire How was FFTW configured here? In particular, is the --enable-avx2
configure option included by the FFTW easyblock?
It auto-detects which CPU features are available, and thus will automatically use --enable-avx2 unless you include use_avx2 = False in the FFTW easyconfig file. See also the other related custom easyconfig parameters listed in the output of eb -a -e EB_FFTW.
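For illustration, the opt-out described above would sit in the easyconfig roughly like this. This is only a fragment, not a complete easyconfig; the name, version and toolchain values are taken from the FFTW-3.3.7-gompi-2018a.eb file mentioned elsewhere in this thread, and everything else a real easyconfig needs is left out.

```python
# Fragment of an FFTW easyconfig (easyconfigs use Python syntax).
# Only the lines relevant to this discussion are shown.
name = 'FFTW'
version = '3.3.7'

toolchain = {'name': 'gompi', 'version': '2018a'}

# Custom parameter of the FFTW easyblock: opt out of --enable-avx2
# even if the build host's CPU supports AVX2.
use_avx2 = False
```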
As it stands, the FFTW easyblock pretty much ignores the --optarch setting, unless --optarch=GENERIC is used, so you're responsible for making sure that FFTW is configured correctly.
Do let us know whether this is helpful.
I'm using EB 3.6, so that easyblock does add --enable-avx2. The command is something like eb --software=FFTW,3.3.7 --toolchain=gompi,2018a -r --optarch='march=westmere -mtune=haswell' (default easyblock and easyconfig).
This detection based on the current CPU is IMO wrong. First, because you could actually enable everything the compiler supports. See http://www.fftw.org/fftw3_doc/Installation-on-Unix.html#Installation-on-Unix: "Enable various SIMD instruction sets. You need compiler that supports the given SIMD extensions, but FFTW will try to detect at runtime whether the CPU supports these extensions. That is, you can compile with --enable-avx and the code will still run on a CPU without AVX support." So the detection should be based on the compiler instead.
Second, as in our case, the CPU used to compile FFTW and the CPU used to run it may not be the same. We have westmere and haswell CPUs mixed in the cluster, so FFTW can be compiled and run on either. Not using AVX2 would lose a lot of speed on the haswells while only saving a little space in the compiled library.
I get the feeling that optarch should not be passed to FFTW at all, as it already compiles different code for different CPU architectures and uses the appropriate code at runtime. So everything you could set in optarch is most likely wrong for FFTW.
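The compiler-based detection argued for above could be sketched like this. It is a minimal illustration, not what the FFTW easyblock does: the helper names are hypothetical, the probe assumes a GCC-compatible compiler driver, and the flag table only pairs GCC machine flags with the corresponding FFTW configure options.

```python
import subprocess

# GCC-style machine flag -> FFTW configure option it would justify.
SIMD_OPTIONS = [
    ("-msse2", "--enable-sse2"),
    ("-mavx", "--enable-avx"),
    ("-mavx2", "--enable-avx2"),
    ("-mavx512f", "--enable-avx512"),
]


def compiler_accepts(flag, cc="gcc"):
    """Hypothetical probe: preprocess an empty C file read from stdin with `flag`."""
    try:
        result = subprocess.run(
            [cc, flag, "-x", "c", "-E", "-"],
            input="", capture_output=True, text=True,
        )
    except OSError:
        # Compiler not found or not executable: treat the flag as unsupported.
        return False
    return result.returncode == 0


def simd_configure_options(accepts=compiler_accepts):
    """Return the FFTW configure options whose compiler flag is accepted."""
    return [option for flag, option in SIMD_OPTIONS if accepts(flag)]


# Example with a stand-in for a compiler that lacks AVX-512 support:
print(simd_configure_options(lambda flag: flag != "-mavx512f"))
```

Passing the probe in as a parameter keeps the selection logic independent of whichever compiler happens to be installed on the build host, which is the point of the proposal: the result depends on the toolchain, not on the CPU doing the build.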
So I propose the following changes:
use_xxx: an opt-out (require the user to set it to False)
Workaround: You can build FFTW separately from the toolchain and set optarch to empty. This way you currently get the auto-detect behavior, so you also have to make sure you run the build on your "best" CPUs. Also don't use "GENERIC", as this will cripple the speed of FFTW. Afterwards, build the rest of the toolchain or your program.
Problem with the workaround: You cannot simply build a module and have its dependencies built automatically if they include FFTW.
@Flamefire Thanks a lot for the suggestions. I think you're right, we're probably being too careful here w.r.t. what we enable, since recent FFTW versions have runtime detection built into the library...
Also, blatantly ignoring --optarch makes sense for FFTW specifically as far as I can tell.
Thoughts on this @geimer & @bartoldeman?
I think using optarch for FFTW is ok, as it affects non-hand-optimized parts of FFTW. But I agree that an opt-out for the CPU instructions is appropriate, e.g. for AVX512.
I suppose compiler auto-detect could be optional, as the compiler is known in the easyconfig, unless you use --try-toolchain of course.
Then what about the use case I described: Use generic (cluster-wide) optarch for compiling a whole bunch of software which includes (due to the toolchain used) FFTW.
Passing optarch to FFTW breaks building and one has to build FFTW (and only FFTW) with a different value of optarch.
I've encountered the same problem both on our super and cluster. Both systems are heterogeneous. For foss, we compile for the smallest supported instruction set (set in our optarch).
If I build on a build node with a larger instruction set, I run into this issue for FFTW because the optarch asks for a smaller instruction set than what is autodetected based on the capabilities of the CPU.
I see two solutions:
"Enable various SIMD instruction sets. You need compiler that supports the given SIMD extensions, but FFTW will try to detect at runtime whether the CPU supports these extensions. That is, you can compile with --enable-avx and the code will still run on a CPU without AVX support." suggest it does. Advantage is that in a heterogeneous system, if you build on a node with the largest instruction set, you get support for all of it. Not sure how the "as it affects non-hand-optimized parts of FFTW." of bartoldeman fits in though...
-march=core-avx2. The disadvantage is that in a heterogeneous system, you don't get AVX2 even on those nodes that are capable of it.
The current behaviour, where the march set by optarch (e.g. march=sandybridge) may conflict with the march=core-avx2 set by the easyblock, is in any case worse than both of the above solutions, because it simply won't build this way :)
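One way to read the second option (follow the march from optarch and only enable the SIMD sets it permits) can be sketched as follows. The function names and the small march table are assumptions made for this illustration; the table is deliberately abbreviated and is not taken from EasyBuild.

```python
# Assumed, abbreviated table: SIMD sets implied by a few GCC -march values.
MARCH_SIMD = {
    "westmere": {"sse2"},
    "sandybridge": {"sse2", "avx"},
    "haswell": {"sse2", "avx", "avx2"},
    "core-avx2": {"sse2", "avx", "avx2"},
}


def march_from_optarch(optarch):
    """Return the last march value found in an optarch string, or None."""
    march = None
    for token in optarch.split():
        token = token.lstrip("-")  # accept both 'march=...' and '-march=...'
        if token.startswith("march="):
            march = token[len("march="):]
    return march


def allowed_simd_options(optarch, detected):
    """Keep only the detected SIMD sets that the march in optarch also permits."""
    permitted = MARCH_SIMD.get(march_from_optarch(optarch))
    if permitted is None:
        # No march given, or one missing from the table: nothing to restrict.
        permitted = set(detected)
    return ["--enable-" + simd for simd in ("sse2", "avx", "avx2")
            if simd in detected and simd in permitted]


# Build host is a haswell, optarch targets westmere:
print(allowed_simd_options("march=westmere -mtune=haswell", {"sse2", "avx", "avx2"}))
# prints: ['--enable-sse2']
```

The example shows the disadvantage named above: with a westmere optarch, AVX2 is dropped even though haswell nodes in the same cluster could use it.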
@Flamefire Is this issue still relevant? I've lost track a bit on this...
I don't see any changes to the FFTW EB, so I expect this to still be an issue.
Edit: Just executed eb FFTW-3.3.7-gompi-2018a.eb -r --optarch='march=westmere -mtune=haswell' --rebuild and got the same error.
We have a cluster with westmere and haswell CPUs, so we need software that runs on westmere as the minimum. Hence we add --optarch=march=westmere -mtune=haswell to our EasyBuild options.
This works for pretty much all modules but fails when building FFTW as part of the toolchain:
../../../simd-support/simd-avx2.h:43:2: error: #error "compiling simd-avx2.h without avx2 support"
The reason for this is that optarch gets added too late, resulting in this:
gcc -DHAVE_CONFIG_H -I. -I../../.. -I ../../.. -march=core-avx2 -mfma -O3 -march=westmere -mtune=haswell -fno-math-errno -fPIC -MT n1fv_25.lo -MD -MP -MF .deps/n1fv_25.Tpo -c n1fv_25.c -fPIC -DPIC -o .libs/n1fv_25.o
So the latter march from optarch overrides the intended march=core-avx2.
This was already mentioned before on the mailing list: https://www.mail-archive.com/easybuild@lists.ugent.be/msg03020.html But it seems the cause was not found and hence the issue not solved.
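The override can be checked mechanically: when GCC sees several -march options, the last one on the command line is the one that applies. The helper below (a hypothetical name, written only for this illustration) extracts that effective value from a compile command like the one above, shortened here.

```python
import shlex


def effective_march(command):
    """Return the value of the last -march= option in a compile command, or None."""
    march = None
    for token in shlex.split(command):
        if token.startswith("-march="):
            march = token[len("-march="):]  # later occurrences replace earlier ones
    return march


cmd = ("gcc -DHAVE_CONFIG_H -I. -I../../.. -march=core-avx2 -mfma -O3 "
       "-march=westmere -mtune=haswell -fno-math-errno -fPIC -c n1fv_25.c")
print(effective_march(cmd))  # prints: westmere
```

Since westmere has no AVX2, the check in simd-avx2.h fails exactly as shown in the error message.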
Is there anything you can do to fix this?