bowentan closed this issue 2 years ago
This looks like it is a bug in BLIS, see https://bugs.gentoo.org/765805. What is your CPU?
I think you should reopen https://github.com/flame/blis/issues/646 but also give them your CPU info.
By default, EasyBuild compiles with -march=native.
But I can configure and build successfully without any errors in the build directory manually, given the same configure and make options, by loading GCC/11.2.0, Python/3.9.6 and Perl/5.34.0...
The configure options in your manual build were the same but the environment was not. EasyBuild sets a lot of flags (such as CFLAGS) which influence the compilation. If you want to see exactly what EasyBuild does, add the --trace option to your eb command.
You can also dump a script that gives you that environment with --dump-env-script.
I tested with the environment given by --dump-env-script and the build crashed with the -march=x86-64 option. But it succeeded after changing x86-64 to native or haswell.
However, it still failed when I added buildopts = 'CFLAGS="-march=native"' to the easyconfig file, or even the whole CFLAGS value from --dump-env-script with that change, i.e., buildopts = 'CFLAGS="-O2 -ftree-vectorize -march=native -mtune=generic -fno-math-errno"'.
It succeeded only after adding the following to the easyconfig file:
preconfigopts = 'export CFLAGS="-O2 -ftree-vectorize -march=haswell -mtune=generic -fno-math-errno" && '
Is it because buildopts cannot override an exported environment variable?
CFLAGS is an environment variable that is used at configure time; it has no effect at build time. That's why your final version worked.
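The difference between the two easyconfig parameters can be sketched as follows. This is a simplified, hypothetical model of how a configure/make style build assembles its shell commands, not EasyBuild's actual code; the function names and the `auto` configure argument are placeholders chosen for illustration.

```python
def configure_command(preconfigopts, configopts):
    """Shell command for the configure step (simplified model)."""
    return f"{preconfigopts}./configure {configopts}".strip()


def build_command(prebuildopts, buildopts):
    """Shell command for the build step (simplified model)."""
    return f"{prebuildopts}make {buildopts}".strip()


cflags = "-O2 -ftree-vectorize -march=haswell -mtune=generic -fno-math-errno"

# preconfigopts: the export runs in the same shell, before ./configure starts,
# so configure sees the new CFLAGS value.
print(configure_command(f'export CFLAGS="{cflags}" && ', "auto"))

# buildopts: the value is only appended to the make command line,
# after configure has already run with the original environment.
print(build_command("", f'CFLAGS="{cflags}"'))
```

In this model the `preconfigopts` variant changes what configure sees, while the `buildopts` variant does not reach configure at all.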
I don't understand where the -march=x86-64 came from; this is not a valid option. My value for CFLAGS in a test build I just did is:
Environment variable CFLAGS set to -O2 -ftree-vectorize -march=native -fno-math-errno
Can you show me the command line you pass to eb?
Did you tweak the easyconfig?
Did you try to use --optarch?
Yes, I changed the BLIS-0.8.1-GCC-11.2.0.eb file and added the preconfigopts.
The -march=x86-64 might have come from my eb option --optarch=GENERIC?
Is the optarch option a big problem?
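The two CFLAGS values quoted in this thread differ only in the architecture part. A minimal sketch of that relationship follows; the mapping is inferred from the flags shown above for GCC on x86-64, and the names `ARCH_FLAGS` and `cflags_for` are hypothetical, not part of EasyBuild.

```python
# Assumed mapping for GCC on x86-64, inferred from the flags quoted in this thread.
ARCH_FLAGS = {
    None: "-march=native",                      # default: optimise for the build host
    "GENERIC": "-march=x86-64 -mtune=generic",  # with --optarch=GENERIC
}


def cflags_for(optarch=None):
    """Return the CFLAGS string expected for a given --optarch value."""
    return f"-O2 -ftree-vectorize {ARCH_FLAGS[optarch]} -fno-math-errno"


print(cflags_for())           # default build
print(cflags_for("GENERIC"))  # build with --optarch=GENERIC
```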
Do you actually want to support multiple node types with the same build?
Yes, I want to support 4 nodes with one build.
Are they of different types though? What's their CPU? What instruction set do they support?
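One way to answer these questions is to compare the `flags` line of /proc/cpuinfo from each node. The sketch below assumes x86 Linux nodes, where that line exists; the helper names and the sample excerpts are made up for illustration and do not come from the attached files.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(cpuinfo_texts):
    """Return the flags supported by every node in the list."""
    flag_sets = [cpu_flags(text) for text in cpuinfo_texts]
    return set.intersection(*flag_sets) if flag_sets else set()


# Hypothetical excerpts standing in for two different node types.
older_node = "model name\t: example A\nflags\t\t: fpu sse2 avx avx2\n"
newer_node = "model name\t: example B\nflags\t\t: fpu sse2 avx avx2 avx512f\n"

print(sorted(common_flags([older_node, newer_node])))
```

A flag that appears on the newer node only, such as `avx512f` in this made-up sample, is one that code built natively on the newer node may use and the older node cannot run.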
--optarch is tricky as some software ignores it anyway (e.g., see https://github.com/easybuilders/easybuild-easyconfigs/issues/9754). If possible, I would suggest you build on the oldest arch that you have available using default settings. That build will work on newer nodes as well without needing to fiddle with it.
Native builds have a lot fewer surprises.
I have attached the cpuinfo output of the four nodes: Archive.zip
I tried to build on the oldest arch, but more problems occurred.
Make sure to remove the tweak to your easyconfig and the --optarch option when building on the Haswell node.
If I understand your suggestion correctly, is it better to build with a specific CPU flag that is shared by all nodes, such as avx, passed to --optarch?
No. My suggestion is that you build all your software on the oldest node type you want to use (which seems to be Haswell) and don't use --optarch at all. The Haswell builds will work on all the other node types.
EasyBuild will just use -march=native by default and you will get a Haswell build that will run on all 4 node types.
Ok, I see... I will remove that. Thanks for your suggestions!
This could have a big performance cost for you though: your newest processor is Ice Lake, so you would be wasting a lot of its capabilities. Sometimes it is better to just repeat the builds on all the node types; it depends on whether you care more about performance or convenience.
One approach, for example, is to set --prefix based on the CPU type.
For the users, you use the same approach to set MODULEPATH: it is configured based on the CPU arch. Users then see the same modules regardless of node type, but they get the best-performing software.
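A minimal sketch of that layout, assuming one installation tree per CPU type under a shared base directory. The base path `/apps/easybuild`, the arch names, and the helper `arch_environment` are hypothetical; `EASYBUILD_PREFIX` is the environment-variable form of --prefix, and the `modules/all` subdirectory follows EasyBuild's default layout.

```python
import posixpath


def arch_environment(base_dir, cpu_arch):
    """Return per-architecture settings for building and for loading modules."""
    prefix = posixpath.join(base_dir, cpu_arch)
    return {
        # Used when building on a node of this type (same effect as --prefix).
        "EASYBUILD_PREFIX": prefix,
        # Used in the users' environment on nodes of this type.
        "MODULEPATH": posixpath.join(prefix, "modules", "all"),
    }


for arch in ("haswell", "icelake"):
    env = arch_environment("/apps/easybuild", arch)
    print(arch, env["EASYBUILD_PREFIX"], env["MODULEPATH"])
```

How the CPU type is detected on each node is left out here; any method works as long as the build step and the users' login environment agree on the same name.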
You should consider joining EasyBuild Slack if you want some advice on these kinds of issues - https://easybuild.io/join-slack
That's helpful! I will do some research about this issue. Thank you very much!
Do we need to keep this issue open (is there something we can change/fix in EasyBuild to mitigate this)?
Oh sorry. This issue can be closed since it was my misunderstanding of optarch.
System Ubuntu: 20.04
I am currently building BLIS-0.8.1-GCC-11.2.0.eb but the build step couldn't pass with the following error:
I tried to configure and build manually in the build directory, and that succeeded. I ran configure and make with exactly the same options as in the easyconfig, after loading the required GCC/11.2.0, Python/3.9.6 and Perl/5.34.0 modules.
What's wrong with such a situation?