easybuilders / easybuild

EasyBuild - building software with ease
http://easybuild.io
GNU General Public License v2.0

BLIS-0.8.1-GCC-11.2.0.eb: ‘_mm256_fmadd_ps’: target specific option mismatch #814

Closed: bowentan closed this issue 2 years ago

bowentan commented 2 years ago

System: Ubuntu 20.04

I am currently building BLIS-0.8.1-GCC-11.2.0.eb, but the build step fails with the following error:

/data1/apps/software/GCCcore/11.2.0/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/fmaintrin.h:63:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_ps’: target specific option mismatch

I tried to configure and build manually in the build directory, and that succeeded. I ran configure and make with exactly the same options as in the easyconfig, after loading the required GCC/11.2.0, Python/3.9.6 and Perl/5.34.0 modules.

What could be going wrong here?

ocaisa commented 2 years ago

This looks like a bug in BLIS, see https://bugs.gentoo.org/765805. What is your CPU?

I think you should reopen https://github.com/flame/blis/issues/646 and also give them your CPU info.

ocaisa commented 2 years ago

By default, EasyBuild compiles with -march=native.

bowentan commented 2 years ago

But I can configure and build manually in the build directory without any errors, using the same configure and make options, after loading GCC/11.2.0, Python/3.9.6 and Perl/5.34.0...

ocaisa commented 2 years ago

The configure options in your manual build were the same, but the environment was not. EasyBuild sets a lot of flags (such as CFLAGS) which influence the compilation. If you want to see exactly what EasyBuild does, add the --trace option to your eb command.

ocaisa commented 2 years ago

You can also use --dump-env-script to dump a script that recreates that environment.
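Comparing such a dump against a manual build mostly comes down to a couple of variables. The sketch below assumes the dumped script sets each variable on its own `export NAME='value'` line (an assumption about the file format, so check your own dump) and reports the effective -march in CFLAGS; when -march appears more than once, GCC uses the last one.

```python
from __future__ import annotations

import shlex


def parse_env_script(text: str) -> dict[str, str]:
    """Collect NAME -> value from `export NAME=value` lines.

    Assumes one export per line; comments, shebangs and other commands
    (e.g. `module load ...`) are skipped.
    """
    env: dict[str, str] = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # undoes the shell quoting
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, value = tokens[1].split("=", 1)
            env[name] = value
    return env


def effective_march(cflags: str) -> str | None:
    """Return the -march value GCC would use (the last one wins), or None."""
    march = None
    for flag in shlex.split(cflags):
        if flag.startswith("-march="):
            march = flag.split("=", 1)[1]
    return march


if __name__ == "__main__":
    # Hypothetical dump, reusing the CFLAGS reported in this thread.
    dump = (
        "#!/bin/bash\n"
        "module load GCC/11.2.0\n"
        "export CFLAGS='-O2 -ftree-vectorize -march=x86-64 -mtune=generic -fno-math-errno'\n"
    )
    env = parse_env_script(dump)
    print(effective_march(env["CFLAGS"]))  # x86-64
```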

bowentan commented 2 years ago

I tested with the environment from --dump-env-script, and the build failed with the -march=x86-64 option. It succeeded after I changed x86-64 to native or haswell.
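That pattern is consistent with the error message. `_mm256_fmadd_ps` is an FMA intrinsic, and GCC reports "target specific option mismatch" when such an `always_inline` intrinsic is called from code compiled without FMA enabled. `-march=x86-64` is GCC's generic 64-bit baseline and does not enable FMA, `-march=haswell` does, and `-march=native` does only when the build host supports FMA. The lookup below encodes this for a small, hand-picked set of GCC `-march` values; it is an illustrative partial table, and how BLIS combines these flags with its own per-kernel flags is not modelled.

```python
from __future__ import annotations

# Partial, illustrative table of GCC -march values that include FMA (FMA3).
MARCH_WITH_FMA = {
    "haswell", "broadwell", "skylake", "skylake-avx512", "cascadelake",
    "icelake-client", "icelake-server", "znver1", "znver2", "znver3",
    "x86-64-v3", "x86-64-v4",
}

# Values that do not enable FMA on their own.
MARCH_WITHOUT_FMA = {
    "x86-64", "x86-64-v2", "nehalem", "westmere", "sandybridge", "ivybridge",
}


def enables_fma(march: str) -> bool | None:
    """True/False for values in the tables, None when it cannot be decided here.

    'native' returns None: it depends on the CPU of the build host.
    """
    if march in MARCH_WITH_FMA:
        return True
    if march in MARCH_WITHOUT_FMA:
        return False
    return None


if __name__ == "__main__":
    for march in ("x86-64", "haswell", "native"):
        print(march, enables_fma(march))
    # x86-64 False
    # haswell True
    # native None
```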

However, it still failed if I added buildopts = 'CFLAGS="-march=native"' to the easyconfig file, or even the whole CFLAGS value from --dump-env-script with that change, i.e., buildopts = 'CFLAGS="-O2 -ftree-vectorize -march=native -mtune=generic -fno-math-errno"'.

It only succeeded once I added the following to the easyconfig file:

preconfigopts = 'export CFLAGS="-O2 -ftree-vectorize -march=haswell -mtune=generic -fno-math-errno" && '

Is it because buildopts cannot override the exported environment?

ocaisa commented 2 years ago

CFLAGS is an environment variable that is used at configure time; it has no effect at build time. That's why your final version worked.
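One way to picture this: a configure/make style build prepends preconfigopts to the configure command and appends buildopts to the make command. An `export CFLAGS=... && ` in preconfigopts is therefore in effect while configure runs, whereas CFLAGS in buildopts only becomes an argument to make. The function below is a hypothetical, simplified model of that assembly (its name, the default prefix, and the exact command layout are assumptions, not EasyBuild's code).

```python
from __future__ import annotations


def assemble_commands(preconfigopts: str = "", configopts: str = "",
                      prebuildopts: str = "", buildopts: str = "",
                      prefix: str = "/tmp/install") -> tuple[str, str]:
    """Hypothetical model: each step is '<pre-opts><command> <opts>'."""
    configure = f"{preconfigopts}./configure --prefix={prefix} {configopts}".strip()
    build = f"{prebuildopts}make {buildopts}".strip()
    return configure, build


if __name__ == "__main__":
    configure, build = assemble_commands(
        preconfigopts='export CFLAGS="-O2 -march=haswell" && ',
        buildopts='CFLAGS="-march=native"',
    )
    print(configure)  # export CFLAGS="-O2 -march=haswell" && ./configure --prefix=/tmp/install
    print(build)      # make CFLAGS="-march=native"
```

In this model, a CFLAGS value given only in buildopts never reaches the configure command.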

I don't understand where the -march=x86-64 came from; it is not what EasyBuild uses by default. In a test build I just did, my value for CFLAGS is:

Environment variable CFLAGS set to -O2 -ftree-vectorize -march=native -fno-math-errno

Can you show me the command line you pass to eb?

ocaisa commented 2 years ago

Did you tweak the easyconfig?

ocaisa commented 2 years ago

Did you use the --optarch option?

bowentan commented 2 years ago

Yes, I changed the BLIS-0.8.1-GCC-11.2.0.eb file and added the preconfigopts.

Could the -march=x86-64 have come from my eb option --optarch=GENERIC?

bowentan commented 2 years ago

Is there a big problem with the optarch option?

ocaisa commented 2 years ago

Do you actually want to support multiple node types with the same build?

bowentan commented 2 years ago

Yes, I want to support 4 nodes with one build.

ocaisa commented 2 years ago

Are they of different types though? What's their CPU? What instruction set do they support?

ocaisa commented 2 years ago

--optarch is tricky as some software ignores it anyway (e.g., see https://github.com/easybuilders/easybuild-easyconfigs/issues/9754). If possible, I would suggest you build on the oldest arch that you have available using default settings. That build will work on newer nodes as well without needing to fiddle with it.

ocaisa commented 2 years ago

Native builds have far fewer surprises.

bowentan commented 2 years ago

I have attached the cpuinfo of the four nodes: Archive.zip

I tried to build on the oldest arch, but I ran into more problems...

ocaisa commented 2 years ago

Make sure to remove the tweak to your easyconfig and the --optarch option when building on the Haswell node.

bowentan commented 2 years ago

If I understand your suggestion correctly, is it better to pass a CPU flag that all nodes share, such as avx, to --optarch?

ocaisa commented 2 years ago

No. My suggestion is that you build all your software on the oldest node type you want to use (which seems to be Haswell) and don't use --optarch at all. The Haswell builds will work on all the other node types.
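The reasoning is that a -march=native build on the oldest node only uses instructions that node has, so it runs on any node that also has them (assuming the newer nodes support a superset). With the attached cpuinfo files, one way to check this is to map each node's flags to an x86-64 microarchitecture level (x86-64-v2/v3/v4 from the x86-64 psABI; Haswell is v3, Ice Lake is v4) and compute the level of the flags all nodes share. The flag names are Linux /proc/cpuinfo spellings (abm stands in for LZCNT, pni for SSE3), and the level tables are a simplified approximation of the psABI definitions.

```python
from __future__ import annotations

# Simplified x86-64 psABI levels as /proc/cpuinfo flag names.
# A level also requires every lower level.
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def flags_from_cpuinfo(text: str) -> set[str]:
    """Return the flags on the first 'flags' line of /proc/cpuinfo output."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def level(flags: set[str]) -> str:
    """Highest level whose requirements, and those of all lower levels, are met."""
    best = "x86-64"  # baseline assumed for any 64-bit x86 CPU
    for name, required in LEVELS:
        if not required <= flags:
            break
        best = name
    return best


def common_level(nodes: dict[str, set[str]]) -> str:
    """Level usable on every node: the level of the flags all nodes share."""
    shared = set.intersection(*nodes.values())
    return level(shared)


if __name__ == "__main__":
    # Made-up flag sets standing in for the attached cpuinfo files.
    v3 = set().union(*(required for _, required in LEVELS[:2]))
    icelake_like = v3 | LEVELS[2][1]
    nodes = {"haswell-node": v3, "icelake-node": icelake_like}
    print(level(icelake_like))  # x86-64-v4
    print(common_level(nodes))  # x86-64-v3
```

This only answers whether one build can run everywhere; it says nothing about performance, since a v3 build leaves AVX-512 on the newer nodes unused.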

ocaisa commented 2 years ago

EasyBuild will just use -march=native by default, and you will get a Haswell build that will run on all 4 node types.

bowentan commented 2 years ago

Ok, I see... I will remove that. Thanks for your suggestions!

ocaisa commented 2 years ago

This could have a big performance cost for you, though: your newest processor is Ice Lake, so you would be wasting a lot of its capabilities. Sometimes it is better to just repeat the builds on each node type; it depends on whether you care more about performance or convenience.

ocaisa commented 2 years ago

One approach, for example, is to set --prefix based on the CPU type.

For users, you use the same approach to set MODULEPATH, configuring it based on the CPU arch. Users then see the same modules regardless of node type, but they get the best-performing software.
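A minimal sketch of that layout, assuming one EasyBuild installation prefix per CPU type under a shared root. The /data1/apps root comes from the build path in the error message above; the per-architecture subdirectory names (here x86-64 level names) and the helper functions are hypothetical. EasyBuild's default layout puts module files under <prefix>/modules/all, so the same prefix gives both the --prefix for eb on a node of that type and the MODULEPATH entry for users on it.

```python
from __future__ import annotations

from pathlib import PurePosixPath

ROOT = PurePosixPath("/data1/apps")  # assumed shared software root


def install_prefix(cpu_type: str) -> PurePosixPath:
    """Hypothetical per-architecture prefix, e.g. /data1/apps/x86-64-v3."""
    return ROOT / cpu_type


def eb_command(easyconfig: str, cpu_type: str) -> list[str]:
    """eb invocation to run on a node of this type; no --optarch, so -march=native."""
    return ["eb", easyconfig, f"--prefix={install_prefix(cpu_type)}"]


def modulepath(cpu_type: str) -> str:
    """MODULEPATH entry for users on this node type (<prefix>/modules/all)."""
    return str(install_prefix(cpu_type) / "modules" / "all")


if __name__ == "__main__":
    print(eb_command("BLIS-0.8.1-GCC-11.2.0.eb", "x86-64-v3"))
    # ['eb', 'BLIS-0.8.1-GCC-11.2.0.eb', '--prefix=/data1/apps/x86-64-v3']
    print(modulepath("x86-64-v4"))
    # /data1/apps/x86-64-v4/modules/all
```

Detecting the node type at login, so the right MODULEPATH is chosen automatically, is left out here.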

ocaisa commented 2 years ago

You should consider joining EasyBuild Slack if you want some advice on these kinds of issues - https://easybuild.io/join-slack

bowentan commented 2 years ago

That's helpful! I will do some research about this issue. Thank you very much!

boegel commented 2 years ago

Do we need to keep this issue open (is there something we can change/fix in EasyBuild to mitigate this)?

bowentan commented 2 years ago

Oh, sorry. This issue can be closed, since it was my misunderstanding of optarch.