easybuilders / easybuild

EasyBuild - building software with ease
http://easybuild.io
GNU General Public License v2.0

zlib-1.2.11 Compiler error reporting is too harsh for configure #761

Closed kelbstf closed 2 years ago

kelbstf commented 2 years ago

Hello,

I almost completed cross-compiling

eb --debug foss-2020a.eb --optarch=x86-rome --robot

but the dependency zlib fails like this:

== processing EasyBuild easyconfig /opt/ohpc/admin/easybuild/eb20211109/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-9.3.0.eb\
== building and installing Compiler/GCCcore/9.3.0/lib/zlib/1.2.11...

== FAILED: Installation ended unsuccessfully (build directory: \
/opt/ohpc/admin/easybuild/eb20211109/easybuild/build/zlib/1.2.11/GCCcore-9.3.0): build failed (first 300 chars): \
cmd " ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:\
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.
 (took 0 secs)\
== Results of the build can be found in the log file(s) /tmp/eb-26mi0y5n/easybuild-zlib-1.2.11-20211125.212137.domWo.log\
ERROR: Build of /opt/ohpc/admin/easybuild/eb20211109/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-9.3.0.eb failed (err: 'build failed (first 300 chars): cmd " ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:\nCompiler error reporting is too harsh for ./configure (perhaps remove -Werror).\n** ./configure aborting.\n')

From "/tmp/eb-26mi0y5n/easybuild-zlib-1.2.11-20211125.212137.domWo.log":

== 2021-11-25 21:21:38,392 easyblock.py:3915 \
WARNING build failed (first 300 chars): \
cmd " ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0 "\
exited with exit code 1 and output:\
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.

Is it possible that this is due to the cross-compiling situation? I am using the Haswell-based OS-level "9.3.0 (GCC)" for this compilation...

I would really appreciate any hint on how to tackle such an error with components of a toolchain.

boegel commented 2 years ago

Hmm, never seen this error before...

Can you provide some more information about the system on which you're seeing this? Can you share the output of eb --show-system-info?

ocaisa commented 2 years ago

https://stackoverflow.com/questions/27867271/zlib-harsh-compiler-warnings-and-configure-test says this can be because you don't have a working compiler

kelbstf commented 2 years ago

Hello @boegel & @ocaisa,

thank you for your help. I also found this to be a surprisingly rare error message. Seems like I hit a genuine edge case. Should start collecting those ;-)

The situation:

System specs:

bash-4.4$ eb --show-system-info
System information (master):

* OS:
  -> name: centos linux
  -> type: Linux
  -> version: 8.4.2105
  -> platform name: x86_64-unknown-linux

* CPU:
  -> vendor: Intel
  -> architecture: x86_64
  -> family: Intel
  -> arch name: UNKNOWN (archspec is not installed?)
  -> model: Intel Core Processor (Haswell, no TSX)
  -> speed: 2599.996
  -> cores: 8
  -> features: abm,aes,apic,arat,avx,avx2,bmi1,bmi2,clflush,cmov,constant_tsc,cpuid,cpuid_fault,cx16,cx8,de,erms,f16c,fma,fpu,fsgsbase,fxsr,hypervisor,ibpb,ibrs,invpcid,invpcid_single,lahf_lm,lm,mca,mce,md_clear,mmx,movbe,msr,mtrr,nopl,nx,pae,pat,pcid,pclmulqdq,pge,pni,popcnt,pse,pse36,pti,rdrand,rdtscp,rep_good,sep,smep,ssbd,sse,sse2,sse4_1,sse4_2,ssse3,syscall,tsc,tsc_deadline_timer,tsc_known_freq,vme,x2apic,xsave,xsaveopt,xtopology

* software:
  -> glibc version: 2.28
  -> Python binary: /opt/ohpc/admin/easybuild/bin/python3
  -> Python version: 3.6.8

GCC:

bash-4.4$ which gcc
/opt/ohpc/pub/compiler/gcc/9.3.0/bin/gcc

-bash-4.4$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/ohpc/pub/compiler/gcc/9.3.0/libexec/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib --enable-languages=c,c++,fortran --prefix=/opt/ohpc/pub/compiler/gcc/9.3.0 --disable-static --enable-shared
Thread model: posix
gcc version 9.3.0 (GCC)

@ocaisa: I also considered this option, but concluded that it does not match this situation: all the other components of the toolchain seem to have built fine, and only zlib is hit. Perhaps zlib applies a different configuration?

My next stop would be "eb --debug foss-2020a.eb --optarch=haswell --robot", which I need anyway...

ocaisa commented 2 years ago

Where did you find x86-rome as a value for --optarch? I think that's only for Cray compilers... are you on a Cray system? I think that should be znver2 for GCC (but I had to do quite a bit of googling to find that).

Either way you are going to run into a LOT of issues if you try to cross-compile everything, since any sanity check commands will likely not execute correctly. You could compile some things generically on the login (for example GCCcore and anything at that level) so that they will work on both the login and compute, but once you get to performance critical software you will save yourself a lot of pain if you build on the compute nodes with default settings.

ocaisa commented 2 years ago

To do a generic compilation for anything at GCCcore you can use

--optarch="GCCcore:march=haswell -mtune=haswell"

in your EB settings. With this setting you can build everything on the compute nodes, and anything at GCCcore will still run on the login nodes.
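
As a sketch of how this could go into the EB settings: EasyBuild options can generally also be supplied as EASYBUILD_* environment variables, so the value above could be exported once instead of being repeated on every command line. This assumes a POSIX-like shell; the `eb --show-config` call is only there to confirm that the value was picked up, and it is skipped when `eb` is not on the PATH.

```shell
# Environment-variable form of --optarch (applies only to the GCCcore toolchain).
export EASYBUILD_OPTARCH='GCCcore:march=haswell -mtune=haswell'

# Confirm that EasyBuild sees the setting; skipped if eb is not available.
if command -v eb >/dev/null 2>&1; then
    eb --show-config | grep -i optarch || true
fi
```

Putting the value in the environment (or in an EasyBuild configuration file) keeps it consistent across all builds of one software tree, which matters because mixing differently optimised GCCcore-level packages is easy to do by accident.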

kelbstf commented 2 years ago

Oh, that's interesting. I was also unsure about the proper parameter for specifying target architectures. I ended up assuming that "znver2" is what GCC itself uses, while EasyBuild might use "x86-rome", based on what I found at "https://easybuilders.github.io/easybuild-tutorial/2021-lust/cray/custom_toolchains/", which, as you said, is actually meant for Cray, but I simply gave it a shot.

So far I have shied away from using the compute nodes for compilation (which is absolutely preferred in the long run), as I wanted to postpone the EasyBuild Slurm integration to a better time window due to time pressure. But reading your recommendation, I'll definitely think twice.

In case neither of the above aspects causes this zlib problem, I just wanted to add the error log from my "eb --debug foss-2020a.eb --optarch=haswell --robot" run, which had the same issue with zlib:

/tmp/eb-cwfhxult/easybuild-zlib-1.2.11-20211126.115625.Djumm.log:

== 2021-11-26 11:56:26,069 modules.py:608 DEBUG Result for existence check of GCCcore/9.3.0 based on 'module show' output line '   /opt/ohpc/pub/moduleseb/znver2/all/Core/compiler/GCCcore/9.3.0.lua:': True
== 2021-11-26 11:56:26,069 modules.py:654 INFO Result for existence check of GCCcore/9.3.0 module: True

== 2021-11-26 11:56:26,070 toolchain.py:760 INFO List of toolchain dependency modules and toolchain definition match!
== 2021-11-26 11:56:26,071 compiler.py:361 INFO _set_optimal_architecture: using haswell as optarch for x86_64.

== 2021-11-26 11:56:26,073 toolchain.py:426 DEBUG get_software_root software root /opt/ohpc/pub/appseb/znver2/binutils/2.34 for binutils was found in environment
== 2021-11-26 11:56:26,074 environment.py:91 INFO Environment variable CC set to gcc (previously undefined)
== 2021-11-26 11:56:26,075 environment.py:91 INFO Environment variable EBVARCC set to gcc (previously undefined)

== 2021-11-26 11:56:26,075 toolchain.py:1117 DEBUG _setenv_variables: setting environment variable CXX to g++
== 2021-11-26 11:56:26,075 environment.py:91 INFO Environment variable CXX set to g++ (previously undefined)
== 2021-11-26 11:56:26,075 environment.py:91 INFO Environment variable EBVARCXX set to g++ (previously undefined)

== 2021-11-26 11:56:26,078 easyblock.py:1885 DEBUG Changed to real build directory /opt/ohpc/admin/easybuild/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11/ (start_dir)

== 2021-11-26 11:56:26,078 easyblock.py:3598 INFO Starting configure step
== 2021-11-26 11:56:26,079 easyconfig.py:1686 INFO Generating template values...
== 2021-11-26 11:56:26,079 templates.py:189 DEBUG config: zlib EasyConfig @ /opt/ohpc/admin/easybuild/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-9.3.0.eb
== 2021-11-26 11:56:26,079 templates.py:216 DEBUG version found in easyconfig is 1.2.11
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: bitbucket_account, config: %(namelower)s
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: github_account, config: %(namelower)s
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: name, config: zlib
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: parallel, config: 8
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: version, config: 1.2.11
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: versionsuffix, config: 
== 2021-11-26 11:56:26,079 templates.py:318 DEBUG name: versionprefix, config: 
== 2021-11-26 11:56:26,080 easyconfig.py:1705 INFO Template values: arch='x86_64', bitbucket_account='zlib', builddir='/opt/ohpc/admin/easybuild/easybuild/build/zlib/1.2.11/GCCcore-9.3.0', github_account='zlib', installdir='/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0', module_name='zlib/1.2.11', name='zlib', nameletter='z', nameletterlower='z', namelower='zlib', parallel='8', toolchain_name='GCCcore', toolchain_version='9.3.0', version='1.2.11', version_major='1', version_major_minor='1.2', version_minor='2', versionprefix='', versionsuffix=''
== 2021-11-26 11:56:26,080 easyblock.py:3606 INFO Running method configure_step part of step configure
== 2021-11-26 11:56:26,080 run.py:214 DEBUG run_cmd: running cmd  ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0  (in /opt/ohpc/admin/easybuild/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11)
== 2021-11-26 11:56:26,080 run.py:233 INFO running cmd:  ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0  
== 2021-11-26 11:56:26,162 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/lib/python3.6/site-packages/easybuild/base/exceptions.py:124 in __init__): cmd " ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.
 (at easybuild/lib/python3.6/site-packages/easybuild/tools/run.py:618 in parse_cmd_output)
== 2021-11-26 11:56:26,163 build_log.py:265 INFO ... (took < 1 sec)
== 2021-11-26 11:56:26,163 filetools.py:1971 INFO Removing lock /opt/ohpc/admin/easybuild/easybuild/locks/_opt_ohpc_pub_appseb_znver2_zlib_1.2.11-GCCcore-9.3.0.lock...
== 2021-11-26 11:56:26,164 filetools.py:380 INFO Path /opt/ohpc/admin/easybuild/easybuild/locks/_opt_ohpc_pub_appseb_znver2_zlib_1.2.11-GCCcore-9.3.0.lock successfully removed.
== 2021-11-26 11:56:26,164 filetools.py:1975 INFO Lock removed: /opt/ohpc/admin/easybuild/easybuild/locks/_opt_ohpc_pub_appseb_znver2_zlib_1.2.11-GCCcore-9.3.0.lock
== 2021-11-26 11:56:26,164 easyblock.py:3915 WARNING build failed (first 300 chars): cmd " ./configure --prefix=/opt/ohpc/pub/appseb/znver2/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.

== 2021-11-26 11:56:26,164 easyblock.py:307 INFO Closing log for application name zlib version 1.2.11
(eb20211109) -bash-4.4$ 

It seems to indicate that zlib finds all the prerequisites... I'll try the generic compilation for comparison next. Thank you again!

ocaisa commented 2 years ago

The newer EB Slurm integration is actually pretty trivial (--job-backend=Slurm) since it doesn't require any configuration. Well, that's not entirely true: it of course depends on your Slurm setup (and the existence of sensible defaults). You can influence the job submission with relevant Slurm environment variables (see https://slurm.schedmd.com/sbatch.html#SECTION_INPUT-ENVIRONMENT-VARIABLES).
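
A minimal sketch of what such a submission could look like, assuming a POSIX-like shell. The partition name and time limit are hypothetical placeholders, not values from this thread; `SBATCH_PARTITION` and `SBATCH_TIMELIMIT` are among the sbatch input environment variables documented at the link above, and `--job` is the EasyBuild option that submits builds as jobs instead of running them locally.

```shell
# Hypothetical site values; replace with a partition and time limit that exist on your cluster.
export SBATCH_PARTITION='compute'
export SBATCH_TIMELIMIT='04:00:00'

# Submit each build in the dependency chain as a Slurm job; skipped if eb is not available.
if command -v eb >/dev/null 2>&1; then
    eb foss-2020a.eb --robot --job --job-backend=Slurm
fi
```

Because the jobs run on the compute nodes, those nodes need write access to the installation prefix and read access to the EasyBuild installation itself.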

kelbstf commented 2 years ago

That's great to hear, thank you for your encouragement. In this specific deployment the compute nodes so far don't have write access to the software tree. EasyBuild is designed in this deployment to be used for administratively maintaining a reference-grade software stack, and therefore lives in a different access-controlled path, which is not shared among the compute nodes. That would have to be reworked, which I hoped to do once the 90% of user requirements are satisfied and users really need EasyBuild ;-) But maybe this simply has to be adjusted now.

kelbstf commented 2 years ago

Hello, I just wanted to report back that I have rebuilt the deployment concept for EasyBuild from scratch, so that compilations can now be executed on the compute nodes of the respective architecture, just to make sure. Fundamentally this is now a clean installation of EB 4.5.0, but interestingly enough I still run into the exact same issue with compiling zlib as described above.

easyblock.py:3915 WARNING build failed (first 300 chars): cmd " \
./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0 " \
exited with exit code 1 and output:\
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.

I could fall back to compiling zlib generically, but it would feel better to have found the reason, just in case it indicates a fundamental issue that I can't work around later in the same manner...

Does anyone have a recommendation for how to obtain deeper debugging information beyond what "--debug" already provides? Maybe that would get me closer. Or is it possibly better to start the software tree by installing just the compiler, and using that for building the toolchain, instead of using the OS-level GCC as "bootstrap" compiler?

Any input is highly welcome. Best

kelbstf commented 2 years ago

Hello, I found that it seems to have trouble finding a module which actually is available, and maybe that causes a compiler detection issue:

eb --debug foss-2020a.eb --optarch=haswell --robot --extended-dry-run
== Temporary log file in case of crash /tmp/eb-tw01e941/easybuild-dez2koe9.log

[prepare_step method]
Defining build environment, based on toolchain (options) and specified dependencies...

Loading toolchain module...

module load M4/1.4.18 [SIMULATED]
module load binutils/2.34 [SIMULATED]
module load GCCcore/9.3.0 [SIMULATED]

Loading modules for dependencies...

module load binutils/2.34 [SIMULATED]

Full list of loaded modules:
  1) EasyBuild/20211128-haswell
  2) gnu9/9.3.0

Defining build environment...

[ ... ]

configuring... [DRY RUN]

[configure_step method]
  running command "./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0"
  (in /opt/pm/admin/easybuild/450/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11)

building... [DRY RUN]

[build_step method]
  running command "make  -j 8"
  (in /opt/pm/admin/easybuild/450/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11)

testing... [DRY RUN]

[test_step method]

installing... [DRY RUN]

[stage_install_step method]

[make_installdir method]
directory /opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0 removed

[install_step method]
  running command "make install"
  (in /opt/pm/admin/easybuild/450/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11)

taking care of extensions... [DRY RUN]

[extensions_step method]

restore after iterating... [DRY RUN]

[post_iter_step method]

postprocessing... [DRY RUN]

[post_install_step method]

sanity checking... [DRY RUN]

[sanity_check_step method]
Sanity check paths - file ['files']
  * include/zconf.h
  * include/zlib.h
  * lib/libz.a
  * lib/libz.so
Sanity check paths - (non-empty) directory ['dirs']
  (none)
Sanity check commands
  (none)

cleaning up... [DRY RUN]

[cleanup_step method]
directory /opt/pm/admin/easybuild/450/easybuild/build/zlib/1.2.11/GCCcore-9.3.0 removed

creating module... [DRY RUN]

[make_module_step method]

!!!
!!! WARNING: ignoring error "Can't get value from a non-existing module GCCcore/9.3.0"
!!!

But the module is actually available:

/opt/pm/pub/mod/haswell/all/Core/compiler/GCCcore/9.3.0.lua:
help([==[

Description
===========
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada,
 as well as libraries for these languages (libstdc++, libgcj,...).

More information
================
 - Homepage: https://gcc.gnu.org/
]==])

whatis([==[Description: The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada,
 as well as libraries for these languages (libstdc++, libgcj,...).]==])
whatis([==[Homepage: https://gcc.gnu.org/]==])
whatis([==[URL: https://gcc.gnu.org/]==])

local root = "/opt/pm/pub/app/haswell/GCCcore/9.3.0"

conflict("GCCcore")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/base")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/astro")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/bio")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/cae")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/chem")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/compiler")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/data")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/debugger")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/devel")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/geo")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/ide")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/lang")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/lib")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/math")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/mpi")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/numlib")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/perf")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/quantum")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/phys")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/system")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/toolchain")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/tools")
prepend_path("MODULEPATH", "/opt/pm/pub/mod/haswell/all/Compiler/GCCcore/9.3.0/vis")

prepend_path("CMAKE_LIBRARY_PATH", pathJoin(root, "lib64"))
prepend_path("CMAKE_PREFIX_PATH", root)
prepend_path("LD_LIBRARY_PATH", pathJoin(root, "lib64"))
prepend_path("MANPATH", pathJoin(root, "share/man"))
prepend_path("PATH", pathJoin(root, "bin"))
prepend_path("XDG_DATA_DIRS", pathJoin(root, "share"))
setenv("EBROOTGCCCORE", root)
setenv("EBVERSIONGCCCORE", "9.3.0")
setenv("EBDEVELGCCCORE", pathJoin(root, "easybuild/Core-compiler-GCCcore-9.3.0-easybuild-devel"))

-- Built with EasyBuild version 4.5.0

But the "loaded module" for GCC is still the one belonging to the bootstrap compiler, found as "gnu9/9.3.0". Is this problem possibly some sort of "handover" issue when compiling a toolchain from scratch, where the bootstrap compiler has a different module name ("gnu9/9.3.0") than the GCC of the toolchain to build ("GCCcore/9.3.0")? I am not familiar with how this alleged "handover" is meant to work behind the scenes... Best

kelbstf commented 2 years ago

Hmm... I tried brute force by commenting out the following section of the zlib-1.2.11 "configure" script, in order to silence the error message:

if try $CC -c $CFLAGS $test.c; then
  :
else
  echo "Compiler error reporting is too harsh for $0 (perhaps remove -Werror)." | tee -a configure.log
  leave 1
fi

And it made obvious what this was supposed to check:

== 2021-11-29 19:48:47,806 run.py:214 DEBUG run_cmd: running cmd  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0  (in /opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/GCCcore-9.3.0/zlib-1.2.11)
== 2021-11-29 19:48:47,806 run.py:233 INFO running cmd:  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0  
== 2021-11-29 19:48:47,910 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/eb20211128/lib/python3.6/site-packages/easybuild/base/exceptions.py:124 in __init__): cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:
Checking for shared library support...
No shared library support; try without defining CC and CFLAGS
Building static library libz.a version 1.2.11 with gcc.
Checking for size_t... No.
Checking for long long... No.
Failed to find a pointer-size integer type.
** ./configure aborting.

The currently active compiler:

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/ohpc/pub/compiler/gcc/9.3.0/libexec/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-multilib --enable-languages=c,c++,fortran --prefix=/opt/ohpc/pub/compiler/gcc/9.3.0 --disable-static --enable-shared
Thread model: posix
gcc version 9.3.0 (GCC)

Not quite sure yet what to make of it...
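
The configure snippet quoted above only runs "$CC -c $CFLAGS" on a small test file and hides the compiler's own output, so one way to see what the compiler actually objects to is to repeat that step by hand. The following is a sketch under assumptions: `probe_cflags` is a hypothetical helper name, and the contents of the test file are assumed (any valid C file should do for this purpose).

```shell
# probe_cflags COMPILER [FLAGS...]
# Compile a trivial C file with the given compiler and flags, mirroring the
# "$CC -c $CFLAGS $test.c" step of the configure snippet above.
probe_cflags() {
    cc=$1
    shift
    tmpdir=$(mktemp -d) || return 2
    # Trivial translation unit (assumed contents; any valid C file works here).
    printf 'int foo() { return 0; }\n' > "$tmpdir/ztest.c"
    # Compiler diagnostics are left on stderr so a rejected flag is visible.
    ( cd "$tmpdir" && "$cc" -c "$@" ztest.c )
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

Calling it as `probe_cflags "$CC" $CFLAGS`, with the values that EasyBuild sets for the failing build, should print the compiler's real complaint if there is one; a zero exit status would mean the check itself is not the problem.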

kelbstf commented 2 years ago

I just removed all the previous installations, while keeping the sources, and successfully installed just GCCcore via "eb --debug GCCcore-9.3.0.eb --optarch=haswell --robot". This also successfully installed "zlib/1.2.11" from the exact same source package. The "bootstrap" compiler used for this was the same as above. It therefore looks like the failing compilation of "zlib/1.2.11" only occurs when installing "eb --debug foss-2020a.eb --optarch=haswell --robot" from scratch. Maybe this rings a bell for the hepcats ;-) Best regards.

ocaisa commented 2 years ago

I don't think this has anything to do with EasyBuild; it was rooted in the (bad) --optarch=x86-rome flag you used, which must have tainted your GCCcore installation somehow.

kelbstf commented 2 years ago

@ocaisa: thank you for sticking around! Actually the issue still persists, and the cross-compilation situation was the first aspect I excluded from my debugging effort.

The situation as of now is:
a) eb --debug GCCcore-9.3.0.eb --optarch=haswell --robot # works, including "zlib 1.2.11"
b) eb --debug foss-2020a.eb --optarch=haswell --robot # fails, due to failing "zlib 1.2.11"

It looks like the configuration for the zlib build behaves differently in the contexts of the different packages to install? No idea if that makes sense.... Also zlib seems to trigger such types of errors upon compilation in other situations as well: https://www.bbsmax.com/A/kjdwQOeEJN/. Maybe zlib has some special compiler checks going on?

Best regards

kelbstf commented 2 years ago

Hello, I just tried to install "foss-2020a.eb" after having successfully installed "GCCcore/9.3.0", using "GCCcore/9.3.0". But it does not seem to detect the already installed GCCcore, and accordingly also not the already installed zlib, and therefore fails in the exact same manner:

eb --debug foss-2020a.eb --robot
== Temporary log file in case of crash /tmp/eb-du2h8ikk/easybuild-zotzyu1e.log
== found valid index for /opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs, so using it...
== found valid index for /opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-9.3.0.eb
== building and installing Compiler/GCCcore/9.3.0/lib/zlib/1.2.11...
== fetching files...
== ... (took < 1 sec)
== creating build dir, resetting environment...
== ... (took < 1 sec)
== unpacking...
== ... (took < 1 sec)
== patching...
== ... (took < 1 sec)
== preparing...
== ... (took < 1 sec)
== configuring...
== ... (took < 1 sec)
== FAILED: Installation ended unsuccessfully (build directory: /opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/GCCcore-9.3.0): build failed (first 300 chars): cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.
 (took 0 secs)
== Results of the build can be found in the log file(s) /tmp/eb-du2h8ikk/easybuild-zlib-1.2.11-20211130.153557.AWtkg.log
ERROR: Build of /opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-9.3.0.eb failed (err: 'build failed (first 300 chars): cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-9.3.0 " exited with exit code 1 and output:\nCompiler error reporting is too harsh for ./configure (perhaps remove -Werror).\n** ./configure aborting.\n')

But loading the EB-installed GCC and zlib works perfectly fine:

/opt/pm/pub/mod/haswell/all/Core/lib/zlib/1.2.11.lua:

help([==[

Description
===========
zlib is designed to be a free, general-purpose, legally unencumbered -- that
 is, not covered by any patents -- lossless data-compression library for use
 on virtually any computer hardware and operating system.

More information
================
 - Homepage: https://www.zlib.net/
]==])

whatis([==[Description: 
 zlib is designed to be a free, general-purpose, legally unencumbered -- that
 is, not covered by any patents -- lossless data-compression library for use
 on virtually any computer hardware and operating system.
]==])
whatis([==[Homepage: https://www.zlib.net/]==])
whatis([==[URL: https://www.zlib.net/]==])

local root = "/opt/pm/pub/app/haswell/zlib/1.2.11"

conflict("zlib")

prepend_path("CMAKE_PREFIX_PATH", root)
prepend_path("CPATH", pathJoin(root, "include"))
prepend_path("LD_LIBRARY_PATH", pathJoin(root, "lib"))
prepend_path("LIBRARY_PATH", pathJoin(root, "lib"))
prepend_path("MANPATH", pathJoin(root, "share/man"))
prepend_path("PKG_CONFIG_PATH", pathJoin(root, "lib/pkgconfig"))
prepend_path("XDG_DATA_DIRS", pathJoin(root, "share"))
setenv("EBROOTZLIB", root)
setenv("EBVERSIONZLIB", "1.2.11")
setenv("EBDEVELZLIB", pathJoin(root, "easybuild/Core-lib-zlib-1.2.11-easybuild-devel"))

-- Built with EasyBuild version 4.5.0

The EB paths are all perfectly accessible for the EB account, so that should not be the reason... I had bet on EB detecting the already installed GCC and zlib when installing "foss-2020a.eb", and just adding the remaining packages belonging to "foss-2020a.eb".

kelbstf commented 2 years ago

Just realised the following:

eb --debug foss-2021a.eb --optarch=haswell --robot

== processing EasyBuild easyconfig 
/opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs/z/zlib/zlib-1.2.11.eb
== COMPLETED: Installation ended successfully
(took 2 secs)

[ ... ]

== processing EasyBuild easyconfig 
/opt/pm/admin/easybuild/eb20211128/easybuild/easyconfigs/z/zlib/zlib-1.2.11-GCCcore-10.3.0.eb
== FAILED: Installation ended unsuccessfully 
(build directory: /opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/GCCcore-10.3.0): 
build failed 

(first 300 chars): cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-10.3.0 " exited with exit code 1 and output:\
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).
** ./configure aborting.

In more details:

Successful: 
/opt/pm/pub/app/haswell/zlib/1.2.11/easybuild/easybuild-zlib-1.2.11-20211130.172808.log

INFO Template values: 
builddir='/opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/system-system', \
installdir='/opt/pm/pub/app/haswell/zlib/1.2.11', \
module_name='zlib/1.2.11', name='zlib', nameletter='z', nameletterlower='z', namelower='zlib', parallel='8', \
toolchain_name='system', toolchain_version='system', \
version='1.2.11', version_major='1', version_major_minor='1.2', version_minor='2', versionprefix='', versionsuffix=''

INFO Running method configure_step part of step configure
running cmd  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11  \
(in /opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/system-system/zlib-1.2.11)

INFO running cmd:  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11  

run.py:623 INFO \
cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11 " exited with exit code 0 and output:

Configured with: ../configure --disable-multilib --enable-languages=c,c++,fortran --prefix=/opt/ohpc/pub/compiler/gcc/9.3.0 --disable-static --enable-shared
gcc version 9.3.0 (GCC) 

Failing:
/tmp/eb-nj_ja5rh/easybuild-zlib-1.2.11-20211130.182305.wWRsd.log

INFO Template values: 
builddir='/opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/GCCcore-10.3.0', \
installdir='/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-10.3.0', \
module_name='zlib/1.2.11', name='zlib', nameletter='z', nameletterlower='z', namelower='zlib', parallel='8', \
toolchain_name='GCCcore', toolchain_version='10.3.0', \
version='1.2.11', version_major='1', version_major_minor='1.2', version_minor='2', versionprefix='', versionsuffix=''

INFO Running method configure_step part of step configure
running cmd  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-10.3.0  \
(in /opt/pm/admin/easybuild/eb20211128/easybuild/build/zlib/1.2.11/GCCcore-10.3.0/zlib-1.2.11)

INFO running cmd:  ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-10.3.0  

build_log.py:169 ERROR \
EasyBuild crashed with an error \
(at easybuild/eb20211128/lib/python3.6/site-packages/easybuild/base/exceptions.py:124 in __init__): \
cmd " ./configure --prefix=/opt/pm/pub/app/haswell/zlib/1.2.11-GCCcore-10.3.0 " exited with exit code 1 and output:
Compiler error reporting is too harsh for ./configure (perhaps remove -Werror).

If I understand this right: the compilation of the same zlib from the same source code succeeds with the system toolchain but fails with the GCCcore-10.3.0 toolchain.

Unfortunately there seems to be no information on WHY the configuration fails in the second case... In other words: I have no information about which prefix has been used in the configuration step...

akesandgren commented 2 years ago

"eb --debug foss-2021a.eb --optarch=haswell --robot" is wrong, the argument to optarch is a compiler option without the leading "-", i.e. --optarch=march=haswell, or --optarch="GCC:march=haswell", don't remember if the first one works also since you're specifically building GCC/foss

ocaisa commented 2 years ago

If you want to compile everything for GCCcore for haswell use:

--optarch="GCCcore:march=haswell -mtune=haswell"

In general just remember that these options are passed through to the underlying compiler, so that compiler needs to be able to understand them. If you look in the config.log files inside the build directories of your failing builds you will probably see that GCC is barfing because it doesn't understand the flag you are giving it (this doesn't affect the system compiler since these flags are not used for the system compiler). The setting I give above is specifically for GCCcore, if you want it for everything you would remove the GCCcore: specification.
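
A sketch of how that log check could be done, assuming a POSIX-like shell with `find` and a `grep` that supports `-H`. `find_rejected_flags` is a hypothetical helper name; the build directory to pass in is the one printed in the "FAILED: Installation ended unsuccessfully (build directory: ...)" message. zlib's hand-written configure script appends to configure.log rather than config.log (see the `tee -a configure.log` in the snippet quoted earlier in this thread), so both names are searched.

```shell
# find_rejected_flags BUILDDIR
# Print log lines in which the compiler complains about an option it does not know.
# The pattern matches both "command-line option" and "command line option".
find_rejected_flags() {
    find "$1" -type f \( -name config.log -o -name configure.log \) \
        -exec grep -Hn 'unrecognized command.line option' {} +
}
```

For example, `find_rejected_flags /path/to/build/zlib/1.2.11/GCCcore-9.3.0` (path shortened here as a placeholder) would list each offending log line with its file name and line number, and print nothing if no such complaint was logged.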

ocaisa commented 2 years ago

Full documentation of this flag is at https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html#specifying-target-architecture-specific-optimization-flags-to-use-via-optarch-flags
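
The syntax rule from the comments above (the --optarch value is a compiler option without its leading "-") can be illustrated with a deliberately simplified model. `optarch_to_flags` is a hypothetical helper, not EasyBuild code; it ignores the GENERIC keyword and the per-compiler "GCCcore:..." form and only shows what happens to a plain value.

```shell
# optarch_to_flags VALUE
# Simplified model: the value is handed to the compiler with one dash in front.
optarch_to_flags() {
    printf '%s\n' "-$1"
}

optarch_to_flags 'haswell'                        # prints: -haswell
optarch_to_flags 'march=haswell'                  # prints: -march=haswell
optarch_to_flags 'march=haswell -mtune=haswell'   # prints: -march=haswell -mtune=haswell
```

Under this model, --optarch=haswell yields "-haswell", which is not an architecture option GCC is expected to accept, whereas --optarch=march=haswell yields a valid "-march=haswell".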

kelbstf commented 2 years ago

Thank you very much to all of you, that was sooo helpful!

All issues vanished using the correct syntax you pointed out.

I (re)read the official documentation on this topic that you referenced, and now I also understand the specific syntax of "--optarch". The gist as I understand it is:

  1. the flags to use for indicating the architecture are those of the target compiler, as EasyBuild passes these strings through to the compiler
  2. the recommended approach to optimisation is to leave it to EasyBuild, and therefore to run the compilation on the target microarchitecture. "--optarch" should preferably be used to disable EasyBuild's automatic optimisation in order to obtain generic builds.
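
As a sketch of the second point, assuming a POSIX-like shell: the documentation linked earlier in this thread describes a GENERIC value for --optarch that requests a build not tied to the build host's CPU. The easyconfig name below is simply the one from this thread, used as an example.

```shell
# Example easyconfig taken from this thread; any easyconfig name works here.
EASYCONFIG='zlib-1.2.11-GCCcore-9.3.0.eb'

# Without --optarch, EasyBuild optimises for the host it is running on.
# With GENERIC, it asks for a build that does not depend on the build host's CPU.
if command -v eb >/dev/null 2>&1; then
    eb "$EASYCONFIG" --robot --optarch=GENERIC
fi
```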

"2." is now implemented over here for executing EasyBuild installations via SLURM on the target compute nodes.

Best regards

kelbstf commented 2 years ago

Many thanks again! Issue is solved from my perspective.