easybuilders / easybuild-easyconfigs

A collection of easyconfig files that describe which software to build using which build options with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

Boost-1.66.0-foss-2018a.eb (and all other Boost easyconfigs) segfault during compilation #6581

Open nortex opened 6 years ago

nortex commented 6 years ago

Hi all,

Every Boost version and toolchain that I try to compile using EasyBuild fails: during the "build" stage I get a broken pipe message that terminates my session. Running with --debug gives the following output:

== 2018-07-16 11:55:47,225 run.py:559 INFO parse_log_for_error msg: Command used:  ./bjam  --prefix=/data/sources/build/Boost/1.66.0/foss-2018a/obj cxxflags='-O2 -ftree-vectorize -march=native -fno-math-errno -fPIC' linkflags='-L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib64 -L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib -L/data/apps/Easybuild/apps/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib -L/data/apps/Easybuild/apps/ScaLAPACK/2.0.2-gompi-2018a-OpenBLAS-0.2.20/lib -L/data/apps/Easybuild/apps/FFTW/3.3.7-gompi-2018a/lib -L/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -L/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib' -sBZIP2_INCLUDE=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/include -sBZIP2_LIBPATH=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -sZLIB_INCLUDE=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/include -sZLIB_LIBPATH=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib --user-config=user-config.jam --with-mpi install -j 20
== 2018-07-16 11:55:47,225 run.py:561 INFO parse_log_for_error (some may be harmless) regExp (?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w) found:
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/whitespace_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unpaired.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unexpected_end_of_input.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unexpected_character.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/none_of_the_expected_cases_found.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/literal_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/letter_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/index_out_of_range.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/end_of_input_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/digit_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/whitespace_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unpaired.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unexpected_end_of_input.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unexpected_character.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/none_of_the_expected_cases_found.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/literal_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/letter_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/index_out_of_range.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/expected_to_fail.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/end_of_input_expected.hpp
common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/digit_expected.hpp
== 2018-07-16 11:55:47,225 run.py:518 WARNING Found 21 errors in command output (output: common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/whitespace_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unpaired.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unexpected_end_of_input.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unexpected_character.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/none_of_the_expected_cases_found.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/literal_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/letter_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/index_out_of_range.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/end_of_input_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/digit_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/whitespace_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unpaired.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unexpected_end_of_input.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/unexpected_character.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/none_of_the_expected_cases_found.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/literal_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/letter_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/index_out_of_range.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/expected_to_fail.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/end_of_input_expected.hpp, common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/v1/error/digit_expected.hpp)
== 2018-07-16 11:55:47,226 run.py:192 INFO running cmd: ./bjam --clean-all output:
> Performing configuration checks
> 
>     - 32-bit                   : no  (cached)
>     - 64-bit                   : yes (cached)
>     - arm                      : no  (cached)
>     - mips1                    : no  (cached)
>     - power                    : no  (cached)
>     - sparc                    : no  (cached)
>     - x86                      : yes (cached)
> 
> Building the Boost C++ Libraries.
> 
> 
>     - symlinks supported       : yes
>     - C++11 mutex              : yes
>     - lockfree boost::atomic_flag : yes
>     - Boost.Config Feature Check: cxx11_auto_declarations : yes
>     - Boost.Config Feature Check: cxx11_constexpr : yes
>     - Boost.Config Feature Check: cxx11_defaulted_functions : yes
>     - Boost.Config Feature Check: cxx11_final : yes
>     - Boost.Config Feature Check: cxx11_hdr_mutex : yes
>     - Boost.Config Feature Check: cxx11_hdr_regex : yes
>     - Boost.Config Feature Check: cxx11_hdr_tuple : yes
>     - Boost.Config Feature Check: cxx11_lambdas : yes
>     - Boost.Config Feature Check: cxx11_noexcept : yes
>     - Boost.Config Feature Check: cxx11_nullptr : yes
>     - Boost.Config Feature Check: cxx11_rvalue_references : yes
>     - Boost.Config Feature Check: cxx11_template_aliases : yes
>     - Boost.Config Feature Check: cxx11_thread_local : yes
>     - Boost.Config Feature Check: cxx11_variadic_templates : yes
>     - has_icu builds           : no
> warning: Graph library does not contain MPI-based parallel components.
> note: to enable them, add "using mpi ;" to your user-config.jam
>     - zlib                     : yes
>     - bzip2                    : yes
>     - lzma                     : no
>     - iconv (libc)             : yes
>     - icu                      : no
>     - icu (lib64)              : no
>     - native-atomic-int32-supported : yes
>     - native-syslog-supported  : yes
>     - pthread-supports-robust-mutexes : yes
>     - compiler-supports-visibility : yes
>     - compiler-supports-ssse3  : yes
>     - compiler-supports-avx2   : yes
>     - gcc visibility           : yes
>     - long double support      : yes
> warning: skipping optional Message Passing Interface (MPI) library.
> note: to enable MPI support, add "using mpi ;" to user-config.jam.
> note: to suppress this message, pass "--without-mpi" to bjam.
> note: otherwise, you can safely ignore this message.
>     - libbacktrace builds      : no
>     - addr2line builds         : yes
>     - WinDbg builds            : no
>     - WinDbgCached builds      : no
>     - zlib                     : yes
>     - bzip2                    : yes
>     - lzma                     : no
> ...found 1 target...
> ...updating 1 target...
> common.Clean clean-all
> ...updated 1 target...
> 
> == 2018-07-12 10:37:11,265 run.py:542 DEBUG Using default regular expression: (?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)
> == 2018-07-12 10:37:11,265 boost.py:210 INFO Installing boost libraries
> == 2018-07-12 10:37:11,266 run.py:173 DEBUG run_cmd: running cmd  ./bjam  --prefix=/data/sources/build/Boost/1.66.0/foss-2018a/obj cxxflags='-O2 -ftree-vectorize -march=x86-64 -mtune=generic -fno-math-errno -fPIC' linkflags='-L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib64 -L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib -L/data/apps/Easybuild/apps/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib -L/data/apps/Easybuild/apps/ScaLAPACK/2.0.2-gompi-2018a-OpenBLAS-0.2.20/lib -L/data/apps/Easybuild/apps/FFTW/3.3.7-gompi-2018a/lib -L/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -L/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib' -sBZIP2_INCLUDE=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/include -sBZIP2_LIBPATH=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -sZLIB_INCLUDE=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/include -sZLIB_LIBPATH=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib install -j 20  (in /data/sources/build/Boost/1.66.0/foss-2018a/boost_1_66_0)
> == 2018-07-12 10:37:11,266 run.py:192 INFO running cmd:  ./bjam  --prefix=/data/sources/build/Boost/1.66.0/foss-2018a/obj cxxflags='-O2 -ftree-vectorize -march=x86-64 -mtune=generic -fno-math-errno -fPIC' linkflags='-L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib64 -L/data/apps/Easybuild/apps/GCCcore/6.4.0/lib -L/data/apps/Easybuild/apps/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib -L/data/apps/Easybuild/apps/ScaLAPACK/2.0.2-gompi-2018a-OpenBLAS-0.2.20/lib -L/data/apps/Easybuild/apps/FFTW/3.3.7-gompi-2018a/lib -L/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -L/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib' -sBZIP2_INCLUDE=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/include -sBZIP2_LIBPATH=/data/apps/Easybuild/apps/bzip2/1.0.6-foss-2018a/lib -sZLIB_INCLUDE=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/include -sZLIB_LIBPATH=/data/apps/Easybuild/apps/zlib/1.2.11-GCCcore-6.4.0/lib install -j 20  
> 

Any ideas what is causing the segfault?

Thanks.

zao commented 6 years ago

None of the output cited here seems to report any problems. Is there no information in the .log? The false positives on the /error/ paths are to be expected, since the error-detection pattern matches the directory name "error".
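
To illustrate why those paths get flagged: the default pattern shown in the debug log is searched for in each output line, and the directory name `error` in `boost/metaparse/error/` satisfies it. The following is a minimal sketch, assuming the pattern is applied case-insensitively and line by line. The helper name `find_suspect_lines` and the second and third sample lines are made up for illustration.

```python
import re

# Default pattern copied from the debug log; the re.I flag is an assumption.
ERROR_REGEX = re.compile(
    r"(?<![(,-]|\w)(?:error|segmentation fault|failed)(?![(,-]|\.?\w)", re.I
)


def find_suspect_lines(output):
    """Return the lines of `output` that the pattern would flag."""
    return [line for line in output.splitlines() if ERROR_REGEX.search(line)]


sample = "\n".join([
    # Taken from the log: flagged only because of the "/error/" directory.
    "common.copy /data/sources/build/Boost/1.66.0/foss-2018a/obj/include/boost/metaparse/error/unpaired.hpp",
    # Hypothetical path: "error" is followed by "_", so the lookahead rejects it.
    "common.copy /tmp/obj/include/boost/system/error_code.hpp",
    # Hypothetical genuine failure: this is the kind of line worth looking for.
    "gcc: internal compiler error: Segmentation fault",
])

for line in find_suspect_lines(sample):
    print(line)
```

Both the harmless `common.copy` line and the genuine compiler failure are flagged, which is why the list of "errors" in the log cannot be taken at face value.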

This recipe builds fine on my cluster. Have you ensured that there's enough disk space and memory to build it this wide (-j 20)? What kind of OS and hardware are you on?

nortex commented 6 years ago

@zao My OS is CentOS 6.7. CPU info:

processor       : 19
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
stepping        : 4
microcode       : 1064
cpu MHz         : 2199.988
cache size      : 25600 KB
physical id     : 1
siblings        : 10
core id         : 12
cpu cores       : 10
apicid          : 56
initial apicid  : 56
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 4399.36
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual

I tried different EB versions (3.3, 3.6.1, 3.6.2) with different --optarch flags, but the result is still the same...

zao commented 6 years ago

I set up a CentOS 6.7 VM, painstakingly found a Python 2.7 good enough to install EasyBuild, and the recipe builds fine. Very odd. Can you somehow upload the full log somewhere, and maybe look in dmesg or /var/log/messages for what process might've terminated surprisingly?
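
As a rough sketch of what to look for in dmesg or /var/log/messages: the kernel typically logs a line when the OOM killer ends a process or when a process segfaults. The patterns below cover two common message shapes, but the exact wording varies between kernel versions, so treat them as assumptions. The function name `find_terminated` and the sample lines are hypothetical and not taken from the reporter's machine.

```python
import re

# Two common kernel message shapes; wording differs between kernel versions.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")
SEGV_RE = re.compile(r"([^\s\[]+)\[(\d+)\]: segfault at")


def find_terminated(log_text):
    """Return (kind, process name, pid) tuples for matching kernel log lines."""
    hits = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            hits.append(("oom-kill", match.group(2), int(match.group(1))))
            continue
        match = SEGV_RE.search(line)
        if match:
            hits.append(("segfault", match.group(1), int(match.group(2))))
    return hits


# Hypothetical sample lines for illustration only.
sample = """\
Out of memory: Kill process 12345 (cc1plus) score 901 or sacrifice child
bjam[23456]: segfault at 0 ip 0000000000401234 sp 00007ffc0a0b0c0d error 4 in bjam[400000+5a000]
NMI watchdog enabled, takes one hw-pmu counter.
"""

print(find_terminated(sample))
```

To use it on a real machine, pass in the text of `dmesg` output or the contents of /var/log/messages instead of `sample`.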

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
stepping        : 10
cpu MHz         : 3696.000
cache size      : 12288 KB
physical id     : 0
siblings        : 6
core id         : 5
cpu cores       : 6
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good xtopology nonstop_tsc unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase avx2 invpcid rdseed
bogomips        : 7392.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

nortex commented 6 years ago

I didn't find anything in /var/log/messages. Here is something from dmesg:

CPU0: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz stepping 02
Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Broken BIOS detected, complain to your hardware vendor.
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
Intel PMU driver.
... version:                3
... bit width:              48
... generic registers:      8
... value mask:             0000ffffffffffff
... max period:             0000ffffffffffff
... fixed-purpose events:   3
... event mask:             00000007000000ff
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node   0, Processors  #1
WARNING: polling idle and HT enabled, performance may degrade.
 #2

Attached is the full debug log, including the crash. If you have any ideas why this could happen, please let me know, as it is really strange.

easybuild-ftf9jc.log

zao commented 6 years ago

Seems like I don't know how to operate a CentOS machine, mine claims to be 6.10 after I got Python installed. 🙄

In any case, at the point where your log is cut off, it invokes a long build of the rest of the libraries.

The only things that come to mind to try right now would be to try to build a Boost outside of EasyBuild to see if the b2 terminates or crashes in some way, or to see if tuning down the parallelism of the build might affect things.
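
One way to tell whether a manually started b2 exits normally or is killed by a signal is to launch it from a small wrapper and inspect the return code. This is a minimal sketch, assuming Python 3 on a POSIX system. The names `describe_exit` and `run_and_report` are made up here, and the commented-out b2 invocation uses a placeholder path.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_report(cmd, cwd=None):
    """Run `cmd` (a list of arguments) and describe how it ended."""
    proc = subprocess.run(cmd, cwd=cwd)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Placeholder for a manual build with reduced parallelism:
    # print(run_and_report(["./b2", "-j", "4"], cwd="/path/to/boost_1_66_0"))
    # Stand-in command: a shell that sends SIGSEGV to itself.
    print(run_and_report(["sh", "-c", "kill -SEGV $$"]))
```

Lowering the `-j` value in the b2 call is the simplest way to test whether the build width matters.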

I assume that disk space is plenty in the build dir and temp dirs? Boost can be quite hungry.
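
A quick way to check free space on the filesystems involved is sketched below, assuming Python 3. The helper names and the 10 GiB threshold are arbitrary choices for illustration, not a documented Boost requirement. Replace the path list with the actual build and temp directories.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def check_space(paths, min_free_gib):
    """Map each path to (free GiB, True if at least min_free_gib is free)."""
    report = {}
    for path in paths:
        free = free_gib(path)
        report[path] = (free, free >= min_free_gib)
    return report


if __name__ == "__main__":
    # Add the EasyBuild build directory here, e.g. the one shown in the log.
    paths = [tempfile.gettempdir()]
    for path, (free, ok) in check_space(paths, min_free_gib=10).items():
        print("%-30s %8.1f GiB free  %s" % (path, free, "ok" if ok else "LOW"))
```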

boegel commented 5 years ago

@nortex Did you ever figure out anything more about this?