JCSDA / spack-stack

Build error: mapl@2.35.2 on MSU Hercules with Intel #769

Closed climbfuji closed 1 year ago

climbfuji commented 1 year ago

Describe the bug: Building mapl@2.35.2 on MSU Hercules with intel@2021.9.0, as in the release/1.5.0 branches of spack-stack, gives the following error:

     1682    [ 78%] Building Fortran object generic/CMakeFiles/MAPL.generic.dir/BaseComponent.F90.o
     1683    cd /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/generic && /apps/spack-man
             aged/oneapi-2023.1.0/intel-oneapi-mpi-2021.9.0-a66eaipzsnyrdgaqzxmqmqz64qzvhkse/mpi/2021.9.0/bin/mpiifort -DBUILD_WITH_PFLOGGER -DFORTRAN_COMPILER_SUPPORTS_ASSUMED_TYPE -DFORTRAN_COMPILER_SUPPO
             RTS_FINDLOC -DH5_HAVE_PARALLEL -DHAS_NETCDF3 -DHAS_NETCDF4 -DHAVE_SHMEM -DNETCDF_NEED_NF_MPIIO -DsysLinux -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stag
             e/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/generic -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4c
             teyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/generic -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtu
             zqpxadz7kv/spack-build-o4cteyz/include/MAPL.generic -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxad
             z7kv/spack-src/include -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/oomph -I/work/
             noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/oomph -I/work/noaa/epic/role-epic/s
             pack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/include/MAPL.oomph -I/work/noaa/epic/role-epic/spack-sta
             ck/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/base -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc
             1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/base -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/sp
             ack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/include/MAPL.base -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage
             -mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/shared -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscn
             tmwtuzqpxadz7kv/spack-build-o4cteyz/shared -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spac
             k-build-o4cteyz/include/MAPL.shared -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/s
             hared/Constants -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/shared/Cons
             tants -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/include/MAPL.constant
             s -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/profiler -I/work/noaa/epic/role-epi
             c/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/profiler -I/work/noaa/epic/role-epic/spack-stack/herc
             ules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/include/MAPL.profiler -I/work/noaa/epic/role-epic/spack-stack/hercules/
             spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/pfio -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/buil
             d_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/pfio -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-ma
             pl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-build-o4cteyz/include/MAPL.pfio -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2
             -o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/MAPL_cfio -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpx
             adz7kv/spack-build-o4cteyz/MAPL_cfio_r4 -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-b
             uild-o4cteyz/include/MAPL_cfio_r4 -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/gftl-shared-1.5.0-ngj4xq3/GFTL_SHARED-1.5/includ
             e/v1 -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/gftl-1.9.0-h7o4vvb/GFTL-1.9/include/v1 -I/work/noaa/epic/role-epic/spack-stac
             k/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/pflogger-1.10.0-ub7ojou/PFLOGGER-1.10/include -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/env
             s/unified-env/install/intel/2021.9.0/yafyaml-1.1.0-myqgmae/YAFYAML-1.1/include -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/gft
             l-1.9.0-h7o4vvb/GFTL-1.9/include/v2 -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/gftl-shared-1.5.0-ngj4xq3/GFTL_SHARED-1.5/incl
             ude/v2 -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/netcdf-fortran-4.6.0-5dj7e4c/include -I/work/noaa/epic/role-epic/spack-stac
             k/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/netcdf-c-4.9.2-blbiwxx/include -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/i
             nstall/intel/2021.9.0/esmf-8.4.2-ehkeofb/include -I/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/envs/unified-env/install/intel/2021.9.0/parallelio-2.5.10-5m25ri6/include
              -O3 -g -march=core-avx2 -fma -qopt-report0 -ftz -align all -fno-alias -align array32byte -traceback -assume realloc_lhs -fpe3 -fp-model consistent -assume noold_maxminloc -align dcommons -modu
             le ../include/MAPL.generic -fPIC -qopenmp -c /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spac
             k-src/generic/BaseComponent.F90 -o CMakeFiles/MAPL.generic.dir/BaseComponent.F90.o
     1684    ifort: command line warning #10121: overriding '-march=icelake-client' with '-march=core-avx2'
     1685    ifort: command line warning #10121: overriding '-march=icelake-client' with '-march=core-avx2'
     1686    ifort: command line warning #10121: overriding '-march=icelake-client' with '-march=core-avx2'
  >> 1687    /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/generic/AbstractFrameworkComponent.F90(
             27): error #6618: The INTERFACE/CONTAINS stack is full. This means that modules are being 'used' in a circular way.
     1688          end function i_AddChildComponent
     1689    ^
     1690    compilation aborted for /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/generic/Abstrac
             tFrameworkComponent.F90 (code 1)
  >> 1691    make[2]: *** [generic/CMakeFiles/MAPL.generic.dir/build.make:104: generic/CMakeFiles/MAPL.generic.dir/AbstractFrameworkComponent.F90.o] Error 1
     1692    make[2]: *** Waiting for unfinished jobs....
  >> 1693    /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/generic/BaseComponent.F90(17): error #6
             618: The INTERFACE/CONTAINS stack is full. This means that modules are being 'used' in a circular way.
     1694    contains
     1695    ^
     1696    compilation aborted for /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.0-rc1/cache/build_stage/spack-stage-mapl-2.35.2-o4cteyzqg3kcldscntmwtuzqpxadz7kv/spack-src/generic/BaseCom
             ponent.F90 (code 1)
  >> 1697    make[2]: *** [generic/CMakeFiles/MAPL.generic.dir/build.make:169: generic/CMakeFiles/MAPL.generic.dir/BaseComponent.F90.o] Error 1

It built successfully a few weeks ago (2023/08/14). These are the differences in build options:

# WORKED
'-DBUILD_WITH_FLAP:BOOL=OFF' '-DBUILD_WITH_PFLOGGER:BOOL=OFF' '-DBUILD_WITH_FARGPARSE:BOOL=OFF' '-DBUILD_SHARED_MAPL:BOOL=OFF' '-DUSE_EXTDATA2G:BOOL=OFF' '-DUSE_F2PY:BOOL=OFF'

# FAILED
'-DBUILD_WITH_FLAP:BOOL=OFF' '-DBUILD_WITH_PFLOGGER:BOOL=ON' '-DBUILD_WITH_FARGPARSE:BOOL=ON' '-DBUILD_SHARED_MAPL:BOOL=OFF' '-DUSE_EXTDATA2G:BOOL=ON' '-DUSE_F2PY:BOOL=OFF'

pflogger, fargparse and extdata2g were off a few weeks ago, now they are on. I will start debugging by turning off extdata2g.
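The comparison above can also be done mechanically, which helps when the option lists get longer. The following is a minimal sketch, not part of spack-stack: the helper names `parse_cmake_defines` and `changed_defines` are made up for illustration, and the two strings are copied from the WORKED and FAILED lines above.

```python
import shlex

# CMake arguments copied from the two builds quoted in this issue.
WORKED = (
    "'-DBUILD_WITH_FLAP:BOOL=OFF' '-DBUILD_WITH_PFLOGGER:BOOL=OFF' "
    "'-DBUILD_WITH_FARGPARSE:BOOL=OFF' '-DBUILD_SHARED_MAPL:BOOL=OFF' "
    "'-DUSE_EXTDATA2G:BOOL=OFF' '-DUSE_F2PY:BOOL=OFF'"
)
FAILED = (
    "'-DBUILD_WITH_FLAP:BOOL=OFF' '-DBUILD_WITH_PFLOGGER:BOOL=ON' "
    "'-DBUILD_WITH_FARGPARSE:BOOL=ON' '-DBUILD_SHARED_MAPL:BOOL=OFF' "
    "'-DUSE_EXTDATA2G:BOOL=ON' '-DUSE_F2PY:BOOL=OFF'"
)


def parse_cmake_defines(args):
    """Map each -D<NAME>[:TYPE]=<VALUE> argument to {NAME: VALUE}."""
    defines = {}
    for token in shlex.split(args):  # shlex strips the shell quotes
        if not token.startswith("-D"):
            continue
        key, _, value = token[2:].partition("=")
        name = key.split(":", 1)[0]  # drop the optional :BOOL type suffix
        defines[name] = value
    return defines


def changed_defines(before, after):
    """Return {NAME: (old, new)} for every define whose value differs."""
    old = parse_cmake_defines(before)
    new = parse_cmake_defines(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    for name, (old, new) in changed_defines(WORKED, FAILED).items():
        print(f"{name}: {old} -> {new}")
```

Run on the two lines above, this reports exactly the three options named in the text: `BUILD_WITH_FARGPARSE`, `BUILD_WITH_PFLOGGER` and `USE_EXTDATA2G`, each going from OFF to ON.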

To Reproduce: Check out the release/1.5.0 branches and try to build the unified-dev environment with Intel.

Expected behavior: It builds!

System: MSU Hercules with Intel

Additional context: On Hera, the ufs-weather-model testing crashes with mysterious segmentation faults / double free corruption at the very end of the model run, when cleaning up after itself. The extdata2g library shows up in the stacktrace, therefore I'll start with this one.

On Narwhal, it seems like fargparse is required to fix other build errors, so let's try to keep that one on. Or was that the f2py problem?

climbfuji commented 1 year ago

Turning off extdata2g didn't help, but turning off pflogger on top of it solved the problem for now. I am going to check next if turning off extdata2g fixes the ufs-weather-model problem on Hera or not.

mathomp4 commented 1 year ago

@climbfuji This "stack full" error is the error we get when we try to build MAPL with Intel Fortran higher than 2021.6, which is the reason we've stuck to that version.

We finally got access to a new system at NCCS that can have Intel 2021.10 installed (new enough OS), and our hope is that with the help of @tclune we can figure this out "soon". Hopefully next week, though the new system is still in shakedown, so it's up-and-down as testing goes on.

climbfuji commented 1 year ago

Did you say you/someone from your team have access to Orion? Then you also have access to Hercules which has 2021.9.0.

climbfuji commented 1 year ago

The good thing is that we can simply turn off pflogger and extdata2g and the error goes away ...

climbfuji commented 1 year ago

My suggestion for mapl (or really any package) would be to not turn on variants by default if there are known issues or, if there are, to at least encode them in the package. That is, either add conflict statements for variants pflogger and extdata2g with intel@2021.7: and/or (I would prefer and) turn these options off by default.

tclune commented 1 year ago

My suggestion for mapl (or really any package) would be to not turn on variants by default...

I agree in principle. The problem is that some of these variants are default behavior for GEOS and the CMake switch was then added after the fact to support external users. (Mostly NOAA.) The alternative is to put a "negative" into the name of the switch, but that is usually more confusing.

And there were no "known" issues, as we do not yet have real access to 2021.7 in our development system.

But happy to add additional checks to simplify things for external users. (@mathomp4 - let me know if you see how to proceed.)

climbfuji commented 1 year ago

How about we leave the defaults as-is and add those conflict statements for mapl versions up to 2.40.3?

mathomp4 commented 1 year ago

We can do that. I suppose we should go full bore, then. Intel should conflict 2021.7 and higher (if that's how spack does it? Intel Fortran 2021.6 is oneAPI 2022.1 I think).

We should also start locking GNU. We technically only ever use GCC 12 in recent days. I suspect GCC 11.3 or higher might work. I have no confidence in any GCC 10 or 9.

NOTE: My current belief is that this is technically not a MAPL issue. I think it's probably in pflogger or some GFE library and MAPL is just exposing the issue.

climbfuji commented 1 year ago

That's a dilemma!

We can't do a blanket conflict for intel 2021.7+ because that means we can't install the full spack-stack on Hercules, Derecho etc. Likewise, for gnu we must be able to install for versions 9 to 12, because some systems don't have anything newer (hera: 9.2.0). We've used those extensively with the UFS and MAPL worked fine (but note that gcc@13 is a big NO for spack-stack).

I am suggesting to simply add this:

> git diff
diff --git a/var/spack/repos/builtin/packages/mapl/package.py b/var/spack/repos/builtin/packages/mapl/package.py
index 54cef1e40e..5894daef46 100644
--- a/var/spack/repos/builtin/packages/mapl/package.py
+++ b/var/spack/repos/builtin/packages/mapl/package.py
@@ -175,6 +175,10 @@ class Mapl(CMakePackage):
         values=("Debug", "Release", "Aggressive"),
     )

+    # https://github.com/JCSDA/spack-stack/issues/769
+    conflicts("+pflogger", when="@:2.40.3 %intel@2021.7:")
+    conflicts("+extdata2g", when="@:2.40.3 %intel@2021.7:")
+
     depends_on("cmake@3.17:", type="build")
     depends_on("mpi")
     depends_on("hdf5")
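To make the effect of the two `conflicts(...)` lines concrete, here is a simplified stand-alone model of the rule they express. It is only a sketch and does not use Spack's API: the helpers `parse_version`, `in_range` and `conflicting_variants` are hypothetical, and real Spack version matching has more cases than are handled here. The assumption is that `@:2.40.3` means "up to and including 2.40.3" and `%intel@2021.7:` means "Intel 2021.7 or newer".

```python
from typing import Iterable, List, Optional, Tuple

# (variant, newest affected mapl version, oldest affected Intel version),
# taken from the two conflicts() lines in the diff above.
CONFLICT_RULES = [
    ("pflogger", "2.40.3", "2021.7"),
    ("extdata2g", "2.40.3", "2021.7"),
]


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2021.9.0' into (2021, 9, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, low: Optional[str], high: Optional[str]) -> bool:
    """Inclusive range check; None leaves that side of the range open."""
    v = parse_version(version)
    if low is not None and v < parse_version(low):
        return False
    if high is not None:
        h = parse_version(high)
        # Compare only as many components as the bound has, so that
        # 2.40.3.1 still counts as being within ':2.40.3'.
        if v[: len(h)] > h:
            return False
    return True


def conflicting_variants(
    mapl_version: str,
    compiler: str,
    compiler_version: str,
    enabled_variants: Iterable[str],
) -> List[str]:
    """List the enabled variants that the rules above would reject."""
    if compiler != "intel":
        return []
    enabled = set(enabled_variants)
    return [
        variant
        for variant, max_mapl, min_intel in CONFLICT_RULES
        if variant in enabled
        and in_range(mapl_version, None, max_mapl)
        and in_range(compiler_version, min_intel, None)
    ]


if __name__ == "__main__":
    # The failing configuration from this issue.
    print(conflicting_variants(
        "2.35.2", "intel", "2021.9.0", ["pflogger", "fargparse", "extdata2g"]
    ))
```

Under this model, the failing Hercules build (mapl 2.35.2, Intel 2021.9.0, pflogger and extdata2g on) is flagged, while the same variants with Intel 2021.6 or with GCC are not. That matches the intent stated above: no blanket compiler conflict, only the two variants. A spec such as `mapl@2.35.2 ~pflogger ~extdata2g %intel@2021.9.0` passes the check.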

climbfuji commented 1 year ago

Closed via https://github.com/JCSDA/spack-stack/pull/770