Closed by climbfuji.
Turning off extdata2g didn't help, but turning off pflogger on top of it solved the problem for now. Next, I will check whether turning off extdata2g fixes the ufs-weather-model problem on Hera.
@climbfuji This "stack full" error is the error we get when we try to build MAPL with Intel Fortran newer than 2021.6, which is why we've stayed on that version.
We finally got access to a new system at NCCS that can have Intel 2021.10 installed (its OS is new enough), and we hope that with help from @tclune we can figure this out "soon", perhaps next week. The new system is still in shakedown, though, so it's up and down as testing goes on.
Did you say that you or someone on your team has access to Orion? If so, you also have access to Hercules, which has Intel 2021.9.0.
The good news is that we can simply turn off pflogger and extdata2g, and the error goes away.
My suggestion for mapl (or really any package) would be not to turn on variants by default if there are known issues, or at least to encode those issues in the package. That is, either add conflict statements for the variants pflogger and extdata2g with intel@2021.7:
and/or (preferably both) turn these options off by default.
I agree in principle. The problem is that some of these variants are default behavior for GEOS, and the CMake switches were added after the fact to support external users (mostly NOAA). The alternative is to put a "negative" into the name of the switch, but that is usually more confusing.
Also, there were no "known" issues, since we do not yet have real access to Intel 2021.7 or newer on our development system.
But I'm happy to add additional checks to simplify things for external users. (@mathomp4, let me know if you see how to proceed.)
How about we leave the defaults as-is and add those conflict statements for mapl versions up to 2.40.3?
We can do that. I suppose we should go full bore, then. Intel should conflict for 2021.7 and higher (if that's how spack expresses it; Intel Fortran 2021.6 is oneAPI 2022.1, I think).
We should also start restricting GNU versions. In practice, we have only used GCC 12 recently. I suspect GCC 11.3 or newer would work, but I have no confidence in GCC 10 or 9.
NOTE: My current belief is that this is not technically a MAPL issue. I think the problem is probably in pflogger or some GFE library, and MAPL is just exposing it.
That's a dilemma!
We can't do a blanket conflict for intel 2021.7+, because then we couldn't install the full spack-stack on Hercules, Derecho, etc. Likewise, for gnu we must be able to install with versions 9 to 12, because some systems don't have anything newer (Hera: 9.2.0). We've used those extensively with the UFS, and MAPL worked fine (but note that gcc@13 is a big NO for spack-stack).
I suggest simply adding this:
> git diff
diff --git a/var/spack/repos/builtin/packages/mapl/package.py b/var/spack/repos/builtin/packages/mapl/package.py
index 54cef1e40e..5894daef46 100644
--- a/var/spack/repos/builtin/packages/mapl/package.py
+++ b/var/spack/repos/builtin/packages/mapl/package.py
@@ -175,6 +175,10 @@ class Mapl(CMakePackage):
         values=("Debug", "Release", "Aggressive"),
     )
 
+    # https://github.com/JCSDA/spack-stack/issues/769
+    conflicts("+pflogger", when="@:2.40.3 %intel@2021.7:")
+    conflicts("+extdata2g", when="@:2.40.3 %intel@2021.7:")
+
     depends_on("cmake@3.17:", type="build")
     depends_on("mpi")
     depends_on("hdf5")
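To make the scope of the proposed conflicts concrete, here is a simplified Python model of when they would fire: MAPL at or below 2.40.3, built with Intel 2021.7 or newer, with pflogger or extdata2g enabled. The helper names (parse_version, conflicting_variants) are hypothetical; this is not how Spack evaluates specs internally, and it uses plain numeric version comparison, ignoring Spack's richer version semantics.

```python
# Simplified, hypothetical model of the proposed rule:
#   conflicts("+pflogger",  when="@:2.40.3 %intel@2021.7:")
#   conflicts("+extdata2g", when="@:2.40.3 %intel@2021.7:")
# NOT Spack's implementation: versions are compared as plain integer tuples.
from typing import Set, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn a dotted version like '2021.9.0' into (2021, 9, 0)."""
    return tuple(int(part) for part in text.split("."))


MAPL_MAX = parse_version("2.40.3")   # upper bound of '@:2.40.3' (inclusive)
INTEL_MIN = parse_version("2021.7")  # lower bound of '%intel@2021.7:' (inclusive)
CONFLICTING_VARIANTS = {"pflogger", "extdata2g"}


def conflicting_variants(mapl: str, compiler: str, compiler_version: str,
                         enabled: Set[str]) -> Set[str]:
    """Return the enabled variants that the proposed rule would reject."""
    if compiler != "intel":
        return set()
    if parse_version(mapl) > MAPL_MAX:
        return set()  # newer MAPL releases are not covered by the rule
    if parse_version(compiler_version) < INTEL_MIN:
        return set()  # Intel 2021.6 and older are not covered
    return enabled & CONFLICTING_VARIANTS


if __name__ == "__main__":
    # The Hercules case from the bug report: mapl@2.35.2 %intel@2021.9.0
    print(sorted(conflicting_variants("2.35.2", "intel", "2021.9.0",
                                      {"pflogger", "extdata2g", "fargparse"})))
    # prints ['extdata2g', 'pflogger']

    # Intel 2021.6 is not affected
    print(sorted(conflicting_variants("2.35.2", "intel", "2021.6.0",
                                      {"pflogger", "extdata2g"})))
    # prints []
```

Comparing versions as integer tuples rather than strings matters here: as strings, "2021.10" sorts before "2021.7", which would wrongly exempt Intel 2021.10 from the conflict.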
Describe the bug: Building mapl@2.35.2 on MSU Hercules with intel@2021.9.0, as in the release/1.5.0 branches of spack-stack, gives the following error:

It built successfully a few weeks ago (2023/08/14). These are the differences in build options: pflogger, fargparse and extdata2g were off a few weeks ago; now they are on. I will start debugging by turning off extdata2g.
To Reproduce: Check out the release/1.5.0 branches and try to build the unified-dev environment with Intel.
Expected behavior: It builds!
System: MSU Hercules with Intel
Additional context: On Hera, the ufs-weather-model testing crashes with mysterious segmentation faults / double free corruption at the very end of the model run, when it cleans up after itself. The extdata2g library shows up in the stack trace, so I'll start with this one. On Narwhal, it seems that fargparse is required to fix other build errors, so let's try to keep that one on. Or was that the f2py problem?