E3SM-Project / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
https://spack.io

Segfault in Spack build of trilinos-for-albany with intel #8

Open xylar opened 1 year ago

xylar commented 1 year ago

This is not a new issue but one I want to revisit. When I try to build trilinos-for-albany on Chrysalis with intel and OpenMPI, I see:

A long error message with a segfault ``` cd /lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/epetra/src && /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/cmake-3.19.1-yisciec/bin/cmake -E cmake_link_script CMakeFiles/belosepetra.dir/link.txt --verbose=1 /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/openmpi-4.1.3-pin4k7o/bin/mpic++ -fPIC -O2 -g -DNDEBUG -shared -Wl,-soname,libbelosepetra.so.13 -o libbelosepetra.so.13.5 CMakeFiles/belosepetra.dir/BelosEpetraAdapter.cpp.o CMakeFiles/belosepetra.dir/BelosEpetraOperator.cpp.o CMakeFiles/belosepetra.dir/BelosEpetraUtils.cpp.o CMakeFiles/belosepetra.dir/Belos_Details_Epetra_registerLinearSolverFactory.cpp.o CMakeFiles/belosepetra.dir/Belos_Details_Epetra_registerSolverFactory.cpp.o -Wl,-rpath,/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/xpetra/sup:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/xpetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/epetraext/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/epetraext/src:/gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-sta
ge-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/tpetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/ext:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/inout:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/core/compat:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/tpetra/tsqr/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/adapters/epetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/thyra/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/rtop/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/aztecoo/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/triutils/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-sta
ge/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/epetra/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/numerics/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/remainder/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/kokkoscomm/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/comm/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/kokkoscompat/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/parameterlist/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/parser/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/teuchos/core/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos-kernels/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/algorithms/src:/lcrc/group/e
3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/containers/src:/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/kokkos/core/src:/lcrc/soft/climate/compass/chrysalis/spack/spack_for_mache_1.8.0/opt/spack/linux-rhel8-zen2/intel-20.0.4/metis-5.1.0-fvpnjgznlef67rs2jblxnjoxjaue2iyj/lib: ../../src/libbelos.so.13.5 ../../../xpetra/sup/libxpetra-sup.so.13.5 ../../../xpetra/src/libxpetra.so.13.5 ../../../thyra/adapters/epetraext/src/libthyraepetraext.so.13.5 ../../../epetraext/src/libepetraext.so.13.5 /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib/libhdf5.so /usr/lib64/libz.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/hdf5-1.10.7-eewgp6v/lib/libhdf5_hl.so ../../../thyra/adapters/tpetra/src/libthyratpetra.so.13.5 ../../../tpetra/core/ext/libtpetraext.so.13.5 ../../../tpetra/core/inout/libtpetrainout.so.13.5 ../../../tpetra/core/src/libtpetra.so.13.5 ../../../tpetra/core/compat/libtpetraclassic.so.13.5 ../../../tpetra/tsqr/src/libkokkostsqr.so.13.5 ../../../thyra/adapters/epetra/src/libthyraepetra.so.13.5 ../../../thyra/core/src/libthyracore.so.13.5 ../../../rtop/src/librtop.so.13.5 ../../../aztecoo/src/libaztecoo.so.13.5 ../../../triutils/src/libtriutils.so.13.5 ../../../epetra/src/libepetra.so.13.5 ../../../teuchos/numerics/src/libteuchosnumerics.so.13.5 ../../../teuchos/remainder/src/libteuchosremainder.so.13.5 ../../../teuchos/kokkoscomm/src/libteuchoskokkoscomm.so.13.5 ../../../teuchos/comm/src/libteuchoscomm.so.13.5 ../../../teuchos/kokkoscompat/src/libteuchoskokkoscompat.so.13.5 ../../../teuchos/parameterlist/src/libteuchosparameterlist.so.13.5 ../../../teuchos/parser/src/libteuchosparser.so.13.5 ../../../teuchos/core/src/libteuchoscore.so.13.5 
../../../kokkos-kernels/src/libkokkoskernels.so.13.5 /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_intel_lp64.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_sequential.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_core.so /lib64/libpthread.so /lib64/libm.so /lib64/libdl.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_intel_lp64.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_sequential.so /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mkl-2020.4.304-g2qaxzf/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64/libmkl_core.so /lib64/libpthread.so /lib64/libm.so /lib64/libdl.so ../../../kokkos/algorithms/src/libkokkosalgorithms.so.13.5 ../../../kokkos/containers/src/libkokkoscontainers.so.13.5 ../../../kokkos/core/src/libkokkoscore.so.13.5 /usr/lib64/libdl.so /lcrc/soft/climate/compass/chrysalis/spack/spack_for_mache_1.8.0/opt/spack/linux-rhel8-zen2/intel-20.0.4/metis-5.1.0-fvpnjgznlef67rs2jblxnjoxjaue2iyj/lib/libmetis.so ": internal error: ** The compiler has encountered an unexpected problem. ** Segmentation violation signal raised. ** Access violation or stack overflow. Please contact Intel Support for assistance. 
icpc: error #10105: /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/intel-20.0.4-kodw73g/compilers_and_libraries_2020.4.304/linux/bin/intel64/mcpcom: core dumped icpc: warning #10102: unknown signal(1415383120) icpc: error #10106: Fatal error in /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/intel-20.0.4-kodw73g/compilers_and_libraries_2020.4.304/linux/bin/intel64/mcpcom, terminated by unknown icpc: error #10014: problem during multi-file optimization compilation (code 1) make[2]: *** [packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/CMakeFiles/stk_mesh_fixtures.dir/build.make:460: packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/libstk_mesh_fixtures.so.13.5] Error 1 make[2]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi' make[1]: *** [CMakeFiles/Makefile2:14783: packages/stk/stk_unit_test_utils/stk_unit_test_utils/stk_mesh_fixtures/CMakeFiles/stk_mesh_fixtures.dir/all] Error 2 cd /lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi/packages/belos/epetra/src && /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/gcc-9.3.0/cmake-3.19.1-yisciec/bin/cmake -E cmake_symlink_library libbelosepetra.so.13.5 libbelosepetra.so.13 libbelosepetra.so make[2]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi' [ 80%] Built target belosepetra make[1]: Leaving directory '/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-trilinos-for-albany-develop-we2yuzibwzgdeomqm5qiqy7ndkisjkpj/spack-build-we2yuzi' make: *** [Makefile:174: all] Error 2 ```

I believe this is the same issue I have seen previously with Intel on all machines I've tried.

It would be really great to get this resolved, since Intel and OpenMPI are the production compilers on Chrysalis for E3SM.

xylar commented 1 year ago

@ikalash, @mperego, @bartgol, @jewatkins

I'm just reviving this issue in the hope that we can resolve it in the not-too-distant future. It's not super urgent, just bothersome.

xylar commented 1 year ago

I see the same issue on Chrysalis when I use Intel and Intel-MPI (instead of OpenMPI).

Here is a gist with a shell script and yaml file that can be used to reproduce the issue: https://gist.github.com/xylar/4fa8c2a7a54335068bedbb38a64d584c

bartgol commented 1 year ago

Ugh, ICEs are annoying, and often require seemingly ineffective changes (like splitting a cpp file in two, or removing a const), with lots of bisection iterations to identify the problematic code snippet.

That said, from the error msg, it appears it's trying to build the stk_unit_test_utils package, which I think we can do away with. Can you try to set Trilinos_ENABLE_STKUnit_test_utils:BOOL=OFF? This is always off in my trilinos builds, along with a few other stk sub-packages:

  -D Trilinos_ENABLE_STK:BOOL=ON                                  \
  -D Trilinos_ENABLE_STKDoc_tests:BOOL=OFF                        \
  -D Trilinos_ENABLE_STKIO:BOOL=ON                                \
  -D Trilinos_ENABLE_STKMesh:BOOL=ON                              \
  -D Trilinos_ENABLE_STKSearch:BOOL=ON                            \
  -D Trilinos_ENABLE_STKSearchUtil:BOOL=OFF                       \
  -D Trilinos_ENABLE_STKTopology:BOOL=ON                          \
  -D Trilinos_ENABLE_STKTransfer:BOOL=ON                          \
  -D Trilinos_ENABLE_STKUnit_tests:BOOL=OFF                       \
  -D Trilinos_ENABLE_STKUnit_test_utils:BOOL=OFF                  \
  -D Trilinos_ENABLE_STKUtil:BOOL=ON    
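
The flag list above can also be generated from a table of toggles, which makes a typo in the `:BOOL=` suffix impossible and keeps the on/off choices in one place. The following is only a minimal sketch in plain Python: `cmake_bool_flags` and `stk_options` are hypothetical names invented for illustration, not part of Spack, Trilinos, or mache, and the sketch assumes the flags are passed to `cmake` as single `-DNAME:BOOL=VALUE` arguments.

```python
# Hypothetical helper (illustration only, not a Spack or Trilinos API):
# turn a dict of boolean toggles into CMake cache arguments.
def cmake_bool_flags(options):
    """Return a list of '-DNAME:BOOL=ON|OFF' strings, one per option."""
    flags = []
    for name, enabled in options.items():
        if not isinstance(enabled, bool):
            # Reject anything that is not a plain True/False toggle.
            raise TypeError(f"{name} must be True or False, got {enabled!r}")
        value = "ON" if enabled else "OFF"
        flags.append(f"-D{name}:BOOL={value}")
    return flags


# The STK toggles quoted in the comment above, as a dict.
stk_options = {
    "Trilinos_ENABLE_STK": True,
    "Trilinos_ENABLE_STKDoc_tests": False,
    "Trilinos_ENABLE_STKIO": True,
    "Trilinos_ENABLE_STKMesh": True,
    "Trilinos_ENABLE_STKSearch": True,
    "Trilinos_ENABLE_STKSearchUtil": False,
    "Trilinos_ENABLE_STKTopology": True,
    "Trilinos_ENABLE_STKTransfer": True,
    "Trilinos_ENABLE_STKUnit_tests": False,
    "Trilinos_ENABLE_STKUnit_test_utils": False,
    "Trilinos_ENABLE_STKUtil": True,
}

if __name__ == "__main__":
    # Print one flag per line, ready to paste into a configure script.
    for flag in cmake_bool_flags(stk_options):
        print(flag)
```

Whether these toggles belong in the Spack recipe for trilinos-for-albany or in a hand-written configure script is a separate question; the sketch only shows the mapping from toggles to arguments.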
jewatkins commented 1 year ago

> I see the same issue on Chrysalis when I use Intel and Intel-MPI (instead of OpenMPI).
>
> Here is a gist with a shell script and yaml file that can be used to reproduce the issue: https://gist.github.com/xylar/4fa8c2a7a54335068bedbb38a64d584c

Can this be used to reproduce on any machine? I remember there was some issue with intel + spack a while back but that may have been resolved. Part of the issue is that everyone has different Trilinos configure scripts. It would be nice to have an up-to-date master one that we can all contribute to and have all our nightly testing conform to.

We are currently testing up to intel 19 and it looks like trilinos is doing the same? https://testing.sandia.gov/cdash/index.php?subproject=STK&project=Trilinos

We do have intel 20.2.254 on blake, we could try to update our testing there. I do recall seeing some issues with oneapi but I think they were mostly tpl issues.

xylar commented 1 year ago

@jewatkins, we should have a discussion about this. This recipe is specific to Chrysalis. Similar recipes are needed for other supported machines. As I presented on Tuesday, we have a python package mache that handles generating these build scripts and yaml files (as well as snippets that can be part of activation scripts) to handle different machines. But there's not an easy way to make this particular type of recipe machine independent and indeed the point from my perspective is to mimic E3SM, which is decidedly not machine independent.

xylar commented 1 year ago

I can make similar recipes for Compy, Cori or Anvil. We don't currently support Intel compilers on any other machine for the software, compass, that I'm trying to build Albany for.

xylar commented 1 year ago

@ikalash and @jewatkins, I could potentially try to add one of the machines you test with to mache and compass so you could test on your machine the same way we build on ours. That would get us a lot closer to testing the workflow we ultimately want to have work without interruptions. Is there a machine at Sandia that you test on that's also supported by E3SM? See the following file for all machines that E3SM runs on: https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml

bartgol commented 1 year ago

snl-blake would be great, but I suspect that machine is not really maintained, since it was added by Micheal Deakin, who left SNL ~4 years ago, but is still listed as maintainer...

jewatkins commented 1 year ago

Okay, we can discuss more later if this isn't urgent. To me it looks like a compiler error, possibly due to the intel version which is not tested in Trilinos/Albany. I was thinking the easiest thing would be to try to update one of our internal builds to use the intel version that you're using and see if the compiler error comes up.

There's much we can do with a compiler error. We first have to identify the problem code. Then we'll either have to tell the code owner to modify it (or we modify it and we have a special Trilinos version) or send a reproducer to the vendor and hope the issue is fixed in another release (and gets over to the target machines). But maybe it's as simple as turning something off like Luca suggests.
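
Identifying the problem code usually comes down to the kind of bisection mentioned earlier in the thread. As a rough sketch only, assuming the failure is monotonic (once a set of inputs triggers the crash, adding more inputs still triggers it), the search could look like the following. `find_culprit` and the `fails` callback are hypothetical names; in practice `fails` would have to rerun the failing compile or link step on the reduced set of sources, which is not shown here.

```python
# Hypothetical bisection sketch (illustration only): find the single item
# whose inclusion first makes a build step fail.
def find_culprit(items, fails):
    """Return the item that first triggers the failure, or None.

    `fails(subset)` must return True if building `subset` crashes.
    Assumes monotonic behavior: if a prefix of `items` fails, every
    longer prefix fails too, and the empty prefix does not fail.
    """
    if not fails(items):
        return None  # the full set builds fine, nothing to find
    lo, hi = 0, len(items)  # items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[:mid]):
            hi = mid  # culprit is within the first `mid` items
        else:
            lo = mid  # culprit is after the first `mid` items
    return items[hi - 1]


if __name__ == "__main__":
    # Toy stand-in for a real build: the file names are made up, and the
    # "crash" is simulated by checking for one of them.
    sources = ["a.cpp", "b.cpp", "c.cpp", "d.cpp", "e.cpp"]
    print(find_culprit(sources, lambda subset: "d.cpp" in subset))
```

Each call to `fails` corresponds to one rebuild, so the number of rebuilds grows with the logarithm of the number of candidate files rather than linearly. An internal compiler error during multi-file optimization may not be strictly monotonic, in which case this only narrows the search.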

Skybridge is probably the closest thing to use for Sandia internal since we build e3sm and trilinos/albany there.

jewatkins commented 1 year ago

> snl-blake would be great, but I suspect that machine is not really maintained, since it was added by Micheal Deakin, who left SNL ~4 years ago, but is still listed as maintainer...

Oh I did not see blake. It would be nice if we could revive that.

xylar commented 1 year ago

Well, keep me posted, I'm happy to help support whatever is practical.

ikalash commented 1 year ago

I suspect this is due to the compiler being so new. The newest intel compiler we are using in the Albany nightlies is 19.0.5. I could try building the code on one of our intel machines with the intel compiler that is there, just as a sanity check and to test this theory. I'd try that on blake, as suggested above. Another option would be to build Trilinos the usual cmake way using the newer intel 20 compiler, but that's probably harder.

ikalash commented 1 year ago

Also, I have added this topic to the agenda for our 11/22 Albany meeting, just FYI.

ikalash commented 1 year ago

So I have an update on this - my apologies for the delay. I tried building albany using spack with intel on my workstation mockba using a sems module for the intel compiler, and the build completed. I used the following intel module there: module load sems-intel/2021.3. It therefore does not seem that intel is a fundamental problem for the spack build, at least not with newer versions of intel. I can try with some other versions of intel just as a sanity check. If that goes well, I would suggest someone try to build Trilinos from scratch using cmake on the problematic machines to see if the same issue is encountered.

ikalash commented 1 year ago

Looking in more detail at Xylar's original error, I think this might be a compiler bug: https://community.intel.com/t5/Intel-C-Compiler/Compiler-Error-quot-Segmentation-violation-signal-raised-quot/td-p/1075456 (this forum discussion is about a different version of the intel compiler, but same idea).

jewatkins commented 1 year ago

@ikalash Just curious and probably not related to this issue but did you build your own mpi with sems-intel/2021.3 or did you use sems-openmpi/4.0.5? I was having issues with sems-openmpi/4.0.5 on cee machines.

xylar commented 1 year ago

@ikalash, okay, I had also seen indications that it might be a compiler bug. This is a tricky situation because in the long run we really won't be able to choose our compilers, we will need to use the E3SM ones. For now, that leaves me no choice but to only support Albany with GNU on our HPC machines until E3SM chooses to update the Intel compiler modules.

bartgol commented 1 year ago

@xylar Since Albany is pre-installed, isn't it possible to use a different compiler? I know this might open a can of worms, but so long as we don't use drastically different compilers, shouldn't E3SM+intelX be able to link against Albany+intelY?

Note: if you want to avoid using different compilers altogether, I completely understand; just checking though.

jewatkins commented 1 year ago

As mentioned before, other options might include trying to turn off the problem code or modify Trilinos and have a "special" Trilinos for the spack build. I suppose the latter could get messy if the spack build is updated automatically with Trilinos develop.

ikalash commented 1 year ago

How often do the compilers for E3SM get updated? I would advocate trying to get it switched if it is known to have compiler bugs, but I could see that this is easier said than done.

Trying what @bartgol and @jewatkins suggest if we're stuck with the compiler is a good idea.

xylar commented 1 year ago

They are updated pretty rarely. And at some HPC centers we have more control over that process than at others. I'll bring this up with the infrastructure group and see what they suggest.

ikalash commented 1 year ago

> @ikalash Just curious and probably not related to this issue but did you build your own mpi with sems-intel/2021.3 or did you use sems-openmpi/4.0.5? I was having issues with sems-openmpi/4.0.5 on cee machines.

@jewatkins : I had spack build openmpi-4.1.4, rather than using the sems openmpi. Did not try with the sems openmpi.

ikalash commented 1 year ago

> They are updated pretty rarely. And at some HPC centers we have more control over that process than at others. I'll bring this up with the infrastructure group and see what they suggest.

Sounds good, please keep us posted on what they say.

xylar commented 1 year ago

We agreed that this is a lower priority. We will try building trilinos-for-albany with Intel on Perlmutter if and when that compiler becomes available. In the meantime, we will focus on GPU support and other, higher priorities.

ikalash commented 1 year ago

Sounds good, @xylar , thanks for the update.