E3SM-Project / spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
https://spack.io

problems building trilinos-for-albany on perlmutter #5

Closed xylar closed 1 year ago

xylar commented 1 year ago

@ikalash and @mperego,

Hitting you on GitHub as well as Slack.

I'm finally building compass on Perlmutter! I ran into trouble building the spack trilinos-for-albany package and was hoping you could help me out. The error I get isn't very helpful (at least to me):

     792    Configuring individual enabled Trilinos packages ...
     793    
     794    Processing enabled top-level package: Gtest (Libs)
     795    Processing enabled top-level package: Kokkos (Core, Containers, Algorithms)
     796    -- Setting default Kokkos CXX standard to 14
  >> 797    CMake Error at packages/kokkos/CMakeLists.txt:123 (MESSAGE):
     798      Kokkos did not configure correctly and failed to validate compiler.  The
     799      most likely cause is linkage errors during CMake compiler validation.
     800      Please consult the CMake error log shown below for the exact error during
     801      compiler validation
     802    
     803    
     803    

Full build and logs are here:

/pscratch/sd/x/xylar/spack_temp/spack-stage/spack-stage-trilinos-for-albany-develop-wz37djhjbx3mjaqxbl37ora2f6ujtovu/spack-src

Let me know if I messed up the permissions.
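
The Kokkos message above defers to the CMake error log. A minimal sketch for locating it, assuming the build directory sits somewhere under the Spack stage directory (the parent of spack-src) and that the log carries one of CMake's usual file names. `find_cmake_logs` is a hypothetical helper, not part of Spack, and the note about which CMake release writes which file is from memory rather than from this thread:

```shell
# Sketch: list CMake configure logs under a Spack stage directory.
# Older CMake writes CMakeError.log and CMakeOutput.log; newer releases
# (3.26 and later, as far as I know) write CMakeConfigureLog.yaml instead.
find_cmake_logs() {
    stage_dir="$1"   # assumed: the stage directory that contains spack-src
    find "$stage_dir" -type f \
        \( -name 'CMakeError.log' \
           -o -name 'CMakeOutput.log' \
           -o -name 'CMakeConfigureLog.yaml' \) | LC_ALL=C sort
}
```

Calling `find_cmake_logs` with the stage directory as its only argument prints one path per line; the compiler-validation failure should be near the end of whichever log it finds.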

On Slack, I attached the files I use to build (not allowed on GitHub). You should be able to run them yourselves if you point to appropriate directories (i.e. on your pscratch, not mine).

This isn't particularly urgent but would be very nice to solve before Cori disappears in January.

ikalash commented 1 year ago

I'll take a look at this. I had a few questions @xylar about what you tried:

xylar commented 1 year ago

@ikalash, were you able to find the attached bash and yaml files that I posted in the Slack channel? That should answer all these questions. I use (and very much need to use) the system-built compilers, mpi, hdf5, netcdf, pnetcdf, etc.

ikalash commented 1 year ago

Yes, I found them. Putting them here for easier reference. Thanks for clarifying.

build_dev_compass_1_2_0-alpha_2_gnu_mpich_albany.bash.txt
dev_compass_1_2_0-alpha_2_gnu_mpich_albany.yaml.txt

mperego commented 1 year ago

In case it's useful, here are the modules and scripts we use to build MALI on Perlmutter: https://github.com/sandialabs/Albany/tree/master/doc/LandIce/perlmutter

ikalash commented 1 year ago

spack-build-out.txt

Hmmm, so the regular nightly spack build I have is failing to build Albany. The build log is attached. There is an MPI error:

/projects/albany/nightlySpackBuild/spack/opt/spack/linux-rhel7-ivybridge/gcc-9.2.0/trilinos-for-albany-develop-hajhfstqrqodd4jdqghcn52umc6552sw/lib/libgtest.so.13.5 -lgfortran 
libalbany_ut_main.so: undefined reference to `ompi_mpi_cxx_op_intercept'
libalbany_ut_main.so: undefined reference to `MPI::Win::Free()'
libalbany_ut_main.so: undefined reference to `MPI::Datatype::Free()'
libalbany_ut_main.so: undefined reference to `MPI::Comm::Comm()'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/unit/string_utils] Error 1

Please comment if you have any thoughts. I'll see if this shows up in any of the other nightlies. It is not related to the Perlmutter error, which is a configure error, but we should resolve it.
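
A small sketch for pulling the distinct missing symbols out of a build log like the one above. `undefined_symbols` is a hypothetical helper, and it assumes GNU ld's message format. In the excerpt, every missing symbol is either in the `MPI::` namespace or has an `ompi_` prefix, which would be consistent with the MPI C++ bindings not being on the link line; that reading is an assumption, not something confirmed in this thread.

```shell
# Sketch: print the unique symbols named in "undefined reference to" lines.
# GNU ld quotes the symbol as `name' (older binutils) or 'name' (newer).
undefined_symbols() {
    sed -n "s/.*undefined reference to [\`']\(.*\)'\$/\1/p" "$1" | LC_ALL=C sort -u
}
```

Run it as `undefined_symbols spack-build-out.txt` to get one symbol per line, without the repeated object-file prefixes.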

ikalash commented 1 year ago

@xylar: could you also please share your packages.yaml file, so I can compare it with the one I am making on perlmutter?

ikalash commented 1 year ago

@xylar : so for me, Trilinos compiled. I used this compiler module load gcc/11.2.0 and not any other module TPLs (I will try that next). I am attaching my compilers.yaml file. I built using the following command spack --insecure install --dirty --keep-stage albany%gcc@11.1.0+mpas .

Now, unfortunately my build of Albany was not successful due to that export_albany.in error that appeared/disappeared for no apparent reason. I am not sure what to make of it. The error log is attached. I could try again tomorrow from scratch to see if it goes away.

spack-build-out.txt

cwd: /pscratch/sd/i/ikalash/spackAlbany/spack-stage-albany-develop-ae6poivqdwvmpt7ksamtaciq4nqwoijf/spack-build-ae6poiv/dummy
Traceback (most recent call last):
  File "/pscratch/sd/i/ikalash/spackAlbany/spack-stage-albany-develop-ae6poivqdwvmpt7ksamtaciq4nqwoijf/spack-src/cmake/CreateExportAlbany", line 185, in <module>
    _main_func(__doc__)
  File "/pscratch/sd/i/ikalash/spackAlbany/spack-stage-albany-develop-ae6poivqdwvmpt7ksamtaciq4nqwoijf/spack-src/cmake/CreateExportAlbany", line 178, in _main_func
    run (bin_dir,install_lib_dir,generator)
  File "/pscratch/sd/i/ikalash/spackAlbany/spack-stage-albany-develop-ae6poivqdwvmpt7ksamtaciq4nqwoijf/spack-src/cmake/CreateExportAlbany", line 149, in run
    raise ValueError
ValueError
make[2]: *** [src/CMakeFiles/create_export_albany.dir/build.make:76: export_albany.in] Error 1
make[2]: Leaving directory '/pscratch/sd/i/ikalash/spackAlbany/spack-stage-albany-develop-ae6poivqdwvmpt7ksamtaciq4nqwoijf/spack-build-ae6poiv'
make[1]: *** [CMakeFiles/Makefile2:1316: src/CMakeFiles/create_export_albany.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 99%] Linking CXX executable dummy

ikalash commented 1 year ago

I hacked the build to get it to compile and ran the tests. All tests passed. So I think we can get a working build on perlmutter with spack modulo some of these strange hiccups...

xylar commented 1 year ago

@ikalash, that sounds promising.

> could you also please share your packages.yaml file, so I can compare it with the one I am making on perlmutter?

I never create one. I rely on the environment yaml file entirely. If this is something that spack generates automatically, I can dig it up.

> I used this compiler module load gcc/11.2.0 and not any other module TPLs (I will try that next).

I need to use the same compilers and libraries as E3SM unless there is a strong reason to break that connection. Otherwise, our testing doesn't reflect what we end up seeing when we run E3SM with the same components. I realize this is less of an issue for MALI than other components right now but it's also a very big pain for me to maintain a separate compiler/mpi/library stack for MALI than for MPASO.

> I am attaching my compilers.yaml file.

If that got attached, I missed it.

ikalash commented 1 year ago

Sorry about this. It is attached now. I don't know how to get spack to use pre-installed TPLs without a packages.yaml file. Perhaps we can discuss this at the Nov. 8 meeting.

compilers.yaml.txt

xylar commented 1 year ago

Thanks!

My understanding is that's one of the main points of having an environment.yaml file: it lets you work with external modules, whereas you don't really have the level of control you need to make them useful any other way. But the truth is I don't ever do anything outside of an environment.
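
For reference, a minimal sketch of what declaring a system module as an external inside an environment file can look like. The layout follows Spack's `packages:` schema (`externals`, `spec`, `modules`, `buildable`), but the package name, version, and module name below are placeholders chosen for illustration, not the values from the attached files, and `write_env_sketch` is a hypothetical helper.

```shell
# Sketch: write an example Spack environment file that treats a system
# module as an external package instead of building it.
write_env_sketch() {
    cat > "$1" <<'EOF'
spack:
  specs:
  - trilinos-for-albany
  packages:
    cray-mpich:                 # placeholder package name
      externals:
      - spec: cray-mpich@8.1.15 # placeholder version
        modules:
        - cray-mpich/8.1.15     # placeholder module name
      buildable: false          # never build this package from source
EOF
}
```

The same `packages:` block, without the enclosing `spack:` key, is what a standalone packages.yaml would contain, so the two approaches discussed here should be able to express the same externals.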

ikalash commented 1 year ago

So I created a packages.yaml file for perlmutter that reflects the packages we want to use on perlmutter for Albany (https://github.com/sandialabs/Albany/blob/master/doc/LandIce/perlmutter/pm_cpu_gnu_modules.sh), which is attached, and was able to build Albany with one caveat - there is again the error about export_albany.in:

cwd: /pscratch/sd/i/ikalash/spackAlbany2/spack-stage-albany-develop-464dkohafi6xoko6ukjqoxeae7pygw5c/spack-build-464dkoh/dummy
Traceback (most recent call last):
  File "/pscratch/sd/i/ikalash/spackAlbany2/spack-stage-albany-develop-464dkohafi6xoko6ukjqoxeae7pygw5c/spack-src/cmake/CreateExportAlbany", line 185, in <module>
    _main_func(__doc__)
  File "/pscratch/sd/i/ikalash/spackAlbany2/spack-stage-albany-develop-464dkohafi6xoko6ukjqoxeae7pygw5c/spack-src/cmake/CreateExportAlbany", line 178, in _main_func
    run (bin_dir,install_lib_dir,generator)
  File "/pscratch/sd/i/ikalash/spackAlbany2/spack-stage-albany-develop-464dkohafi6xoko6ukjqoxeae7pygw5c/spack-src/cmake/CreateExportAlbany", line 149, in run
    raise ValueError
ValueError
make[2]: *** [src/CMakeFiles/create_export_albany.dir/build.make:76: export_albany.in] Error 1
make[2]: Leaving directory '/pscratch/sd/i/ikalash/spackAlbany2/spack-stage-albany-develop-464dkohafi6xoko6ukjqoxeae7pygw5c/spack-build-464dkoh'
make[1]: *** [CMakeFiles/Makefile2:1316: src/CMakeFiles/create_export_albany.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 99%] Linking CXX executable dummy

@bartgol I am afraid this is a real issue that is showing up in the spack build that needs to be debugged, as I am seeing it again on a number of platforms. I will open an issue and assign it to you. I'm happy to provide instructions for reproducing the problem, if you tell me a machine of choice (e.g., blake, cee, etc.). I'd first verify that the error shows up on that machine before handing things off to you.

@xylar, can you please try building on perlmutter with my compilers.yaml and packages.yaml files and see how far things get for you? Hopefully everything will build (except Albany at the very end due to the aforementioned error). I am using the following line to build:

spack --insecure install --dirty --keep-stage albany%gcc@11.1.0+mpas

packages.yaml.txt
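
For readers new to Spack, here is the install line above taken apart. The flag descriptions paraphrase Spack's own help text as I understand it, and `show_install_cmd` is a hypothetical wrapper that only prints the command rather than running it.

```shell
# Sketch: assemble the install command piece by piece and print it.
show_install_cmd() {
    # Global option: do not check SSL certificates when downloading.
    global_opts='--insecure'
    # install options:
    #   --dirty       keep the user's environment (e.g. loaded modules) in the build
    #   --keep-stage  do not remove the stage directory after a successful install
    install_opts='--dirty --keep-stage'
    # Spec: package albany, compiler gcc at version 11.1.0, variant mpas enabled.
    spec='albany%gcc@11.1.0+mpas'
    # Word splitting of the option strings is intentional here.
    echo spack $global_opts install $install_opts "$spec"
}
show_install_cmd
```

Because of `--keep-stage`, the stage directory with the build logs survives even when the install succeeds, which is what makes the paths quoted in this thread inspectable afterwards.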

xylar commented 1 year ago

@ikalash, thank you! I will try that out and let you know how it goes.

bartgol commented 1 year ago

@ikalash feel free to assign me. Unfortunately I have very few cycles in the short term, so I won't get to it this week (more likely later next week).

I have never used spack, so any instructions to reproduce would be greatly appreciated. Any OHPC machine is fine (blake, weaver, mappy). I don't know if I can log in to the cee ones. I think I tried not long ago and it bounced me.

ikalash commented 1 year ago

FYI, I've checked in the files to the Albany repo, and they can be found here: https://github.com/sandialabs/Albany/tree/master/doc/LandIce/perlmutter/spack .

bartgol commented 1 year ago

Ok, thanks. I can hop on PM when I debug this, so no need to reproduce on another machine.

ikalash commented 1 year ago

@bartgol I will reproduce this on blake and send you instructions. Next week to look at this is fine. It'll make more sense after the spack tutorial anyway. Please stay tuned. The perlmutter stuff was more for Xylar and for future reference.

xylar commented 1 year ago

@ikalash, I don't usually build with a compilers.yaml and a packages.yaml so for the tutorial next week it might make sense for you to cover this approach. I'll still try to figure out how it's done in the meantime.

ikalash commented 1 year ago

@xylar , that's the plan! For now you can see the wiki page I created / updated: https://github.com/sandialabs/Albany/wiki/Building-Albany-using-SPACK

xylar commented 1 year ago

Perfect, thanks!

xylar commented 1 year ago

I added a missing module (libfabric) and now get the same error as in #6, so I'm going to close this issue in favor of that more informative one.