Closed rrsettgast closed 3 years ago
I got my account, and was able to access Summit. I'll start building when I get back from vacation next week. Two questions: Do we need to use the HYPRE+CUDA ability for the LBL problem? Do we need to use any implicit solver?
Not building with HYPRE+CUDA, and using the system BLAS/LAPACK instead of ESSL would definitely make things quicker.
> I got my account, and was able to access Summit. I'll start building when I get back from vacation next week. Two questions: Do we need to use the HYPRE+CUDA ability for the LBL problem?
Yes. But we can start without HYPRE+CUDA and just execute the linear solver on host.
> Do we need to use any implicit solver?
Yes. We definitely do need to run quasi-static mechanics.
> Not building with HYPRE+CUDA, and using the system BLAS/LAPACK instead of ESSL would definitely make things quicker.
Maybe a two-step process: just get it running with Trilinos on host... then once we get the build to @tjligocki to play with, we can get HYPRE+CUDA enabled.
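A staged host-config could capture that plan. The option names below are assumptions styled after common GEOSX/BLT host-config conventions, not taken from an actual Summit config:

```cmake
# Sketch only: phase 1 runs the linear solvers through Trilinos on host.
set(ENABLE_TRILINOS ON CACHE BOOL "")
set(ENABLE_HYPRE OFF CACHE BOOL "")
set(ENABLE_CUDA ON CACHE BOOL "")

# Phase 2: once the host build is validated, flip hypre on with CUDA support.
# set(ENABLE_HYPRE ON CACHE BOOL "")
# set(ENABLE_HYPRE_CUDA ON CACHE BOOL "")
```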
Successfully built LvArray with GCC 10.2.0 and CUDA 11.2.0. I tried GCC 8.1.1 with CUDA 10.1.243 but it broke. GCC 7.4.0 worked, but we had problems with it in GEOSX. To get GCC 10.2.0 to work I had to use CUDA 11.2.0.
What was the error with gcc8? @tjligocki what compiler combo are you using?
A CMake error when trying to verify nvcc. Something about `__float128` not being defined even though the `-mno-float128` flag was being passed. Something like that; I didn't spend too much time thinking about it once 7 and 10 worked.
I think this was a problem when we built on Ascent. Did you look at that hostconfig? https://github.com/GEOSX/LvArray/blob/develop/host-configs/ascent-gcc%408.1.1.cmake @klevzoff @wrtobin Any suggestions?
I mean I'm just going to keep moving forward with GCC 10.2 and CUDA 11.2, I should find out soon enough if it doesn't work.
Yeah, at the time just adding `-std=c++11 -Xcompiler -mno-float128` to RAJA seemed to fix the issue: https://github.com/GEOSX/thirdPartyLibs/pull/104/files . There is also discussion in https://github.com/LLNL/blt/issues/341 . I don't know where things went from there, but CUDA 11 shouldn't have this problem.
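For context, the linked workaround amounts to forwarding an extra host-compiler flag through nvcc in the TPL build. A hedged sketch (the exact variable names may differ from what the PR actually touches):

```cmake
# Sketch: pass -mno-float128 through nvcc to the GCC host compiler so
# nvcc's compiler verification doesn't trip over __float128 with
# gcc 8 + CUDA 10.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -std=c++11 -Xcompiler -mno-float128")
```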
I got the unit tests passing with GCC 10.2, CUDA 11.2, and Hypre (but not Hypre-CUDA). I'll try the integrated tests tomorrow; my hopes aren't high.
Now I'm stuck trying to install `h5py` (required for the integrated tests' restart check). It doesn't come pre-installed, and I'm having trouble pip-installing it from a virtual environment. This was the approach I took on Cori and it worked fine, so something funny is up. I may have to try building it with Spack; currently we don't build h5py even when building pygeosx, but that should work.
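For anyone retracing this, a typical user-level venv install looks roughly like the following. The module name and HDF5 environment variable are placeholders and will differ on Summit:

```shell
# Placeholder module name; check `module avail python` on the target system.
module load python/3.7.0
python3 -m venv ~/geosx-venv
source ~/geosx-venv/bin/activate
# h5py compiles against an HDF5 library; pointing the build at the system
# HDF5 avoids pip pulling a binary wheel built for the wrong architecture.
HDF5_DIR=$HDF5_ROOT pip install --no-binary=h5py h5py
```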
h5py builds with Spack but then it segfaults on import. On Lassen I also get an import error, but not a segfault.
Ok, I figured it out. One of the Python modules has `h5py` installed, but it's Python 3. So I had to make some changes to let geosxats run with Python 2 but let `restartcheck.py` run with Python 3.
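A minimal sketch of that version-split approach: a Python 2 driver (like geosxats) can delegate the restart check to a separate Python 3 interpreter via a subprocess instead of importing it directly. All names here (the helper function and the inline stand-in for `restartcheck.py`) are illustrative, not from the actual patch:

```python
# Hypothetical sketch: run a checker script under a different Python
# interpreter so the driver and checker need not share a Python version.
import subprocess
import sys


def run_with_interpreter(interpreter, args):
    """Run `args` under `interpreter` and return the process exit code."""
    return subprocess.call([interpreter] + list(args))


if __name__ == "__main__":
    # Demo with the current interpreter and an inline one-liner standing
    # in for restartcheck.py.
    rc = run_with_interpreter(sys.executable, ["-c", "print('restart check ok')"])
    print("exit code:", rc)
```

This keeps the Python 2 side dependency-free (`subprocess` is standard library in both versions) while the Python 3 environment supplies h5py.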
When run on a single node, the following tests fail in addition to the tests that are currently failing on Lassen:
Sneddon_conforming01
Sneddon_conforming04
KGD_ZeroToughness_01
KGD_ZeroViscosity_01
co2_hybrid_1d_01
co2_hybrid_1d_02
co2_hybrid_1d_03
co2_flux_3d_01
Sneddon_01
fractureMatrixFlow_Embedded2d_01
When I grab enough nodes to run all the tests, the only new one that fails is `co2_flux_3d_08`.
Create a hostconfig for ORNL/Summit using either XL/cuda or gcc/cuda.