GEOS-DEV / GEOS

GEOS Simulation Framework
GNU Lesser General Public License v2.1

How to run geosx on GPU? #3084

Closed liushilongpku closed 5 months ago

liushilongpku commented 5 months ago

Hello everyone! I'm new to GEOS. I have successfully compiled GEOS with CUDA, but I don't know how to make the GPU actually work on my problem. Right now I am running GEOS with these commands:

mpirun -np 8 geosx -i input.xml -x 2 -y 2 -z 2

Or

geosx -i input.xml

Although geosx runs with these commands and my GPU shows high utilization, I still don't think the GPU is being used correctly, because both commands produce this message:

**************************************************
*                 WARNING!!! 
*
*GEOS has GPU support enabled, but not HYPRE! 
**************************************************

What is the correct command for running GEOS with CUDA? Or is some configuration file needed? Thank you for any reply!

jhuang2601 commented 5 months ago

@liushilongpku In your config file, add the following lines to enable hypre support on GPU:

set(ENABLE_HYPRE ON CACHE BOOL "")
set(ENABLE_HYPRE_DEVICE "CUDA" CACHE BOOL "")

liushilongpku commented 5 months ago

@jhuang2601 Thank you for your reply! I have modified my config file and the warning is gone. But I run into a new error (below) when I run SPE10 to test the new build.

Time: 0.00e+00 s, dt: 1000 s, Cycle: 0

    Attempt:  0, ConfigurationIter:  0, NewtonIter:  0
        ( Rflow ) = ( 9.72e+01 )        ( R ) = ( 9.72e+01 )
CUDA ERROR (code = 101, invalid device ordinal) at memory.c:191
CUDA ERROR (code = 101, invalid device ordinal) at memory.c:191
***** ERROR
***** LOCATION: /home/lsl/3code/GEOS/src/coreComponents/linearAlgebra/interfaces/hypre/HypreUtils.hpp:157
***** Controlling expression (should be false): err != cudaSuccess
***** Rank 0: Previous CUDA errors found: before HYPRE_IJMatrixAddToValues2 (invalid device ordinal at /home/lsl/3code/GEOS/src/coreComponents/linearAlgebra/interfaces/hypre/HypreMatrix.cpp:189)

** StackTrace of 9 frames **
Frame 0: geos::SolverBase::solveNonlinearSystem(double const&, double const&, int, geos::DomainPartition&)
Frame 1: geos::SolverBase::nonlinearImplicitStep(double const&, double const&, int, geos::DomainPartition&)
Frame 2: geos::SolverBase::solverStep(double const&, double const&, int, geos::DomainPartition&)
Frame 3: geos::SolverBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 4: geos::EventBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 5: geos::EventManager::run(geos::DomainPartition&)
Frame 6: geos::GeosxState::run()
Frame 7: main
Frame 8: __libc_start_main
Frame 9: _start
=====

My config file is:

# file: wsl-ubuntu.cmake

# detect host and name the configuration file
site_name(HOST_NAME)
set(CONFIG_NAME "wsl-ubuntu" CACHE PATH "")
message("CONFIG_NAME = ${CONFIG_NAME}")

# set paths to C, C++, and Fortran compilers. Note that while GEOS does not contain any Fortran code,
# some of the third-party libraries do contain Fortran code. Thus a Fortran compiler must be specified.
set(CMAKE_C_COMPILER "/usr/bin/gcc" CACHE PATH "")
set(CMAKE_CXX_COMPILER "/usr/bin/g++" CACHE PATH "")
set(CMAKE_Fortran_COMPILER "/usr/bin/gfortran" CACHE PATH "")
set(ENABLE_FORTRAN OFF CACHE BOOL "" FORCE)

set(ENABLE_MPI ON CACHE BOOL "")
set(MPI_C_COMPILER "/usr/bin/mpicc" CACHE PATH "")
set(MPI_CXX_COMPILER "/usr/bin/mpicxx" CACHE PATH "")
set(MPI_Fortran_COMPILER "/usr/bin/mpifort" CACHE PATH "")
set(MPIEXEC "/usr/bin/mpirun" CACHE PATH "")

set(ENABLE_CUDA ON CACHE BOOL "")

set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda" CACHE PATH "")
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc" CACHE PATH "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE PATH "")

set(CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "")
set(CUDA_ARCH "sm_89" CACHE STRING "")
set(CMAKE_CUDA_FLAGS "-restrict -arch ${CUDA_ARCH} --extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_LINK_FLAGS "-Xlinker -rpath -Xlinker /usr/bin/mpicxx" CACHE STRING "")

set(ENABLE_HYPRE ON CACHE BOOL "")
set(ENABLE_HYPRE_DEVICE "CUDA" CACHE BOOL "")

set(ENABLE_OPENMP OFF CACHE BOOL "" FORCE)

# enable PVTPackage
set(ENABLE_PVTPackage ON CACHE BOOL "" FORCE)

# enable tests
set(ENABLE_GTEST_DEATH_TESTS ON CACHE BOOL "" FORCE )

# define the path to your compiled installation directory
set(GEOSX_TPL_DIR "/home/lsl/3code/thirdPartyLibs/install-wsl-ubuntu-release" CACHE PATH "")
# let GEOS define some third party libraries information for you
include(${CMAKE_CURRENT_LIST_DIR}/tpls.cmake)

Does the config file need any modification? Sorry for taking your time, and thank you again for your reply!

jhuang2601 commented 5 months ago

@liushilongpku Which CUDA version are you using? Any errors during the compilation of the TPLs and GEOS?

liushilongpku commented 5 months ago

@jhuang2601 Here is the platform information:

CUDA: V11.8.89 (nvcc: NVIDIA (R) Cuda compiler driver, Built on Wed_Sep_21_10:33:58_PDT_2022, Cuda compilation tools, release 11.8, V11.8.89, Build cuda_11.8.r11.8/compiler.31833905_0)

GCC: 9.4.0 (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0)

Platform: Ubuntu 20.04 (but under Windows Subsystem for Linux, WSL)

I have NOT seen any error messages while compiling the TPLs and GEOS (or maybe I just missed them).

But I have found an error identical to mine in the HYPRE repository issues: CUDA ERROR (code = 101, invalid device ordinal) when using BoomerAMG with PETSc on GPU with Unified Memory #957

What is interesting is that both of our platforms are WSL. So I think this error may be caused by a difference in the GPU ordinal between WSL and Windows. The GPU ordinal in WSL is 0 (from nvidia-smi), and it may be 1 in Windows (I'm not very sure about this). Unfortunately, I have no pure Linux machine with a GPU, so I can't compile GEOS on pure Linux. I tried to find where GEOS selects the GPU, but I still have no idea where that happens. Do you know where these locations are?

What is more, I am not familiar with debugging on Linux, so I can only use ''cout <<'' and rebuild to debug 😂. Could you recommend some debugging tools? (Open source, free, or student versions would be nice. hhh...)

All of these will be very helpful! Thank you!

klevzoff commented 5 months ago

Hi @liushilongpku,

If you don't mind, can you post the output of nvidia-smi?

Also, can you try the following in your WSL console (assuming you have python and pip installed):

  1. pip3 install cuda-python
  2. python -c "from cuda import cudart; print(cudart.cudaGetDevice())"

If you see something like

(<cudaError_t.cudaSuccess: 0>, N)

(where N is some number, likely 0 or 1), try setting

export CUDA_VISIBLE_DEVICES=N

and running GEOS again.
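
If it helps, here is a minimal standalone CUDA sketch (not GEOS code, just an independent check; the file name is arbitrary) that lists the device ordinals the CUDA runtime can see. You can build it with nvcc and run it with and without CUDA_VISIBLE_DEVICES set to compare:

// device_query.cu -- standalone sketch listing the device ordinals visible to the CUDA runtime.
// Not part of GEOS; compile with: nvcc device_query.cu -o device_query
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  int count = 0;
  cudaError_t err = cudaGetDeviceCount( &count );
  if( err != cudaSuccess )
  {
    std::printf( "cudaGetDeviceCount failed: %s\n", cudaGetErrorString( err ) );
    return 1;
  }
  std::printf( "Visible CUDA devices: %d\n", count );
  for( int i = 0; i < count; ++i )
  {
    cudaDeviceProp prop;
    cudaGetDeviceProperties( &prop, i );
    std::printf( "  ordinal %d: %s (concurrentManagedAccess = %d)\n",
                 i, prop.name, prop.concurrentManagedAccess );
  }
  return 0;
}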

liushilongpku commented 5 months ago

Hi @klevzoff, thank you for your reply! Here is my nvidia-smi output:

Mon Apr 22 12:58:50 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.10              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   41C    P8              4W /  125W |     481MiB /   8188MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I got this information by running the commands mentioned above with cuda-python (not python-cuda):

(<cudaError_t.cudaSuccess: 0>, 0)

But I hit the same error as above again after applying:

lsl@LAPTOP-E3C2OCEP:~/datafile/SPE10$ export CUDA_VISIBLE_DEVICES=0
lsl@LAPTOP-E3C2OCEP:~/datafile/SPE10$ /home/lsl/3code/GEOS/install-wsl-ubuntu-release/bin/geos -i deadOilSpe10Layers84_85_benchmark2.xml
Num ranks: 1
GEOS version: 0.2.0 (develop, sha1: 4696a656f)
  - c++ compiler: gcc 9.4.0
  - CUDA compiler version: 11.8
  - MPI version: Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
  -......
  - hypre version: v2.30.0-44-geab5f9f7f (master)
  - trilinos version: 13.4.1
  - Python3 version: 3.8.10
Started at 2024-04-22 05:16:53.176449453
......

------------------- TIMESTEP START -------------------
    - Time:       00h00m00s (0 s)
    - Delta Time: 00h16m40s (1000 s)
    - Cycle:      0
------------------------------------------------------

Time: 0.00e+00 s, dt: 1000 s, Cycle: 0

    Attempt:  0, ConfigurationIter:  0, NewtonIter:  0
        ( Rflow ) = ( 9.72e+01 )        ( R ) = ( 9.72e+01 )
CUDA ERROR (code = 101, invalid device ordinal) at memory.c:191
CUDA ERROR (code = 101, invalid device ordinal) at memory.c:191
***** ERROR
***** LOCATION: /home/lsl/3code/GEOS/src/coreComponents/linearAlgebra/interfaces/hypre/HypreUtils.hpp:157
***** Controlling expression (should be false): err != cudaSuccess
***** Rank 0: Previous CUDA errors found: before HYPRE_IJMatrixAddToValues2 (invalid device ordinal at /home/lsl/3code/GEOS/src/coreComponents/linearAlgebra/interfaces/hypre/HypreMatrix.cpp:189)

** StackTrace of 9 frames **
Frame 0: geos::SolverBase::solveNonlinearSystem(double const&, double const&, int, geos::DomainPartition&)
Frame 1: geos::SolverBase::nonlinearImplicitStep(double const&, double const&, int, geos::DomainPartition&)
Frame 2: geos::SolverBase::solverStep(double const&, double const&, int, geos::DomainPartition&)
Frame 3: geos::SolverBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 4: geos::EventBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 5: geos::EventManager::run(geos::DomainPartition&)
Frame 6: geos::GeosxState::run()
Frame 7: main
Frame 8: __libc_start_main
Frame 9: _start
=====

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

And if I set "CUDA_VISIBLE_DEVICES" to "1", geosx will just not start:

lsl@LAPTOP-E3C2OCEP:~/datafile/SPE10$ export CUDA_VISIBLE_DEVICES=1
lsl@LAPTOP-E3C2OCEP:~/datafile/SPE10$ /home/lsl/3code/GEOS/install-wsl-ubuntu-release/bin/geos -i deadOilSpe10Layers84_85_benchmark2.xml
Num ranks: 1
CUDA ERROR (code = 100, no CUDA-capable device is detected) at general.c:255
CUDA ERROR (code = 100, no CUDA-capable device is detected) at general.c:201
CUDA ERROR (code = 100, no CUDA-capable device is detected) at general.c:85
CUDA ERROR (code = 100, no CUDA-capable device is detected) at device_utils.c:404
CUBLAS ERROR (code = 1, 0) at device_utils.c:2811

It looks like GEOS is selecting the GPU successfully (at least GEOS cannot run without CUDA_VISIBLE_DEVICES=0). I'm happy to post any other information about my device that you need. Thank you!

klevzoff commented 5 months ago

cuda-python(but not python-cuda)

Apologies, shouldn't have typed it from memory :) Corrected my previous message for history.

It seems hypre is allocating its device memory as unified; the error originates here. @victorapm is it viable to use hypre without UVM, maybe with a limited subset of solvers/preconditioners? Basically, UVM is supported on WSL with limitations; one of them is the lack of concurrent CPU/GPU access, which is an explicit requirement of cudaMemPrefetchAsync as called by hypre.
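
For reference, a minimal standalone sketch (not hypre or GEOS code; device 0 and the buffer size are arbitrary assumptions) of the managed-memory prefetch pattern that runs into this WSL limitation:

// uvm_prefetch_check.cu -- standalone sketch of the unified-memory prefetch pattern;
// not hypre/GEOS code. Compile with: nvcc uvm_prefetch_check.cu -o uvm_prefetch_check
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  int const device = 0;
  cudaDeviceProp prop;
  cudaGetDeviceProperties( &prop, device );
  // On WSL this property is typically 0, i.e. concurrent CPU/GPU access to managed memory is unsupported.
  std::printf( "concurrentManagedAccess = %d\n", prop.concurrentManagedAccess );

  double * data = nullptr;
  cudaMallocManaged( &data, 1024 * sizeof( double ) );   // unified (UVM) allocation

  // hypre prefetches its unified allocations to the device; this is the kind of call
  // that is constrained by the WSL limitation described above.
  cudaError_t const err = cudaMemPrefetchAsync( data, 1024 * sizeof( double ), device, 0 );
  std::printf( "cudaMemPrefetchAsync: %s\n", cudaGetErrorString( err ) );

  cudaFree( data );
  return 0;
}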

victorapm commented 5 months ago

Good point about unified memory on WSL!

Yes, it's viable to use hypre without UVM in GEOS. The main reason we still keep it is to allow for over-subscription of GPU RAM. We can probably turn it off after the sparse matrix refactoring work is finished.

@liushilongpku To compile hypre without UVM support, remove this line:

https://github.com/GEOS-DEV/thirdPartyLibs/blob/adffaa8e675b6122620e70d389186055c10f201d/CMakeLists.txt#L806
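
To illustrate the over-subscription point above: unified (managed) allocations can exceed physical GPU memory because the driver pages them on demand, while plain device allocations fail once the GPU is full. A minimal standalone sketch (not hypre/GEOS code; the 1.5x request size is an arbitrary example):

// oversubscription_sketch.cu -- standalone illustration of over-subscribing GPU RAM with UVM.
// Not hypre/GEOS code; compile with: nvcc oversubscription_sketch.cu -o oversubscription_sketch
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  size_t freeBytes = 0, totalBytes = 0;
  cudaMemGetInfo( &freeBytes, &totalBytes );
  std::printf( "GPU memory: %zu MB free / %zu MB total\n", freeBytes >> 20, totalBytes >> 20 );

  // Request more than the GPU physically has (1.5x total), purely for illustration.
  size_t const request = totalBytes + ( totalBytes >> 1 );

  void * p = nullptr;
  cudaError_t err = cudaMalloc( &p, request );             // plain device allocation
  std::printf( "cudaMalloc (1.5x total):        %s\n", cudaGetErrorString( err ) );
  if( err == cudaSuccess ) cudaFree( p );

  err = cudaMallocManaged( &p, request );                  // unified (UVM) allocation
  std::printf( "cudaMallocManaged (1.5x total): %s\n", cudaGetErrorString( err ) );
  if( err == cudaSuccess ) cudaFree( p );
  return 0;
}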

liushilongpku commented 5 months ago

@victorapm Thank you for your reply, and sorry for the late response. Yes, with your suggestion I can now run geosx without the warning! But it still does not work correctly. I used a simple case to test the GPU. Here is the output of the singlePhaseFlow case:

path/to/my/GPU/version/geosx -i 3D_10x10x10_compressible_smoke.xml
Time: 0.00e+00 s, dt: 20 s, Cycle: 0

    Attempt:  0, ConfigurationIter:  0, NewtonIter:  0
        ( Rflow ) = ( 3.99e+00 )        ( R ) = ( 3.99e+00 )
Received signal 11: Segmentation fault

** StackTrace of 16 frames **
Frame 0: /lib/x86_64-linux-gnu/libc.so.6
Frame 1: hypre_BoomerAMGCoarsenRuge
Frame 2: hypre_BoomerAMGCoarsenHMIS
Frame 3: hypre_BoomerAMGSetup
Frame 4: geos::HyprePreconditioner::setup(geos::HypreMatrix const&)
Frame 5: geos::HypreSolver::setup(geos::HypreMatrix const&)
Frame 6: geos::SolverBase::solveLinearSystem(geos::DofManager const&, geos::HypreMatrix&, geos::HypreVector&, geos::HypreVector&)
Frame 7: geos::SolverBase::solveNonlinearSystem(double const&, double const&, int, geos::DomainPartition&)
Frame 8: geos::SolverBase::nonlinearImplicitStep(double const&, double const&, int, geos::DomainPartition&)
Frame 9: geos::SolverBase::solverStep(double const&, double const&, int, geos::DomainPartition&)
Frame 10: geos::SolverBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 11: geos::EventBase::execute(double, double, int, int, double, geos::DomainPartition&)
Frame 12: geos::EventManager::run(geos::DomainPartition&)
Frame 13: geos::GeosxState::run()
Frame 14: main
Frame 15: __libc_start_main
Frame 16: _start
=====

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Is there still a problem with my input?

What is more, in another case of mine with about 4e5 elements, GEOSX runs correctly in the CPU version, but GEOSX with CUDA just gets stuck after the EquilibrationStep for about 5 minutes and is then "Killed":

path/to/my/GPU/version/geosx -i 3D_10x10x10_compressible_smoke.xml
Time: -1.00e+11 s, dt: 100000000000 s, Cycle: 0

Task `multiphasePoroelasticityPreEquilibrationStep`: at time -100000000000s, physics solver `multiphasePoroelasticity` is set to perform stress initialization during the next time step(s)
Task `multiphasePoroelasticityPreEquilibrationStep`: at time -100000000000s, physics solver `linearElasticity` is resetting total displacement and velocity to zero
    Attempt:  0, ConfigurationIter:  0, NewtonIter:  0
        ( Rflow ) = ( 0.00e+00 )        ( Rsolid ) = ( 1.03e+00 )        ( R ) = ( 1.03e+00 )

Killed

I think my computer just ran out of memory. Does GPU computing consume more memory than CPU-only runs? Sorry for taking your time!

victorapm commented 5 months ago

@liushilongpku Can you update GEOS to the latest version, recompile, and try again?

liushilongpku commented 5 months ago

@victorapm Thank you! Yes, I recompiled the latest version today and tested GEOS on both compositionalMultiphaseFlow and singlePhaseFlow. GEOS with GPU makes the same mistake we talked about some days ago in the compositionalMultiphaseFlow case:

Received signal 11: Segmentation fault

But it can complete the single-phase simulation case, although it takes longer than GEOS on CPU. The CPU version's simulation statistics are:

SinglePhaseFlow, number of time steps: 250
SinglePhaseFlow, number of successful nonlinear iterations: 253
SinglePhaseFlow, number of successful linear iterations: 1770
SinglePhaseFlow, number of time step cuts: 0
SinglePhaseFlow, number of discarded nonlinear iterations: 0
SinglePhaseFlow, number of discarded linear iterations: 0
SinglePhaseFlow: apply solution time = 0.014295243 s (min), 0.014295243 s (max)
SinglePhaseFlow: assemble time = 0.33603981 s (min), 0.33603981 s (max)
SinglePhaseFlow: convergence check time = 0.013203698 s (min), 0.013203698 s (max)
SinglePhaseFlow: linear solver create time = 0.028971492 s (min), 0.028971492 s (max)
SinglePhaseFlow: linear solver setup time = 0.200326402 s (min), 0.200326402 s (max)
SinglePhaseFlow: linear solver solve time = 0.163649687 s (min), 0.163649687 s (max)
SinglePhaseFlow: linear solver total time = 0.406319466 s (min), 0.406319466 s (max)
SinglePhaseFlow: update state time = 0.003153027 s (min), 0.003153027 s (max)
Umpire            HOST sum across ranks:    2.6 MB
Umpire            HOST         rank max:    2.6 MB
Finished at 2024-04-28 16:46:50.665935643
total time            00h00m01s (1.807385773 s)
initialization time   00h00m00s (0.059207183 s)
run time              00h00m01s (1.407594863 s)

And the GPU version is:

SinglePhaseFlow, number of time steps: 253
SinglePhaseFlow, number of successful nonlinear iterations: 293
SinglePhaseFlow, number of successful linear iterations: 4455
SinglePhaseFlow, number of time step cuts: 3
SinglePhaseFlow, number of discarded nonlinear iterations: 24
SinglePhaseFlow, number of discarded linear iterations: 360
SinglePhaseFlow: apply solution time = 0.102443484 s (min), 0.102443484 s (max)
SinglePhaseFlow: assemble time = 2.016661495 s (min), 2.016661495 s (max)
SinglePhaseFlow: convergence check time = 0.046693102 s (min), 0.046693102 s (max)
SinglePhaseFlow: line search time = 0.197157786 s (min), 0.197157786 s (max)
SinglePhaseFlow: linear solver create time = 0.694030657 s (min), 0.694030657 s (max)
SinglePhaseFlow: linear solver setup time = 8.845069469 s (min), 8.845069469 s (max)
SinglePhaseFlow: linear solver solve time = 12.397167029 s (min), 12.397167029 s (max)
SinglePhaseFlow: linear solver total time = 21.980840024 s (min), 21.980840024 s (max)
SinglePhaseFlow: update state time = 0.074835178 s (min), 0.074835178 s (max)
Umpire          DEVICE sum across ranks:    4.0 GB
Umpire          DEVICE         rank max:    4.0 GB
Umpire       DEVICE::0 sum across ranks:    4.0 GB
Umpire       DEVICE::0         rank max:    4.0 GB
Umpire            HOST sum across ranks:    2.6 MB
Umpire            HOST         rank max:    2.6 MB
Umpire HYPRE_DEVICE_POOL sum across ranks:  792.0 KB
Umpire HYPRE_DEVICE_POOL         rank max:  792.0 KB
Finished at 2024-04-28 16:49:47.183222974
total time            00h00m26s (26.276397033 s)
initialization time   00h00m00s (0.051273563 s)
run time              00h00m25s (25.266459622 s)

victorapm commented 5 months ago

Ok! Which options did you pass to the geos executable in the failing run?

liushilongpku commented 5 months ago

Just like this in the simple test case:

path/to/gpu/version/geosx -i input.xml

Or in my own case:

mpirun -np 16 path/to/gpu/version/geosx -i input.xml -x 4 -y 4 -z 1

Are these right? Or is there some other option I am neglecting?

victorapm commented 5 months ago

Can you share input.xml?

liushilongpku commented 5 months ago

The input file of the SinglePhase case is 3D_10x10x10_compressible_smoke.xml from the GEOS input files. The input file of the compositionalMultiphaseFlow case is deadOilSpe10Layers84_85_smoke_2d.xml from the GEOS input files. My own case is this: gpucanrun.tar.gz (I didn't upload the mesh file because the vtk file is too large, 300 MB+, to upload). What's strange is that sometimes after I restart my WSL it works correctly, but now it doesn't again...

victorapm commented 5 months ago

Thank you! Can you try to run https://github.com/GEOS-DEV/GEOS/blob/develop/inputFiles/compositionalMultiphaseFlow/benchmarks/SPE10/deadOilSpe10Layers84_85_benchmark.xml ?

mpirun -np 1 ${GEOS_INSTALL_PATH}/bin/geosx -i deadOilSpe10Layers84_85_benchmark.xml

liushilongpku commented 5 months ago

Thanks for your reply! Here are a series of tests:

Cleaning up events
compflow, number of time steps: 31
compflow, number of successful nonlinear iterations: 265
compflow, number of successful linear iterations: 1882
compflow, number of time step cuts: 2
compflow, number of discarded nonlinear iterations: 80
compflow, number of discarded linear iterations: 652
compflow: apply solution time = 0.153492478 s (min), 0.153492478 s (max)
compflow: assemble time = 2.277028998 s (min), 2.277028998 s (max)
compflow: convergence check time = 0.031993215 s (min), 0.031993215 s (max)
compflow: linear solver create time = 1.999864288 s (min), 1.999864288 s (max)
compflow: linear solver setup time = 20.091948007 s (min), 20.091948007 s (max)
compflow: linear solver solve time = 21.129472269 s (min), 21.129472269 s (max)
compflow: linear solver total time = 43.288631964 s (min), 43.288631964 s (max)
compflow: update state time = 0.186117447 s (min), 0.186117447 s (max)
Rank 0: Writing out restart file at ./deadOilSpe10Layers84_85_benchmark_restart_000000028/rank_0000000.hdf5
Umpire          DEVICE sum across ranks:    4.1 GB
Umpire          DEVICE         rank max:    4.1 GB
Umpire       DEVICE::0 sum across ranks:    4.1 GB
Umpire       DEVICE::0         rank max:    4.1 GB
Umpire            HOST sum across ranks:  116.2 MB
Umpire            HOST         rank max:  116.2 MB
Umpire HYPRE_DEVICE_POOL sum across ranks:  134.1 MB
Umpire HYPRE_DEVICE_POOL         rank max:  134.1 MB
Finished at 2024-04-29 15:00:29.879149778
total time            00h00m48s (48.89631231 s)
initialization time   00h00m00s (0.319529504 s)
run time              00h00m47s (47.642932087 s)
Cleaning up events
compflow, number of time steps: 25
compflow, number of successful nonlinear iterations: 167
compflow, number of successful linear iterations: 1196
compflow, number of time step cuts: 2
compflow, number of discarded nonlinear iterations: 80
compflow, number of discarded linear iterations: 648
compflow: apply solution time = 0.12086866 s (min), 0.12086866 s (max)
compflow: assemble time = 1.542823542 s (min), 1.542823542 s (max)
compflow: convergence check time = 0.023766221 s (min), 0.023766221 s (max)
compflow: linear solver create time = 1.521159926 s (min), 1.521159926 s (max)
compflow: linear solver setup time = 14.549213123 s (min), 14.549213123 s (max)
compflow: linear solver solve time = 15.694809132 s (min), 15.694809132 s (max)
compflow: linear solver total time = 31.813868682 s (min), 31.813868682 s (max)
compflow: update state time = 0.155275194 s (min), 0.155275194 s (max)
Rank 0: Writing out restart file at ./deadOilSpe10Layers84_85_benchmark_restart_000000023/rank_0000000.hdf5
Umpire          DEVICE sum across ranks:    4.1 GB
Umpire          DEVICE         rank max:    4.1 GB
Umpire       DEVICE::0 sum across ranks:    4.1 GB
Umpire       DEVICE::0         rank max:    4.1 GB
Umpire            HOST sum across ranks:  116.2 MB
Umpire            HOST         rank max:  116.2 MB
Umpire HYPRE_DEVICE_POOL sum across ranks:  134.1 MB
Umpire HYPRE_DEVICE_POOL         rank max:  134.1 MB
Finished at 2024-04-29 15:01:50.184873434
total time            00h00m36s (36.531293392 s)
initialization time   00h00m00s (0.273306483 s)
run time              00h00m35s (35.334866816 s)
Cleaning up events
compflow, number of time steps: 21
compflow, number of successful nonlinear iterations: 50
compflow, number of successful linear iterations: 625
compflow, number of time step cuts: 0
compflow, number of discarded nonlinear iterations: 0
compflow, number of discarded linear iterations: 0
compflow: apply solution time = 0.022745318 s (min), 0.022745318 s (max)
compflow: assemble time = 1.625245664 s (min), 1.625245664 s (max)
compflow: convergence check time = 0.011401115 s (min), 0.011401115 s (max)
compflow: linear solver create time = 0.684940571 s (min), 0.684940571 s (max)
compflow: linear solver setup time = 1.832012468 s (min), 1.832012468 s (max)
compflow: linear solver solve time = 2.904478914 s (min), 2.904478914 s (max)
compflow: linear solver total time = 5.450090655 s (min), 5.450090655 s (max)
compflow: update state time = 0.182978064 s (min), 0.182978064 s (max)
Rank 0: Writing out restart file at ./deadOilSpe10Layers84_85_benchmark_restart_000000021/rank_0000000.hdf5
Umpire            HOST sum across ranks:  115.8 MB
Umpire            HOST         rank max:  115.8 MB
Finished at 2024-04-29 15:07:09.800248488
total time            00h00m09s (9.266199834 s)
initialization time   00h00m00s (0.181194127 s)
run time              00h00m08s (8.788017799 s)
Cleaning up events
compflow, number of time steps: 21
compflow, number of successful nonlinear iterations: 50
compflow, number of successful linear iterations: 625
compflow, number of time step cuts: 0
compflow, number of discarded nonlinear iterations: 0
compflow, number of discarded linear iterations: 0
compflow: apply solution time = 0.0258884 s (min), 0.0258884 s (max)
compflow: assemble time = 1.730302272 s (min), 1.730302272 s (max)
compflow: convergence check time = 0.012108509 s (min), 0.012108509 s (max)
compflow: linear solver create time = 0.745360817 s (min), 0.745360817 s (max)
compflow: linear solver setup time = 1.978417941 s (min), 1.978417941 s (max)
compflow: linear solver solve time = 3.177347019 s (min), 3.177347019 s (max)
compflow: linear solver total time = 5.929548222 s (min), 5.929548222 s (max)
compflow: update state time = 0.185074602 s (min), 0.185074602 s (max)
Rank 0: Writing out restart file at ./deadOilSpe10Layers84_85_benchmark_restart_000000021/rank_0000000.hdf5
Umpire            HOST sum across ranks:  115.8 MB
Umpire            HOST         rank max:  115.8 MB
Finished at 2024-04-29 15:04:10.021012250
total time            00h00m10s (10.094345755 s)
initialization time   00h00m00s (0.182275674 s)
run time              00h00m09s (9.576663454 s)
victorapm commented 5 months ago

Thanks! The smoke test cases such as deadOilSpe10Layers84_85_smoke_2d.xml use a direct method as the linear solver and currently rely on UVM support to work; that's why they failed in your runs. However, the benchmark ones rely on iterative solvers and should work on GPUs.

Please look at deadOilSpe10Layers84_85_iterative.xml to understand how to set up an iterative solver for your problem.

victorapm commented 5 months ago

Closing this issue as it has been resolved. Thank you!

liushilongpku commented 5 months ago

Didn't reply because I'm trying to find a new device to test on. GEOS with GPU on WSL looks not very stable and quite memory-hungry. The linear solver solve time fluctuates between 3 s and 40 s for the same test file (the CPU time is stable at about 2.8 s). And when I run GEOS with mpirun -np 1, it consumes about 4 GB of GPU memory; if I use mpirun -np 2, my laptop GPU gets stuck (maybe because it runs out of GPU memory, of which 8 GB is available in total). I'm not sure whether these problems are related to UVM support, so I'm trying to find a pure Linux device with a GPU to test them. But that will take much more time. Thank you!