Open EarlFan opened 8 months ago
It's hard to provide detailed instructions for GPU use as the details can vary from system to system. But if you want to run on a system with Nvidia GPUs using CUDA and your system is set up properly, all you should need to do is compile as normal, but with `USE_CUDA = TRUE` (and `USE_MPI = TRUE`, assuming you also want MPI support) in your `GNUmakefile`. I'd recommend trying this for the PMF case using the `pmf-lidryer-cvode.inp` input file.
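As a minimal sketch of what that means in practice (assuming the `GNUmakefile` layout used by the PeleC RegTests; only the GPU-relevant options are shown), the settings would look something like:

```make
# GNUmakefile (e.g. in Exec/RegTests/PMF) -- GPU-relevant build options only.
COMP     = gnu     # host compiler
USE_MPI  = TRUE    # enable MPI support
USE_CUDA = TRUE    # build for Nvidia GPUs with nvcc
```

The same options can also be passed on the `make` command line, as in the commands further down in this thread.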
When running on GPUs, certain simulation input parameters may benefit from being re-optimized for performance. In particular, you may want larger values for `amr.blocking_factor` and `amr.max_grid_size`, and you may want to look at different options for `cvode.solve_type`. Every problem is different so it's usually good to do a little experimentation.
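For illustration only (the specific numbers below are assumptions, not values recommended in this thread; what works best depends on the problem and the GPU), the corresponding input-file lines might look like:

```
# Larger boxes generally keep a GPU busier than typical CPU settings.
amr.blocking_factor = 32
amr.max_grid_size   = 128

# Linear solver used by CVODE for the chemistry; see the PelePhysics
# documentation for the supported options.
cvode.solve_type = GMRES
```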
It's useful to know that some machines (sites) need site-specific build settings, and AMReX has a number of supported sites here: https://github.com/AMReX-Codes/amrex/tree/development/Tools/GNUMake/sites . The machine query logic is here: https://github.com/AMReX-Codes/amrex/blob/development/Tools/GNUMake/Make.machines .
Dear all,
Thank you for your assistance!
I tried to compile PeleC with nvcc on WSL but encountered some challenges, particularly with the Sundials package. Currently, I am able to run PeleC on CPUs without issues, but I am eager to explore the capabilities of GPU acceleration.
If it is OK, I would like to keep this issue open to share my future experiences regarding the use of PeleC with GPU computing.
Regards, Fan E
Yeah that's fine to leave this issue open and add more detail on any issues you have running on GPUs, which we can then try to address.
Hello! I have a question to clarify. When I first tested the code in CPU parallel mode by running the basic PMF test case, I had not set `MPI=TRUE` in the `example.inp` file, but the `mpirun -np` command still worked to run the PeleC executable. Did I miss anything?
`mpirun` will run any application with multiple instances; for example, try `mpirun -np 8 echo "hello"`. Without MPI enabled in PeleC, `mpirun` will still launch `np` instances of the same problem, but they won't communicate to solve a single problem, so there is no benefit from the parallelism.
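To make the distinction concrete, here is a small sketch (the executable names follow the usual AMReX naming pattern and are assumptions, not taken from this thread):

```sh
# mpirun launches N copies of any program, MPI-aware or not:
mpirun -np 4 echo "hello"                  # prints "hello" four times

# Serial PeleC build: 4 independent copies of the same run, no speedup.
mpirun -np 4 ./PeleC3d.gnu.ex example.inp

# MPI PeleC build: 4 ranks cooperating on one domain-decomposed run.
mpirun -np 4 ./PeleC3d.gnu.MPI.ex example.inp
```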
Note that when you compile for MPI, you should have `USE_MPI = TRUE` in your GNUmakefile, and `MPI` should appear in the name of the PeleC executable that gets generated. No changes are needed in the input files to run with MPI. But if the executable doesn't have `MPI` in the name, you generated a serial executable and it will run independent instances as mentioned by @jrood-nrel.
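A quick way to check which flavor you built (the names below are illustrative, following the typical AMReX `<code><dim>d.<compiler>.<options>.ex` pattern):

```sh
ls *.ex
# PeleC3d.gnu.ex            <- serial build
# PeleC3d.gnu.MPI.ex        <- MPI build
# PeleC3d.gnu.MPI.CUDA.ex   <- MPI + CUDA build
```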
Thanks for your clarifications on this! @jrood-nrel @baperry2 .
I am trying to get Pele to work with GPUs on Kestrel and any instructions on the relevant modules to be loaded will be greatly appreciated. So far I've tried using `PrgEnv-nvhpc` and `PrgEnv-nvidia` along with `openmpi`, but I keep getting the following error after I compile TPL:

```
/scratch/ramac106/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Base/AMReX_ccse-mpi.H:14:10: fatal error: mpi.h: No such file or directory
 #include <mpi.h>
          ^~~~~~~
compilation terminated.
```
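For reference, that error generally means no MPI compiler wrapper (and hence no `mpi.h` include path) is visible to the build. A generic sanity check, assuming an MPICH- or Cray-based stack (not specific to Kestrel):

```sh
# Is an MPI compiler wrapper on the PATH at all?
which mpicxx || which CC      # Cray PrgEnv modules provide the CC wrapper

# MPICH-style wrappers can print the underlying compile line,
# which should include the MPI include directory:
mpicxx -show
```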
For Kestrel GPUs, you can use the modules specified here, which should also work for PeleC: https://erf.readthedocs.io/en/latest/GettingStarted.html#kestrel-nrel
Let us know if there are any issues; it's been a bit since I tested PeleC on Kestrel GPUs and they've been periodically reshuffling the modules as they bring the GPUs online.
Thank you @baperry2, this is really helpful. Will let you know how it goes.
I followed the steps outlined on the ERF website (with the latest branches of PeleC and its submodules). I seem to run into the following error:
```
In file included from /scratch/ramac106/PeleC_latest/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Extern/SUNDIALS/AMReX_SUNMemory.cpp:1:
/scratch/ramac106/PeleC_latest/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Extern/SUNDIALS/AMReX_Sundials_Core.H:7:10: fatal error: sundials/sundials_config.h: No such file or directory
    7 | #include <sundials/sundials_config.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
```
I did try re-making TPL after loading the modules suggested, but still get this error...
Make sure you've done `git submodule update --recursive` before `make TPLrealclean && make TPL`, and double check that the sundials commit you are using is 2abd63bd6.
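Put together, the check might look like the following (the submodule path is an assumption based on the paths in the error messages above; adjust it to your checkout):

```sh
cd PeleC
git submodule update --recursive   # add --init if submodules were never checked out

# Confirm the sundials submodule is at the expected commit (2abd63bd6):
git -C Submodules/PelePhysics/Submodules/sundials rev-parse --short HEAD

# Then rebuild the third-party libraries from scratch:
make TPLrealclean && make TPL
```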
However, it does seem that there may be another issue here: when I try it, the build gets past the step you are seeing but fails to generate the executable at the link step.
The following procedure seems to work but fails to produce an executable towards the end (i.e. it goes all the way up to `AMReX_BuildInfo`, but the linking looks like an issue for some reason):

```
make TPLrealclean; make TPL USE_CUDA=TRUE
make realclean; make -j COMP=gnu USE_CUDA=TRUE
```

This is using MPI+CUDA, btw, i.e. `USE_MPI=TRUE` and `USE_CUDA=TRUE`. `COMP=nvhpc` results in sundials issues again...
As I mentioned, the setup of the GPU partition of Kestrel has been frustratingly unstable. It appears they have again changed things in a way that makes the prior instructions no longer functional.
You should be able to use the following module setup:

```
module purge
module load PrgEnv-gnu/8.5.0
module load cuda/12.3
module load craype-x86-milan
```

And then compile with:

```
make TPLrealclean; make TPL COMP=gnu USE_CUDA=TRUE USE_MPI=TRUE
make realclean; make -j COMP=gnu USE_CUDA=TRUE USE_MPI=TRUE
```
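Once that produces an executable, a hypothetical launch on a GPU node might look like the sketch below; this is not from the thread, it assumes a Slurm allocation with one GPU per MPI rank, and the executable and input-file names are placeholders:

```sh
srun -N 1 -n 4 --gpus-per-node=4 ./PeleC3d.gnu.MPI.CUDA.ex pmf-lidryer-cvode.inp
```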
I happened to be looking at this as well and I used this:
```
git clone --recursive git@github.com:AMReX-Combustion/PeleC.git &&
cd PeleC/Exec/RegTests/PMF &&
module purge &&
module load PrgEnv-gnu/8.5.0 &&
module load craype-x86-trento &&
module load cray-libsci &&
module load cmake &&
module load cuda &&
module load cray-mpich/8.1.28 &&
make realclean &&
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24 TPLrealclean &&
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24 TPL &&
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24
```
Thanks a lot @jrood-nrel @baperry2, I am able to get a linked executable for the PMF case successfully with CUDA. My specific case still faces the issue, though. I guess it's something to do with the way the PMF functions and data structures have been defined, and I will align everything with the way it's done in the present case folder.
Dear all,
Hi!
I want to build and run PeleC using GPUs; however, I am not able to find any tutorial on installation relevant to GPUs or the CUDA environment. Can anyone provide a tutorial? Any help will be appreciated!
Thanks!
Regards, Fan E