atif4461 / PR_DNS_base


PETSc with CUDA backend #3

Open atif4461 opened 7 months ago

atif4461 commented 7 months ago

Tried to install release 3.20.4 with the --with-cuda flag; it dumps core when using -mat_type aijcusparse -vec_type cuda. Stumbled across a note at https://petsc.org/release/overview/gpu_roadmap/ suggesting to use main and build from source for GPUs.
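For reference, a minimal sketch of the main-branch source build the roadmap suggests (the install prefix and the PETSC_ARCH name below are placeholders; configure prints the exact make invocation to use):

git clone -b main https://gitlab.com/petsc/petsc.git
cd petsc
./configure --with-cuda --prefix=$HOME/packages/petsc-main-cuda
# configure reports the actual arch name; arch-linux-c-debug is assumed here
make PETSC_DIR=$PWD PETSC_ARCH=arch-linux-c-debug all
make PETSC_DIR=$PWD PETSC_ARCH=arch-linux-c-debug install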

atif4461 commented 7 months ago

src/ksp/ksp/tutorials$ ./ex12
Norm of error 2.10144e-06 iterations 14

src/ksp/ksp/tutorials$ ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
[0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.
[0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To not see the message again, add the option to your .petscrc, OR add it to the env var PETSC_OPTIONS.
[0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you may need jsrun --smpiargs=-gpu.
[0]PETSC ERROR: For Open MPI, you need to configure it --with-cuda (https://www.open-mpi.org/faq/?category=buildcuda)
[0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (http://mvapich.cse.ohio-state.edu/userguide/gdr/)
[0]PETSC ERROR: For Cray-MPICH, you need to set MPICH_GPU_SUPPORT_ENABLED=1 (man mpi to see manual of cray-mpich)

High errors: need to build PETSc against a CUDA-aware MPI.
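If a CUDA-aware MPI is not available, the error message itself offers a stopgap:

# per-run, as the PETSc message above suggests
./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0
# or once, via the environment
export PETSC_OPTIONS="-use_gpu_aware_mpi 0"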

atif4461 commented 7 months ago

Installed CUDA-aware Open MPI:

./configure --prefix=/work/atif/packages/openmpi-4.1.1-cudaaware --with-cuda=/usr/local/cuda/ --with-cuda-libdir=/usr/local/cuda/lib64/

Checked with the Perlmutter example; it works.
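A quick way to confirm the Open MPI build is CUDA-aware, per the Open MPI FAQ linked in the error above:

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
# expected: mca:mpi:base:param:mpi_built_with_cuda_support:value:true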

Built petsc-3.20.4 against the CUDA-aware Open MPI:

src/ksp/ksp/tutorials$ ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630

Still high errors.

The high errors appear on the CPU with -n 200 -m 200 as well:

./ex12 -n 200 -m 200
Norm of error 0.574264 iterations 1630
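Since the same large error shows up on the CPU, this points at the unpreconditioned solve itself rather than anything GPU- or MPI-specific. PETSc's monitoring options can confirm this (a diagnostic sketch; these runs are not from the original thread):

# print the convergence reason and the true residual history
./ex12 -n 200 -m 200 -pc_type none -ksp_converged_reason -ksp_monitor_true_residual
# compare against a simple preconditioner
./ex12 -n 200 -m 200 -pc_type jacobi -ksp_converged_reason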

atif4461 commented 7 months ago

Grid size 128x128x128

Single MPI
atif1 NavierStokes solver : 66.11
atif2 Particle Propagate + Vapor temperature : 28.95
atif3 Particle Propagate : 12.07
atif4 FT Add Set TimeStep : 0.00
runtime = 95.05

Single MPI + A30 GPU
atif1 NavierStokes solver : 71.14
atif2 Particle Propagate + Vapor temperature : 37.81
atif3 Particle Propagate : 13.07
atif4 FT Add Set TimeStep : 0.00
runtime = 108.95

The GPU run is slower than CPU-only; need GPU sharing.
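One way to share a single GPU among several MPI ranks is NVIDIA's Multi-Process Service (a sketch, assuming exclusive access to the node; ./solver is a placeholder for the application binary):

# start the MPS control daemon before launching the MPI ranks
nvidia-cuda-mps-control -d
mpirun -n 4 ./solver                  # ranks now share the GPU through the MPS server
echo quit | nvidia-cuda-mps-control   # shut MPS down afterwards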

atif4461 commented 7 months ago

Perlmutter issues:

CC is required for CUDA-aware MPI, but it fails to compile PETSc (mpicc works for PETSc).

atif4461 commented 7 months ago

CC, cc, and ftn work with PETSc along with the following modules:

1) craype-x86-milan
2) libfabric/1.15.2.0
3) craype-network-ofi
4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
5) PrgEnv-gnu/8.5.0
6) cray-dsmml/0.2.2
7) craype/2.7.30
8) perftools-base/23.12.0
9) cpe/23.12
10) craype-accel-nvidia80
11) gpu/1.0
12) cudatoolkit/11.7 (g)
13) gcc/12.2.0 (c)
14) cray-mpich/8.1.25 (mpi)

nvcc is still problematic; submitted a ticket to NERSC.

atif4461 commented 7 months ago

Managed to build with cpe/23.03

Currently Loaded Modules:
1) craype-x86-milan
2) libfabric/1.15.2.0
3) craype-network-ofi
4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
5) craype-accel-nvidia80
6) gpu/1.0
7) PrgEnv-gnu/8.3.3 (cpe)
8) cray-dsmml/0.2.2
9) cray-libsci/23.02.1.1 (math)
10) cray-mpich/8.1.25 (mpi)
11) craype/2.7.20 (c)
12) gcc/11.2.0 (c)
13) perftools-base/23.03.0 (dev)
14) cudatoolkit/11.7 (g)
15) cpe/23.03 (cpe)

and

./configure --CC=cc --CXX=CC --FC=ftn \
  --prefix=/global/homes/a/atif/packages/petsc-3.20.4-cudaaware \
  --with-debugging=no \
  COPTFLAGS="-O3 -march=native -mtune=native" \
  CXXOPTFLAGS="-O3 -march=native -mtune=native" \
  FOPTFLAGS="-O3 -march=native -mtune=native" \
  --download-make=1 --download-hdf5=1 --download-hypre=1 \
  --with-shared-libraries --with-static=1 \
  --with-cuda -CUDAC=nvcc
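After configure, the standard build-and-verify sequence applies (the PETSC_ARCH name below is an assumption; configure prints the exact one):

make PETSC_DIR=$PWD PETSC_ARCH=arch-linux-c-opt all
make PETSC_DIR=$PWD PETSC_ARCH=arch-linux-c-opt install
# run a few example solves against the installed tree to verify it
make PETSC_DIR=/global/homes/a/atif/packages/petsc-3.20.4-cudaaware PETSC_ARCH="" check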

Works now!

./ex12
Norm of error 2.10144e-06 iterations 14

srun -C gpu -n 1 -G 1 --exclusive ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630

atif4461 commented 7 months ago

Grid size 128x128x128

Single MPI
atif1 NavierStokes solver : 53.99
atif2 Particle Propagate + Vapor temperature : 22.74
atif3 Particle Propagate : 12.18
atif4 FT Add Set TimeStep : 0.00
runtime = 76.73

Single MPI + A100
atif1 NavierStokes solver : 58.19
atif2 Particle Propagate + Vapor temperature : 27.35
atif3 Particle Propagate : 12.17
atif4 FT Add Set TimeStep : 0.00
runtime = 85.53

atif4461 commented 5 months ago

===================== 64 bit ===========================

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware-64bit/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630

real 0m6.984s
user 0m0.803s
sys 0m5.685s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware-64bit/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 200 -m 200 -pc_type none
Norm of error 0.574264 iterations 1630

real 0m2.185s
user 0m0.748s
sys 0m1.340s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware-64bit/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 1000 -m 1000 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 186.34 iterations 10000

real 0m8.202s
user 0m5.931s
sys 0m2.185s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware-64bit/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 1000 -m 1000 -pc_type none
Norm of error 186.34 iterations 10000

real 2m29.281s
user 2m27.735s
sys 0m1.484s

===================== 32 bit ===========================

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630

real 0m3.027s
user 0m0.732s
sys 0m2.155s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 200 -m 200 -pc_type none
Norm of error 0.574264 iterations 1630

real 0m2.200s
user 0m0.637s
sys 0m1.456s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 1000 -m 1000 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 186.34 iterations 10000

real 0m8.782s
user 0m6.558s
sys 0m2.167s

(base) atif@nid001353:~/packages/petsc-3.20.4-cudaaware/share/petsc/examples/src/ksp/ksp/tutorials> time ./ex12 -n 1000 -m 1000 -pc_type none
Norm of error 186.34 iterations 10000

real 2m6.245s
user 2m4.787s
sys 0m1.396s
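To attribute the time differences above, PETSc's built-in profiling can be appended to any of these runs (a profiling sketch, not from the original session):

time ./ex12 -n 1000 -m 1000 -pc_type none -vec_type cuda -mat_type aijcusparse -log_view
# -log_view prints per-event timings; CUDA builds also report host<->device copy counts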

atif4461 commented 5 months ago

(GTL DEBUG: 2) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307
(GTL DEBUG: 1) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307
(GTL DEBUG: 1) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307

https://github.com/E3SM-Project/E3SM/issues/4834

The GTL errors above come from:

time srun -n 4 --gpus-per-task=1 ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse
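Per the E3SM issue linked above, cuIpcOpenMemHandle fails when CUDA IPC is attempted between ranks that --gpus-per-task has pinned to different GPUs. A commonly suggested workaround (an assumption here, not verified in this thread) is to relax the GPU binding, or to disable Cray MPICH's GPU IPC path:

# let every rank see all GPUs instead of per-task isolation
time srun -n 4 --gpus-per-task=1 --gpu-bind=none ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse
# or turn off the IPC path in Cray MPICH (env var name taken from the Cray MPICH docs)
export MPICH_GPU_IPC_ENABLED=0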