atif4461 opened 7 months ago
src/ksp/ksp/tutorials$ ./ex12
Norm of error 2.10144e-06 iterations 14
src/ksp/ksp/tutorials$ ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
[0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.
[0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To not see the message again, add the option to your .petscrc, OR add it to the env var PETSC_OPTIONS.
[0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you may need jsrun --smpiargs=-gpu.
[0]PETSC ERROR: For Open MPI, you need to configure it --with-cuda (https://www.open-mpi.org/faq/?category=buildcuda)
[0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (http://mvapich.cse.ohio-state.edu/userguide/gdr/)
[0]PETSC ERROR: For Cray-MPICH, you need to set MPICH_GPU_SUPPORT_ENABLED=1 (man mpi to see manual of cray-mpich)
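As the message itself suggests, a quick way to keep going before rebuilding MPI is to disable GPU-aware MPI in PETSc; communication is then staged through the host, so this costs performance but not correctness. A sketch, using the option named in the error above:

```shell
# Fall back to host-staged MPI buffers with the option from the error message
./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0

# Or make it stick for this shell session
export PETSC_OPTIONS="-use_gpu_aware_mpi 0"
```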
High error norm: need to build PETSc against a CUDA-aware MPI.
Installed a CUDA-aware Open MPI:
./configure --prefix=/work/atif/packages/openmpi-4.1.1-cudaaware --with-cuda=/usr/local/cuda/ --with-cuda-libdir=/usr/local/cuda/lib64/
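One way to confirm the resulting Open MPI build actually picked up CUDA support, per the check documented in the Open MPI FAQ linked in the error message above:

```shell
# Prints "mca:mpi:base:param:mpi_built_with_cuda_support:value:true"
# for a CUDA-aware build
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
```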
Checked the Perlmutter example; it works.
Built petsc-3.20.4 with the CUDA-aware Open MPI.
src/ksp/ksp/tutorials$ ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630
Still high errors
The high error appears on the CPU with -n 200 -m 200 as well:
./ex12 -n 200 -m 200
Norm of error 0.574264 iterations 1630
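Since the CPU run reproduces the same error norm, this looks like a convergence problem of the unpreconditioned solve on the larger grid rather than a GPU bug. Standard PETSc diagnostics can confirm this; a sketch using stock KSP options:

```shell
# Report why the KSP stopped (e.g. DIVERGED_ITS if it hit the iteration cap)
./ex12 -n 200 -m 200 -pc_type none -ksp_converged_reason -ksp_monitor_true_residual

# Compare with a real preconditioner to separate solver trouble from GPU trouble
./ex12 -n 200 -m 200 -pc_type gamg -ksp_converged_reason
```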
Grid size 128x128x128
Single MPI
  atif1 NavierStokes solver                    : 66.11
  atif2 Particle Propagate + Vapor temperature : 28.95
  atif3 Particle Propagate                     : 12.07
  atif4 FT Add Set TimeStep                    : 0.00
  runtime = 95.05

Single MPI + A30 GPU
  atif1 NavierStokes solver                    : 71.14
  atif2 Particle Propagate + Vapor temperature : 37.81
  atif3 Particle Propagate                     : 13.07
  atif4 FT Add Set TimeStep                    : 0.00
  runtime = 108.95
Need GPU sharing
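For sharing a single GPU among several MPI ranks, NVIDIA MPS is the usual mechanism. A minimal sketch, with daemon commands as in the CUDA MPS documentation (device 0 and a 4-rank launch assumed for illustration):

```shell
# Start the MPS control daemon for device 0
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# Run the multi-rank job; all ranks now share the GPU through MPS
mpirun -n 4 ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse

# Shut the daemon down afterwards
echo quit | nvidia-cuda-mps-control
```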
Perlmutter issues:
CC is required for CUDA-aware MPI, but it fails to compile PETSc (mpicc works for PETSc).
CC, cc, and ftn work with PETSc along with the following modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
  5) PrgEnv-gnu/8.5.0
  6) cray-dsmml/0.2.2
  7) craype/2.7.30
  8) perftools-base/23.12.0
  9) cpe/23.12
 10) craype-accel-nvidia80
 11) gpu/1.0
 12) cudatoolkit/11.7 (g)
 13) gcc/12.2.0 (c)
 14) cray-mpich/8.1.25 (mpi)
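The environment above can presumably be reproduced with something like the following (module names exactly as listed; dependent modules and defaults may differ between system snapshots):

```shell
# Load the non-default pieces; the craype/libfabric modules come in as defaults
module load PrgEnv-gnu/8.5.0 gcc/12.2.0 craype-accel-nvidia80 cudatoolkit/11.7 cray-mpich/8.1.25 cpe/23.12
```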
nvcc is still problematic; submitted a ticket to NERSC.
Managed to build with cpe/23.03
Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
  5) craype-accel-nvidia80
  6) gpu/1.0
  7) PrgEnv-gnu/8.3.3 (cpe)
  8) cray-dsmml/0.2.2
  9) cray-libsci/23.02.1.1 (math)
 10) cray-mpich/8.1.25 (mpi)
 11) craype/2.7.20 (c)
 12) gcc/11.2.0 (c)
 13) perftools-base/23.03.0 (dev)
 14) cudatoolkit/11.7 (g)
 15) cpe/23.03 (cpe)
and
./configure --CC=cc --CXX=CC --FC=ftn \
  --prefix=/global/homes/a/atif/packages/petsc-3.20.4-cudaaware \
  --with-debugging=no \
  COPTFLAGS="-O3 -march=native -mtune=native" \
  CXXOPTFLAGS="-O3 -march=native -mtune=native" \
  FOPTFLAGS="-O3 -march=native -mtune=native" \
  --download-make=1 --download-hdf5=1 --download-hypre=1 \
  --with-shared-libraries --with-static=1 --with-cuda -CUDAC=nvcc
Works now!
./ex12
Norm of error 2.10144e-06 iterations 14

srun -C gpu -n 1 -G 1 --exclusive ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630
Grid size 128x128x128
Single MPI
  atif1 NavierStokes solver                    : 53.99
  atif2 Particle Propagate + Vapor temperature : 22.74
  atif3 Particle Propagate                     : 12.18
  atif4 FT Add Set TimeStep                    : 0.00
  runtime = 76.73

Single MPI + A100
  atif1 NavierStokes solver                    : 58.19
  atif2 Particle Propagate + Vapor temperature : 27.35
  atif3 Particle Propagate                     : 12.17
  atif4 FT Add Set TimeStep                    : 0.00
  runtime = 85.53
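Since the GPU run is again slower than the CPU run, a profile would show whether the time is going into host-device transfers. PETSc's built-in logging is the standard tool here; a sketch using the stock -log_view option:

```shell
# Prints per-event timings at exit, including GPU flop rates and
# the number/size of CPU<->GPU copies, which usually explains a slow GPU run
srun -C gpu -n 1 -G 1 ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse -log_view
```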
===================== 64 bit ===========================
In ~/packages/petsc-3.20.4-cudaaware-64bit/share/petsc/examples/src/ksp/ksp/tutorials:

time ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630
real 0m6.984s   user 0m0.803s   sys 0m5.685s

time ./ex12 -n 200 -m 200 -pc_type none
Norm of error 0.574264 iterations 1630
real 0m2.185s   user 0m0.748s   sys 0m1.340s

time ./ex12 -n 1000 -m 1000 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 186.34 iterations 10000
real 0m8.202s   user 0m5.931s   sys 0m2.185s

time ./ex12 -n 1000 -m 1000 -pc_type none
Norm of error 186.34 iterations 10000
real 2m29.281s  user 2m27.735s  sys 0m1.484s
===================== 32 bit ===========================
In ~/packages/petsc-3.20.4-cudaaware/share/petsc/examples/src/ksp/ksp/tutorials:

time ./ex12 -n 200 -m 200 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 0.574264 iterations 1630
real 0m3.027s   user 0m0.732s   sys 0m2.155s

time ./ex12 -n 200 -m 200 -pc_type none
Norm of error 0.574264 iterations 1630
real 0m2.200s   user 0m0.637s   sys 0m1.456s

time ./ex12 -n 1000 -m 1000 -pc_type none -vec_type cuda -mat_type aijcusparse
Norm of error 186.34 iterations 10000
real 0m8.782s   user 0m6.558s   sys 0m2.167s

time ./ex12 -n 1000 -m 1000 -pc_type none
Norm of error 186.34 iterations 10000
real 2m6.245s   user 2m4.787s   sys 0m1.396s
(GTL DEBUG: 2) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307
(GTL DEBUG: 1) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307
(GTL DEBUG: 1) cuIpcOpenMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 307
https://github.com/E3SM-Project/E3SM/issues/4834
time srun -n 4 --gpus-per-task=1 ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse
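Reports like the linked E3SM issue tie these cuIpcOpenMemHandle / CUDA_ERROR_INVALID_VALUE failures to per-task GPU binding, which hides peer GPUs from each rank and breaks CUDA IPC between them. A hedged workaround sketch (assumes 4 GPUs per node; flag behavior per the Slurm srun documentation):

```shell
# Expose all node GPUs to every rank instead of binding one GPU per task
time srun -n 4 --gpus-per-node=4 --gpu-bind=none ./ex12 -n 200 -m 200 -vec_type cuda -mat_type aijcusparse
```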
Tried to install release 3.20.4 with the --with-cuda flag; it dumps core when using -mat_type aijcusparse -vec_type cuda. Stumbled across a comment at https://petsc.org/release/overview/gpu_roadmap/ suggesting to use main and build from source for GPUs.
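Following that roadmap note, building the main branch from source would look roughly like this (repository URL as given on petsc.org; configure options abbreviated to the CUDA-relevant ones):

```shell
# Grab the development branch rather than a release tarball
git clone -b main https://gitlab.com/petsc/petsc.git
cd petsc
./configure --with-cuda --with-debugging=no
make all check
```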