SCOREC / pumi-pic

Support libraries for unstructured mesh particle-in-cell simulations on GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

Memory issue while using SCS PS on Perlmutter #101

Closed. dhyan1272 closed this issue 10 months ago.

dhyan1272 commented 1 year ago

We run out of GPU memory with as few as 5M particles per GPU on Perlmutter when using the SCS particle structure.

Kokkos: 4.1.00
Omega_h: a90fe9ca55610168877e52aefc3a2c14db9c614e
Cabana: fa9e58a5f02906704c7d4ee16a975d65dc29aa63
PUMIPIC: master branch
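[Editor's note, not from the original thread] One way to see how close the run gets to the device memory limit is to log GPU memory while GITRm executes. A minimal sketch added to the batch script, assuming the script runs on the GPU compute node (true for a single-node job) and nvidia-smi is in the PATH; the GITRm arguments are placeholders for the ones in the job script shown later in this thread:

# log used/total device memory every 5 s while GITRm runs
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 5 > gpu_mem.log &
SMI_PID=$!
srun /global/cfs/cdirs/m4227/GITRm/build-gitrm/GITRm <args...>   # same arguments as in the job script
kill $SMI_PID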

cwsmith commented 1 year ago

Hi @dhyan1272. Can you please provide the following?

- the environment (modules and variables) used to build and run
- the job script / command used to run GITRm

dhyan1272 commented 1 year ago

The environment file:

module load PrgEnv-gnu
module load cray-hdf5/1.12.2.7
module load cray-netcdf/4.9.0.3
module load cudatoolkit/11.7
module load craype-accel-nvidia80
module load cmake/3.22.0
# install prefixes for the dependencies built under $root
export root=/global/cfs/cdirs/m4227/GITRm
export kk=$root/build-kokkos/install
export oh=$root/build-omegah/install
export engpar=$root/build-engpar/install
export cab=$root/build-cabana/install
export pp=$root/build-pumipic/install
export gitrm=$root/build-gitrm/install
# make the CUDA runtime visible at run time and the installs visible to CMake
export cuda=$CRAY_CUDATOOLKIT_DIR
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export CMAKE_PREFIX_PATH=$kk:$oh:$engpar:$cab:$pp:$CMAKE_PREFIX_PATH
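[Editor's note, not from the original thread] The CMAKE_PREFIX_PATH exported above is what lets CMake's find_package() locate the Kokkos, Omega_h, EnGPar, and Cabana installs when configuring PUMIPIC. A hypothetical configure/build sketch under that environment; "env.sh" stands for the environment file above, and no project-specific CMake options beyond the standard variables are taken from this thread:

# hypothetical configure/build of pumi-pic using the environment above
source env.sh
cmake -S pumi-pic -B $root/build-pumipic \
      -DCMAKE_CXX_COMPILER=CC \
      -DCMAKE_INSTALL_PREFIX=$pp
# find_package() resolves Kokkos, Omega_h, EnGPar, and Cabana via $CMAKE_PREFIX_PATH
cmake --build $root/build-pumipic --target install -j 16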
dhyan1272 commented 1 year ago

GITRm is run with the following job script:

#!/bin/bash
#SBATCH -A m4227
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -t 0:05:00
#SBATCH -n 1                  # single MPI rank, so all particles live on one GPU
#SBATCH --ntasks-per-node=1
#SBATCH -c 32
#SBATCH --gpus-per-task=1     # one A100 bound to that rank
#SBATCH --job-name=ITER_He

module load PrgEnv-gnu
module load cudatoolkit/11.7
module load craype-accel-nvidia80
export SLURM_CPU_BIND="cores"
export MPICH_ABORT_ON_ERROR=1
ulimit -c unlimited
export dp=/pscratch/sd/n/nathd/ITER_data

srun hostname
scontrol show jobid ${SLURM_JOB_ID}
srun /global/cfs/cdirs/m4227/GITRm/build-gitrm/GITRm \
  ${dp}/NEW_MESH/OmegaH.osh \
  ${dp}/gitrm_1.ptn \
  ${dp}/profiles_solps_final_Tim.nc \
  ${dp}/lu.nc \
  ${dp}/ADAS_4.nc \
  ${dp}/ftridynSelf.nc \
  ${dp}/bField_r.nc \
  ${dp}/profiles_solps_final_Tim.nc \
  Iter_He \
  -
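[Editor's note, not from the original thread] Standard Perlmutter GPU nodes have four 40 GB A100s, so one workaround while the SCS memory footprint is investigated is to spread the particles over more ranks, each with its own GPU. A sketch of the changed SBATCH lines, assuming GITRm can run with multiple ranks on this input and that a mesh partition (*.ptn) for that rank count is available:

#SBATCH -n 4                  # four MPI ranks instead of one
#SBATCH --ntasks-per-node=4   # use all four A100s on the node
#SBATCH -c 32
#SBATCH --gpus-per-task=1     # one GPU per rank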
Angelyr commented 10 months ago

This has been resolved.