ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

CUDA Out of Memory with GCMC #434

Closed AlghamdiNada closed 3 weeks ago

AlghamdiNada commented 1 month ago

Hi, I am trying to run Grand Canonical Monte Carlo (GCMC) for water adsorption in silica using MACE. The issue is that I get a CUDA out-of-memory error after a few thousand steps (~4000 with one GPU). I also tried using more than one GPU, but I still run out of CUDA memory. I am using A100 GPUs with 64 GB.

Input

This is the input file that I am using. Note that I specify bond and angle styles as harmonic but set the coefficients to 0, so that I can run the MC moves with the water molecule; the energies I get are therefore only the MACE-predicted energies.

units metal
boundary p p p
atom_style full
neighbor 1.0 bin
neigh_modify delay 1
pair_style mace no_domain_decomposition

atom_modify   map yes
newton        on

bond_style harmonic
angle_style harmonic

read_data ../part1/1_SiOwithwater.data
molecule h2omol ../H2O.mol

lattice sc 3
create_atoms 0 box mol h2omol 45585

lattice none 1
group SiO type 1 2
group H2O type 3 4

pair_coeff * * ./MACE_MPtrj_2022.9.model-lammps.pt Si O H O
bond_coeff * 0.0 0.0
angle_coeff * 0.0 0.0

delete_atoms overlap 2 H2O SiO mol yes

# Next 4 lines to count the number of water molecules
variable oxygen atom "type==3"
group oxygen dynamic all var oxygen
variable nO equal count(oxygen)
fix myat1 all ave/time 100 10 1000 v_nO file numbermolecule.dat

## the GCMC step
variable tfac equal 5.0/3.0
variable xlo equal xlo+0.1
variable xhi equal xhi-0.1
variable ylo equal ylo+0.1
variable yhi equal yhi-0.1
variable zlo equal zlo+0.1
variable zhi equal zhi-0.1
region system block ${xlo} ${xhi} ${ylo} ${yhi} ${zlo} ${zhi}
fix fgcmc H2O gcmc 100 100 0 0 65899 300 -0.5 0.1 &
    mol h2omol tfac_insert ${tfac} group H2O &
    full_energy pressure 10000 region system

run 45000
write_data SiOwithwater.data
write_dump all atom dump.lammpstrj

Running environment

I used the following specifications to build lammps-mace:

cmake \
    -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_INSTALL_PREFIX=$(pwd) \
    -D CMAKE_CXX_STANDARD=17 \
    -D CMAKE_CXX_STANDARD_REQUIRED=ON \
    -D BUILD_MPI=ON \
    -D BUILD_OMP=ON \
    -D BUILD_SHARED_LIBS=ON \
    -D PKG_KOKKOS=ON \
    -D Kokkos_ENABLE_CUDA=ON \
    -D CMAKE_CXX_COMPILER=$(pwd)/../lib/kokkos/bin/nvcc_wrapper \
    -D Kokkos_ARCH_AMDAVX=ON \
    -D Kokkos_ARCH_AMPERE100=ON \
    -D CMAKE_PREFIX_PATH=$(pwd)/../../libtorch-gpu \
    -D PKG_ML-MACE=ON \
    -D CAFFE2_USE_CUDNN=True \
    -D PKG_RIGID=ON \
    -D PKG_MC=ON \
    -D PKG_MOLECULE=ON \
    ../cmake

When I run, I load these modules:

module load gcc/12.2.0
module load gsl/2.7.1--gcc--12.2.0
module load openmpi/4.1.6--gcc--12.2.0
module load fftw/3.3.10--openmpi--4.1.6--gcc--12.2.0
module load openblas/0.3.24--gcc--12.2.0
module load cuda/12.1
module load intel-oneapi-mkl/2023.2.0

Error message

This is the error message that I get:

RuntimeError: CUDA out of memory. Tried to allocate 6.46 GiB. GPU 0 has a total capacity of 63.42 GiB of which 6.42 GiB is free. Including non-PyTorch memory, this process has 57.00 GiB memory in use. Of the allocated memory 55.10 GiB is allocated by PyTorch, and 302.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
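The error message itself suggests one mitigation: enabling PyTorch's expandable-segments allocator option to reduce fragmentation of the CUDA caching allocator. A minimal sketch of trying this before launching LAMMPS (the `mpirun`/`lmp` launch line below is a placeholder, not my actual command):

```shell
# Enable expandable segments in PyTorch's CUDA caching allocator, as the
# error message suggests, before launching LAMMPS. The variable must be set
# in the environment of the process that loads the MACE model.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Placeholder launch line -- substitute your real LAMMPS invocation:
# mpirun -np 1 lmp -in in.gcmc
```

Note this only helps if the problem is fragmentation (large "reserved but unallocated" memory); if PyTorch-allocated memory itself grows steadily over GCMC cycles, the cause is elsewhere and this setting will only delay the crash.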

I am not sure why I get a CUDA out-of-memory error even though my system is small. I would appreciate any insights, thanks!