WardLT closed this 7 months ago
I guess you might need to recompile LAMMPS with MPS support. It can be switched on with "-DCUDA_MPS_SUPPORT=yes", e.g.:
cmake ../cmake -DCMAKE_BUILD_TYPE=release \
-DCMAKE_CUDA_COMPILER=nvcc \
-DCMAKE_C_COMPILER=nvc \
-DCMAKE_CXX_COMPILER=nvc++ \
-DCMAKE_CXX_STANDARD=14 \
-DLAMMPS_MEMALIGN=64 \
-DLAMMPS_SIZES=smallsmall \
-DPKG_MISC=on \
-DPKG_MOFFF=on \
-DFFT=KISS \
-DPKG_RIGID=on \
-DPKG_MOLECULE=on \
-DPKG_EXTRA-MOLECULE=on \
-DPKG_EXTRA-FIX=on \
-DPKG_KSPACE=on \
-DPKG_MANYBODY=on \
-DPKG_GRANULAR=on \
-DPKG_GPU=on \
-DGPU_API=cuda \
-DGPU_PREC=mixed \
-DGPU_ARCH=sm_80 \
-DGPU_DEBUG=no \
-DCUDA_MPS_SUPPORT=yes \
-DBUILD_OMP=no \
-DBUILD_MPI=no \
-DCUDA_NVCC_FLAGS="-std=c++14 -allow-unsupported-compiler -Xcompiler" \
-DCMAKE_CXX_FLAGS="-std=c++14 -DCUDA_PROXY"
Our LAMMPS tasks are using only a quarter of the GPU. We could probably improve throughput by using MPS to run more than one trajectory per GPU.
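For what it's worth, here is a rough sketch of what running several trajectories through MPS on one GPU could look like. The binary name `lmp`, the input files `in.traj0`, `in.traj1`, ... and the choice of four concurrent runs (based on the "quarter of the GPU" estimate) are assumptions for illustration, not something taken from our setup. It defaults to a dry run that only prints the commands, so they can be checked before anything touches the GPU.

```shell
#!/bin/sh
# Sketch: share one GPU between several LAMMPS runs via CUDA MPS.
# Assumed (hypothetical) names: binary "lmp", inputs in.traj0, in.traj1, ...
set -eu

LMP="${LMP:-lmp}"          # hypothetical path to the MPS-enabled LAMMPS build
N_TRAJ="${N_TRAJ:-4}"      # assumed: ~1/4 GPU per task, so try 4 at once
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

# Start the MPS control daemon (it daemonizes and returns immediately).
start_mps() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "nvidia-cuda-mps-control -d"
    else
        nvidia-cuda-mps-control -d
    fi
}

# Ask the MPS control daemon to shut down.
stop_mps() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo quit | nvidia-cuda-mps-control"
    else
        echo quit | nvidia-cuda-mps-control
    fi
}

# Launch trajectory number $1 in the background using the GPU package.
launch() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$LMP -sf gpu -pk gpu 1 -in in.traj$1 -log log.traj$1"
    else
        "$LMP" -sf gpu -pk gpu 1 -in "in.traj$1" -log "log.traj$1" &
    fi
}

main() {
    # Pin the daemon and its clients to a single GPU (device 0 by default).
    export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0}"
    start_mps
    i=0
    while [ "$i" -lt "$N_TRAJ" ]; do
        launch "$i"
        i=$((i + 1))
    done
    wait        # block until all background trajectories have finished
    stop_mps
}

main
```

Setting `DRY_RUN=0` runs the commands for real. Whether four runs actually fit depends on GPU memory as well as utilization, so `N_TRAJ` is probably worth tuning by measuring throughput.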