hwsheng opened 4 months ago
Can you paste your input file?
Thanks for your attention. Here is the input for my LAMMPS-MACE simulation, which runs well on a single GPU.
# Test of MACE potential for C system
units metal
boundary p p p
atom_style atomic
atom_modify map yes
newton on
read_data C.dat
mass 1 12.011
pair_style mace no_domain_decomposition
pair_coeff * * ../carbon_swa.model-lammps.pt C
velocity all create 10000 4928459 rot yes dist gaussian
fix 1 all npt temp 6300 300 0.2 iso 10000 10000 0.5
thermo 100
timestep 0.002
dump dump all custom 10000 dump.dat id type xu yu zu
run 600000
unfix 1
fix 1 all npt temp 300 300 0.2 iso 0 0 0.5
run 100000
The no_domain_decomposition option only works on a single GPU, so you need pair_style mace instead.
This isn't very well documented, sorry. Please note that, right now, a single-GPU no_domain_decomposition
simulation will almost certainly be faster than a multi-GPU simulation. I don't recommend using multi-GPU unless you absolutely need it (e.g., for memory). We are working on this.
Thanks for the heads-up. Indeed, I was trying to resolve the out-of-memory error I hit in the single-GPU simulation when increasing the number of atoms in the system.
Now, for a test run using two GPUs with
pair_style mace
it turns out that I got an out-of-memory error:
RuntimeError: CUDA out of memory. Tried to allocate 7.39 GiB (GPU 1; 79.15 GiB total capacity; 65.36 GiB already allocated; 5.00 GiB free; 72.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This error did not appear in the single-GPU simulation (same system size, 4086 atoms).
I guess I have to stick to a smaller system size for now?
Thanks in advance.
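As an aside, the error message above itself suggests tuning PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch of how to set it before launching LAMMPS (the 128 MiB value and the launch command are illustrative assumptions, not a tested recommendation for this model):

```shell
# Cap the size of cached allocator blocks to reduce fragmentation.
# max_split_size_mb:128 is an illustrative starting value; tune as needed.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then launch LAMMPS as usual (paths and Kokkos flags assumed to match your build).
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk
```

This only helps when reserved memory greatly exceeds allocated memory, as in the traceback above; it cannot create memory the model genuinely needs.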
For single species, on our A100 (80GB memory), I'd normally expect to reach system sizes of 5000-10000 before seeing memory problems, depending on how expressive the model is (L=0, L=1, L=2, etc). So you may be able to reach larger systems on a single GPU by reducing your model size.
It's also possible, but not guaranteed, that increasing to four GPUs (say) would be enough. But this wouldn't be my first choice if you can avoid it.
ok. Many thanks for your advice. I will try that.
I'm encountering difficulties running a multi-GPU simulation in LAMMPS with the MACE model. In a preliminary test using two GPUs, I launched the simulation with:
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk
However, I ran into an error: cudaFree(arg_alloc_ptr) error(cudaErrorAssert): device-side assert triggered.
Would you have any advice on how to address this problem? Thank you in advance.
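One general CUDA debugging step for device-side asserts (not specific to MACE or LAMMPS): because kernel launches are asynchronous, the assert is usually reported at a later call such as cudaFree rather than at the kernel that failed. Forcing synchronous launches makes the error surface at the real culprit. A sketch, assuming the same launch command as above:

```shell
# Debug only: make every CUDA kernel launch synchronous so the
# device-side assert is reported at the failing kernel (this slows the run).
export CUDA_LAUNCH_BLOCKING=1
mpirun -np 2 ~/lammps-mace-gpu/lammps/build-kokkos-cuda/lmp -in lmp.in -k on g 2 -sf kk
```

Remove the variable again for production runs, since blocking launches noticeably hurt performance.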