ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

How to use LAMMPS to train a larger system? #476

Open · stargolike opened 1 week ago

stargolike commented 1 week ago

Discussed in https://github.com/ACEsuit/mace/discussions/475

Originally posted by **stargolike** June 20, 2024

I used a system of 200 atoms for training and selected `hidden_irreps: '64x0e+64x1o'`. But when I run MD with LAMMPS, I can only simulate a system of about 1k atoms. With a larger system I get an out-of-memory error:

```
RuntimeError: CUDA out of memory. Tried to allocate 2.36 GiB. GPU 0 has a total capacity of 47.45 GiB of which 1.43 GiB is free. Including non-PyTorch memory, this process has 46.01 GiB memory in use. Of the allocated memory 37.87 GiB is allocated by PyTorch, and 7.86 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

The graphics card I am currently using is an RTX 8000, which works fine for training. What does this memory usage depend on, and how can I run MD on a larger system?
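The error message itself points at one mitigation: the PyTorch allocator option `expandable_segments:True`, which reduces fragmentation (it does not shrink the memory the model genuinely needs). For LAMMPS, export `PYTORCH_CUDA_ALLOC_CONF` in the shell before launching; when driving MD from Python, a minimal sketch is:

```
import os

# Must be set before PyTorch initializes CUDA, hence before importing
# anything that imports torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

# Sanity-check how much memory the device actually has.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB total")
```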
stargolike commented 1 week ago

I also found another problem: for the same system and model, ASE can run the simulation, while LAMMPS cannot.
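For reference, a minimal sketch of the ASE route the poster describes, assuming the `mace` package's `MACECalculator`, the original (non-LAMMPS) model file from the run, and a hypothetical structure file `1920atoms.xyz`; this is not the poster's actual script:

```
from ase import units
from ase.io import read
from ase.md.langevin import Langevin
from mace.calculators import MACECalculator

# Load the structure and attach the trained MACE model as a calculator.
atoms = read("1920atoms.xyz")  # hypothetical file name
atoms.calc = MACECalculator(model_paths="MACE_model_run-123.model", device="cuda")

# Short Langevin MD run at 300 K with a 1 fs timestep.
dyn = Langevin(atoms, timestep=1.0 * units.fs, temperature_K=300, friction=0.002)
dyn.run(1000)
```

A memory gap between the two drivers would not be surprising: LAMMPS keeps additional ghost-atom and neighbor-list copies that a single-process ASE run does not.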

ilyes319 commented 1 week ago

If you want to fit a larger number of atoms on a GPU, you should try to decrease your cutoff. What is your cutoff size?

stargolike commented 1 week ago

> If you want to fit a larger number of atoms on a GPU, you should try to decrease your cutoff. What is your cutoff size?

Dear ilyes, I am using the default cutoff. Here is my LAMMPS input file:

```
#------------------------------Basic settings--------------------------
units         metal
atom_style    atomic
atom_modify   map yes
newton        on
read_data     1920atom_7.5m
pair_style    mace
pair_coeff    * * MACE_model_run-123.model-lammps.pt H O Cl Zn

dump          1 all custom 100 toEquil.lammpstrj id type x y z vx vy vz
thermo        1
run           1000
```

If I understand you correctly, I should change `pair_style mace` and add the cutoff there.

ilyes319 commented 1 week ago

I meant during training, you should try to use a smaller cutoff.
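For intuition: in a message-passing potential the per-atom memory scales with the number of edges, i.e. the neighbors within `r_max`, and that count grows with the cutoff volume, roughly as r_max³. A back-of-envelope sketch (the number density below is a placeholder assumption, not a value from this thread):

```
import math

rho = 0.10  # atoms per Å^3; hypothetical, roughly water-like density

# Average neighbors per atom = density * volume of the cutoff sphere.
# Going from r_max 4.0 to 3.0 Å cuts the edge count by ~58%.
for r_max in (3.0, 3.5, 4.0, 5.0):
    neighbors = rho * (4.0 / 3.0) * math.pi * r_max**3
    print(f"r_max = {r_max:.1f} Å -> ~{neighbors:.0f} neighbors/atom")
```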

stargolike commented 1 week ago

> I meant during training, you should try to use a smaller cutoff.

Sorry, I don't understand. Here is my config; I don't see any cutoff parameter in it:

```
name: MACE_model
config_type_weights: {"Default": 1.0}
model: "MACE"
hidden_irreps: '64x0e + 64x1o'
r_max: 4.0
train_file: train.xyz
test_file: test.xyz
valid_file: val.xyz
batch_size: 10
energy_key: "energy"
forces_key: "forces"
ema: yes
ema_decay: 0.99
amsgrad: yes
restart_latest: yes
max_num_epochs: 100
device: cuda
loss: "huber"
```
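For what it's worth, the `r_max: 4.0` line in this config is the cutoff being discussed: the neighbor cutoff radius in Å, baked into the exported model. A sketch for reading it back from a saved model (assumes a standard pickled MACE `.model` file, which needs the `mace` package importable to unpickle; the `r_max` buffer name is taken from the MACE source and may differ across versions):

```
import torch

# Load the trained model; on newer PyTorch, pickled modules need
# weights_only=False.
model = torch.load("MACE_model_run-123.model", map_location="cpu",
                   weights_only=False)

# MACE models register the training cutoff as a buffer named r_max.
print(f"training cutoff: {float(model.r_max):.2f} Å")
```

Note also that with more than one interaction layer the effective receptive field is num_interactions × r_max, so the ghost-atom shell LAMMPS must carry, and hence the memory, grows faster than the cutoff alone suggests.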