MDIL-SNU / SevenNet

SevenNet - a graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.
https://pubs.acs.org/doi/10.1021/acs.jctc.4c00190
GNU General Public License v3.0

Support using OpenMPI to train model #92

Closed by thangckt 2 weeks ago

thangckt commented 1 month ago

This PR will be extremely helpful to anyone who only has CPU clusters.

Usage:

mpirun -np $NSLOTS python $SEVENN_PATH/sevenn_train_mpi.py input.yaml --distributed --distributed_backend='mpi'

@YutackPark The file 'sevenn_train_mpi.py' can replace main/sevenn.py. For now I have put it in the root folder as a separate file, so you can easily remove it if you don't like it.

YutackPark commented 1 month ago

Looks good to me. Thank you for the contribution. I merged sevenn_train_mpi.py into sevenn.py. Is it executable without 'SEVENN_PATH', the way torchrun is used with the sevenn CLI?

thangckt commented 1 month ago

Is it executable without the 'SEVENN_PATH'?

Yes. With this merge, just run:

mpirun -np $NSLOTS sevenn input.yaml --distributed --distributed_backend='mpi'

This is quite similar to running with torchrun.
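
For anyone wondering how the two launchers differ from the training script's point of view, here is a minimal sketch. It is not the code merged into sevenn.py. The helper names `detect_rank_info`, `init_distributed`, and `RankInfo` are hypothetical. The sketch assumes Open MPI as the MPI implementation, which exports the `OMPI_COMM_WORLD_*` variables to each process, while torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. It also assumes a PyTorch build with MPI support, since the 'mpi' backend of torch.distributed is only available when PyTorch is built from source against an MPI installation.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RankInfo(NamedTuple):
    """Where this process sits in the job (hypothetical helper type)."""
    launcher: str
    rank: int
    local_rank: int
    world_size: int


def detect_rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Guess the launcher from environment variables.

    Open MPI's mpirun sets OMPI_COMM_WORLD_*; torchrun sets RANK,
    LOCAL_RANK and WORLD_SIZE. With neither, assume one process.
    """
    env = os.environ if env is None else env
    if "OMPI_COMM_WORLD_RANK" in env:
        return RankInfo(
            "mpirun",
            int(env["OMPI_COMM_WORLD_RANK"]),
            int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0")),
            int(env["OMPI_COMM_WORLD_SIZE"]),
        )
    if "RANK" in env:
        return RankInfo(
            "torchrun",
            int(env["RANK"]),
            int(env.get("LOCAL_RANK", "0")),
            int(env["WORLD_SIZE"]),
        )
    return RankInfo("single", 0, 0, 1)


def init_distributed(backend: str = "mpi") -> None:
    """Start the process group (torch is imported lazily on purpose)."""
    import torch.distributed as dist

    if backend == "mpi" and not dist.is_mpi_available():
        raise RuntimeError("this PyTorch build has no MPI support")
    # With 'mpi', rank and world size come from the MPI runtime itself.
    # With 'gloo' or 'nccl' and the default env:// init, they are read
    # from the variables that torchrun exports.
    dist.init_process_group(backend=backend)


if __name__ == "__main__":
    print(detect_rank_info())
```

Saved to a file and launched under mpirun, each process should report the "mpirun" launcher with its own rank; launched with plain python, it reports a single process.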