ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Multi-GPU evaluation for optimisation of a full protein structure #309

Closed: JSLJ23 closed this issue 5 months ago

JSLJ23 commented 5 months ago

Structural optimisation of a full protein leads to CUDA out-of-memory errors:

RuntimeError: CUDA out of memory. Tried to allocate 8.10 GiB. GPU 1 has a total capacty of 39.39 GiB of which 5.30 GiB is free. Including non-PyTorch memory, this process has 34.08 GiB memory in use. Of the allocated memory 25.41 GiB is allocated by PyTorch, and 7.44 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
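The error message itself suggests a first thing to try. It reports 7.44 GiB reserved by PyTorch but unallocated, which is more than the 8.10 - 5.30 = 2.80 GiB shortfall, so fragmentation may be part of the problem. A minimal sketch, assuming the job is launched from a Python script: set `PYTORCH_CUDA_ALLOC_CONF` before PyTorch touches the GPU. The value `512` is an arbitrary illustrative choice, and this cannot help if the model genuinely needs more memory than the card has.

```python
import os

# PyTorch's CUDA caching allocator reads this variable when it initialises,
# so set it before importing torch or anything that imports it (e.g. mace).
# max_split_size_mb:<N> stops the allocator from splitting cached blocks
# larger than N MB, which can reduce fragmentation. 512 is illustrative only.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")

# Import the model only after the variable is in place, e.g.:
# from mace.calculators import mace_off
```

Equivalently, the variable can be exported in the shell before starting the script.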

A simplified version of the code I ran:

from time import perf_counter

from ase import Atom, Atoms
from ase.optimize.sciopt import SciPyFminBFGS
from mace.calculators import mace_off
from rdkit import Chem

pdb_path = "./CDK2_4KD1_protein_only.pdb"

rdkit_mol = Chem.MolFromPDBFile(str(pdb_path), removeHs=False)

def build_ase_molecule_from_rdkit(rdkit_mol, conformer_index):
    conformer = rdkit_mol.GetConformer(conformer_index)
    ase_atoms = []
    rdkit_to_ase_atom_mapping = {}
    for index, atom in enumerate(rdkit_mol.GetAtoms()):
        positions = conformer.GetAtomPosition(atom.GetIdx())
        ase_atom = Atom(symbol=atom.GetSymbol(), position=(positions.x, positions.y, positions.z))
        rdkit_to_ase_atom_mapping[f"{atom.GetSymbol()}_{atom.GetIdx()}"] = f"{ase_atom.symbol}_{index}"
        ase_atoms.append(ase_atom)

    ase_molecule = Atoms(ase_atoms)

    return ase_molecule, rdkit_to_ase_atom_mapping

ase_molecule, atom_mapping = build_ase_molecule_from_rdkit(rdkit_mol=rdkit_mol, conformer_index=0)

mace_calc = mace_off(model="medium", device="cuda:1")

ase_molecule.calc = mace_calc  # set_calculator() is deprecated in recent ASE versions
print(ase_molecule.get_potential_energy() / 27.211)  # ASE reports eV; 1 Hartree is about 27.211 eV

start = perf_counter()
geom_opt = SciPyFminBFGS(ase_molecule)
geom_opt.run(fmax=0.05)
end = perf_counter()

print("Time taken:", end - start)

print(ase_molecule.get_potential_energy() / 27.211)
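A side note on the optimiser: `SciPyFminBFGS` wraps SciPy's `fmin_bfgs`, which keeps a dense inverse-Hessian approximation of size 3N x 3N for N atoms. That matrix lives in host memory, so it is not the cause of the CUDA error above, but for a full protein it is large. The sketch below compares it with the 2m history vectors of length 3N that an L-BFGS optimiser stores. The history length m = 100 and the 5,000-atom count are assumed round numbers for illustration, not the size of the attached structure.

```python
def dense_bfgs_bytes(n_atoms: int, bytes_per_float: int = 8) -> int:
    """Memory for a dense (3N x 3N) inverse-Hessian approximation."""
    dof = 3 * n_atoms
    return dof * dof * bytes_per_float


def lbfgs_bytes(n_atoms: int, memory: int = 100, bytes_per_float: int = 8) -> int:
    """Memory for the 2*m stored step / gradient-difference vectors of L-BFGS."""
    dof = 3 * n_atoms
    return 2 * memory * dof * bytes_per_float


GIB = 1024 ** 3
n = 5000  # hypothetical atom count for a small protein including hydrogens
print(f"BFGS  : {dense_bfgs_bytes(n) / GIB:.2f} GiB")  # 1.68 GiB
print(f"L-BFGS: {lbfgs_bytes(n) / GIB:.3f} GiB")  # 0.022 GiB
```

In ASE, replacing `SciPyFminBFGS` with `ase.optimize.LBFGS` (or `ase.optimize.FIRE`) is a one-line change that avoids the dense matrix; it does not by itself reduce the GPU memory used by the MACE forward and backward pass.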

Possible feature request / solution

Would it be possible to have a multi-GPU version of MACE where I could specify something like:

mace_calc = mace_off(model="medium", device="cuda:0,1")

and the MACE model would be distributed over two GPUs, with the energy of half of the protein's atoms computed on one GPU and the other half on the other? This might solve the CUDA OOM errors I am currently facing.
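Splitting the atoms in half is not quite enough on its own for a message-passing model: each atom's energy depends on neighbours up to roughly (number of interaction layers) x (cutoff) away, so each GPU would also need a halo of "ghost" atoms from the other half. This is the general idea behind spatial domain decomposition in MD codes such as LAMMPS. Below is a minimal pure-Python sketch of a two-way split along one axis; `split_with_ghosts` is a hypothetical helper, and the 5 Å cutoff with two layers is an illustrative assumption, not a verified MACE-OFF setting.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def split_with_ghosts(
    positions: Sequence[Vec3], receptive_field: float, axis: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split atoms into two domains along one axis.

    Returns [(owned, ghosts), (owned, ghosts)], one pair per device. A device
    sums the per-atom energies of its owned atoms only; ghost atoms are extra
    input neighbours taken from the other domain. Testing a single coordinate
    gives a conservative superset of the atoms actually within range.
    """
    coords = sorted(p[axis] for p in positions)
    plane = coords[len(coords) // 2]  # median split: roughly half the atoms each
    left = [i for i, p in enumerate(positions) if p[axis] < plane]
    right = [i for i, p in enumerate(positions) if p[axis] >= plane]
    # Atoms of the other domain lying within receptive_field of the plane.
    left_ghosts = [i for i in right if positions[i][axis] - plane <= receptive_field]
    right_ghosts = [i for i in left if plane - positions[i][axis] <= receptive_field]
    return [(left, left_ghosts), (right, right_ghosts)]


# Toy chain of 50 atoms spaced 1 Å apart along x.
atoms = [(float(i), 0.0, 0.0) for i in range(50)]
# Receptive field = 2 layers x 5 Å cutoff (assumed, illustrative values).
(left, left_ghosts), (right, right_ghosts) = split_with_ghosts(atoms, 2 * 5.0)
print(len(left), len(left_ghosts))    # 25 11
print(len(right), len(right_ghosts))  # 25 10
```

In this toy case each device processes about 36 of the 50 atoms rather than 25, so a naive two-way split does not halve the per-GPU work. Forces would also need a reduction step: gradients of one device's partial energy with respect to its ghost positions belong to atoms owned by the other device and have to be added there.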

Files

CDK2_4KD1_protein_only.zip

wcwitt commented 5 months ago

This is possible using LAMMPS, but at present there is a significant performance loss when multiple GPUs are used for inference. We are actively working to resolve this.

97gamjak commented 1 day ago

Has there been any progress on multi-GPU single-point calculations (not training)?