aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License

running TorchANI in parallel #585

Closed qzhu2017 closed 3 years ago

qzhu2017 commented 3 years ago

Hi, I'm not sure if this is the right place to ask. I am trying to use a trained ANI model to optimize many molecules or crystals in parallel, using Python's multiprocessing, as follows:

from ase.lattice.cubic import Diamond
from ase.optimize import BFGS
from ase.calculators.emt import EMT

import torchani

import multiprocessing as mp
from functools import partial

import warnings
warnings.simplefilter("ignore")

def opt(struc, calculator, steps):
    struc.set_calculator(calculator)
    opt = BFGS(struc, logfile='ase.log')
    opt.run(fmax=0.001, steps=steps)
    print(struc.get_potential_energy())

strucs = []
strucs.append(Diamond(symbol="C", pbc=True))
strucs.append(Diamond(symbol="C", pbc=True))

# Both EMT and ANI1ccx work, but ANI2x fails
#calc = EMT()
calc = torchani.models.ANI1ccx().ase()
#calc = torchani.models.ANI2x().ase()

with mp.Pool(2) as p:
    func = partial(opt, calculator=calc, steps=10)
    p.map(func, strucs)

The code above works well if I use ANI1ccx as the calculator, but it fails when I use ANI2x, complaining that too many files are open.

KeyboardInterrupt
Traceback (most recent call last):
  File "/scratch/qzhu/miniconda3/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/scratch/qzhu/miniconda3/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/scratch/qzhu/miniconda3/lib/python3.8/multiprocessing/util.py", line 133, in _remove_temp_dir
    rmtree(tempdir)
  File "/scratch/qzhu/miniconda3/lib/python3.8/shutil.py", line 715, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/scratch/qzhu/miniconda3/lib/python3.8/shutil.py", line 628, in _rmtree_safe_fd
    onerror(os.scandir, path, sys.exc_info())
  File "/scratch/qzhu/miniconda3/lib/python3.8/shutil.py", line 624, in _rmtree_safe_fd
    with os.scandir(topfd) as scandir_it:
OSError: [Errno 24] Too many open files: '/tmp/pymp-pwvnbvp6'

Is it possible that some files were not closed when calling the ANI2x models?
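
One workaround I am considering, assuming the problem comes from the single calculator object being pickled into every Pool worker, is to construct the model inside the worker function instead of sharing it. A minimal sketch (the helper name opt_local is just for illustration):

import multiprocessing as mp
from functools import partial

import torchani
from ase.lattice.cubic import Diamond
from ase.optimize import BFGS

def opt_local(struc, steps):
    # Assumption: building the calculator inside the worker keeps the model
    # out of the pickled arguments sent to each Pool process.
    calc = torchani.models.ANI2x().ase()
    struc.set_calculator(calc)
    dyn = BFGS(struc, logfile='ase.log')
    dyn.run(fmax=0.001, steps=steps)
    print(struc.get_potential_energy())

if __name__ == '__main__':
    strucs = [Diamond(symbol="C", pbc=True) for _ in range(2)]
    with mp.Pool(2) as p:
        p.map(partial(opt_local, steps=10), strucs)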

isayev commented 3 years ago

Qiang, nice to hear from you! We have never tried to run it in parallel on CPUs; I will check. You will have better success with GPUs, since the PyTorch code is not well optimized for CPU runs. Your best bet would be to just run a few ANI scripts in parallel.
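
For example, something along these lines (a hypothetical run_slice.py, just an untested sketch) would let you launch a few independent jobs, each handling its own slice of the structure list, e.g. python run_slice.py 0 2 and python run_slice.py 1 2 in two terminals:

# run_slice.py -- hypothetical helper script (untested sketch):
# "python run_slice.py I N" optimizes every N-th structure starting at
# index I, so a few independent copies can run side by side.
import sys

import torchani
from ase.lattice.cubic import Diamond
from ase.optimize import BFGS

def main():
    i, n = int(sys.argv[1]), int(sys.argv[2])  # this job's index and total number of jobs
    calc = torchani.models.ANI1ccx().ase()
    strucs = [Diamond(symbol="C", pbc=True) for _ in range(10)]  # placeholder structure list
    for struc in strucs[i::n]:
        struc.set_calculator(calc)
        BFGS(struc, logfile='ase.log').run(fmax=0.001, steps=10)
        print(struc.get_potential_energy())

if __name__ == '__main__':
    main()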

On a side note, don't expect good performance from these public models on crystals :-)

qzhu2017 commented 3 years ago

@isayev Thanks for your quick reply. I need to process a very large number of small molecules/crystals, so a GPU is probably not useful in this case (if I understand correctly). The optimized geometries will be passed on to my other calculators, so splitting the work into a few separate ANI runs is not really convenient. I know the model won't be perfect; I am curious how it compares to generic force fields.

Anyway, I would appreciate it if you could take a look.

yueyericardo commented 3 years ago

Hi, could you try whether this works?

from ase.lattice.cubic import Diamond
from ase.optimize import BFGS
from ase.calculators.emt import EMT
from ase.build import molecule

import torchani

import multiprocessing as mp
from functools import partial

import warnings
warnings.simplefilter("ignore")

def opt(struc, calculator, steps):
    struc.set_calculator(calculator)
    opt = BFGS(struc, logfile='ase.log')
    opt.run(fmax=0.001, steps=steps)
    print(struc.get_potential_energy())

strucs = []
for i in range(10):
    strucs.append(Diamond(symbol="C", pbc=True))
    strucs.append(molecule('CH4'))

# ANI2x, which failed with mp.Pool above, is used here with mp.Process
#calc = EMT()
#calc = torchani.models.ANI1ccx().ase()
calc = torchani.models.ANI2x().ase()

processes = []
for struc in strucs:
    p = mp.Process(target=opt, args=(struc, calc, 10))
    p.start()
    processes.append(p)
for p in processes:
    p.join()
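
If you have many more structures than cores, starting one process per structure may be too much. A minimal sketch that caps concurrency by processing them in batches (reusing the names from the script above; the batch size is arbitrary):

batch = 4  # arbitrary cap on how many processes run at once
for start in range(0, len(strucs), batch):
    processes = []
    for struc in strucs[start:start + batch]:
        p = mp.Process(target=opt, args=(struc, calc, 10))
        p.start()
        processes.append(p)
    # wait for the current batch to finish before launching the next one
    for p in processes:
        p.join()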

qzhu2017 commented 3 years ago

Thank you. Your script works well!