aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License

Error when using pytorch lightning #607

Open kryczko opened 2 years ago

kryczko commented 2 years ago

I am trying to define an ANI model together with the AEVComputer module (with CUDA enabled) inside a PyTorch Lightning module, but I am getting the following error:

RuntimeError: coordinates, species, and aev_params should be on the same device

I noticed that some of the parameters are registered as buffers, but others are not. Please let me know what you think.

Kev
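
The buffer observation is likely the crux. nn.Module.to(), .cuda() and .double(), which PyTorch Lightning relies on to place a LightningModule on its device, only act on registered parameters and buffers; a tensor stored as a plain attribute stays wherever it was created. The minimal sketch below uses a hypothetical toy module (not torchani's actual AEVComputer) and a dtype cast instead of a device move, so it runs without a GPU:

```python
import torch
from torch import nn


class ToyParams(nn.Module):
    """Hypothetical module holding two constants, stored two different ways."""

    def __init__(self):
        super().__init__()
        # Registered buffer: converted by .to(), .cuda(), .double(), ...
        self.register_buffer("as_buffer", torch.tensor(5.2))
        # Plain tensor attribute: ignored by those methods.
        self.as_attribute = torch.tensor(3.5)


m = ToyParams().double()
print(m.as_buffer.dtype)     # torch.float64 (cast together with the module)
print(m.as_attribute.dtype)  # torch.float32 (left exactly as created)
```

On a GPU machine, ToyParams().to("cuda") should behave the same way: as_buffer ends up on cuda:0 while as_attribute stays on the CPU, which is the kind of mismatch that produces a "should be on the same device" error.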

not-matt commented 2 years ago

Similar issue, probably related?

num_repeats = torch.where(pbc, num_repeats, num_repeats.new_zeros(()))
                  ~~~~~~~~~~~ <--- HERE
    r1 = torch.arange(1, num_repeats[0].item() + 1, device=cell.device)
    r2 = torch.arange(1, num_repeats[1].item() + 1, device=cell.device)
RuntimeError: Expected condition, x and y to be on the same device, but condition is on cpu and x and y are on cuda:0 and cuda:0 respectively

Clean conda environment on Ubuntu, installed packages:

openmm                    7.7.0            py39h792354b_0    conda-forge
openmm-torch              0.5             cuda112py39hb628e3f_0    conda-forge
openmmml                  1.0                      pypi_0    pypi
pytorch                   1.10.0          cuda112py39h3ad47f5_1    conda-forge
pytorch-gpu               1.10.0          cuda112py39h0bbbad9_1    conda-forge
torchani                  2.2.3.dev2+g3dfbaf4          pypi_0    pypi
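
In this traceback the condition passed to torch.where (pbc) is on the CPU while num_repeats, derived from the cell, is on cuda:0, and as the error message says, torch.where expects all of them on the same device. A minimal caller-side sketch, assuming the inputs are assembled by code you control: build species, coordinates, cell and pbc on one device up front. The helper name prepare_periodic_inputs and the toy values are hypothetical:

```python
import torch


def prepare_periodic_inputs(species, coordinates, cell, pbc, device):
    """Create every input of a periodic energy call on a single device.

    Hypothetical helper; the argument names mirror the species,
    coordinates, cell and pbc tensors involved in the traceback above.
    """
    device = torch.device(device)
    species = torch.as_tensor(species, dtype=torch.long, device=device)
    coordinates = torch.as_tensor(coordinates, dtype=torch.float32, device=device)
    cell = torch.as_tensor(cell, dtype=torch.float32, device=device)
    pbc = torch.as_tensor(pbc, dtype=torch.bool, device=device)
    return species, coordinates, cell, pbc


device = "cuda" if torch.cuda.is_available() else "cpu"
species, coordinates, cell, pbc = prepare_periodic_inputs(
    [[1, 0, 0]],  # one molecule with three atoms (toy species indices)
    [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.96], [0.93, 0.0, -0.24]]],
    torch.eye(3) * 10.0,  # toy cubic box, 10 units per side
    [True, True, True],
    device,
)

# The same kind of call that failed above now sees one device throughout.
num_repeats = torch.tensor([1, 1, 1], device=device)
num_repeats = torch.where(pbc, num_repeats, num_repeats.new_zeros(()))
```

torch.as_tensor is used so that an input which is already a tensor with the requested dtype and device is passed through without a copy.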

yueyericardo commented 2 years ago

Hi, thanks for the report! Could you provide a minimal example to reproduce this?

not-matt commented 2 years ago

It might be more suitable for a separate issue, since I'm using an OpenMM stack.

See the full output of the code here:

https://github.com/meyresearch/ANI-Peptides/blob/main/demos/ANI_minimal.ipynb

Setup

  1. Install openmm and pytorch
    conda install -c conda-forge openmm openmm-torch pytorch cudatoolkit=11.5
  2. In ~/.bashrc, set CUDA_HOME to /usr/local/cuda and add /usr/local/cuda/bin to PATH
  3. Install torchani with cuaev:
    git clone https://github.com/aiqm/torchani
    cd torchani
    python setup.py install --cuaev
  4. Install openmm-ml
    git clone https://github.com/openmm/openmm-ml
    pip install openmm-ml/.
  5. Fetch sample peptide
    wget -q https://github.com/meyresearch/ANI-Peptides/raw/main/pdbs/aaa.pdb
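
Before building with --cuaev, a quick check of the environment these steps set up can save a failed build. The script below is a hypothetical convenience check that uses only the standard library (it is not part of torchani or OpenMM); the expectation that nvcc is on PATH assumes the usual CUDA toolkit layout, where it lives in $CUDA_HOME/bin:

```python
import importlib.util
import os
import shutil


def check_cuaev_build_env():
    """Report whether the prerequisites for a cuaev build look present.

    Hypothetical helper; each value is True when the check passes.
    """
    cuda_home = os.environ.get("CUDA_HOME")
    return {
        # CUDA_HOME should be set and point at an existing toolkit directory.
        "cuda_home_set": bool(cuda_home),
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
        # With the toolkit's bin/ directory on PATH, nvcc is discoverable.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # PyTorch must be importable before torchani's extension is built.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # torchani itself, once `python setup.py install --cuaev` has run.
        "torchani_installed": importlib.util.find_spec("torchani") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_cuaev_build_env().items():
        print(f"{check:20s} {'ok' if ok else 'MISSING'}")
```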

Code

# Import libraries
from openmm.app import *
from openmm import *
from openmm.unit import *
from openmmml import MLPotential

# Setup
pdb = PDBFile("aaa.pdb")
potential = MLPotential('ani2x')
system = potential.createSystem(pdb.topology)
integrator = LangevinIntegrator(
    300 * kelvin, 
    1 / picosecond, 
    1.0 * femtosecond,
)
simulation = Simulation(
    pdb.topology,
    system,
    integrator,
    Platform.getPlatformByName("CUDA"),
)
simulation.context.setPositions(pdb.positions)

# Minimize and run
simulation.minimizeEnergy()
simulation.step(1000)
print("done")

yueyericardo commented 2 years ago

Hi, the error comes from the openmm-ml wrapper. A temporary fix, which works ONLY on GPU, can be found at: https://github.com/yueyericardo/openmm-ml/commit/1d1d3f24f40becdcd8a36431c8d0900d98eb1304#diff-911692ca194bf903c77d038662969ad3277dcf2fa8b3b3048d95a5aa3af59de1

It uses cuaev (use_cuda_extension) for the AEV calculation, but cuaev does not currently support PBC, so if you want to use cuaev, you have to change your script slightly:

pdb = PDBFile("aaa.pdb")
# add this line
pdb.topology.setPeriodicBoxVectors(None)
potential = MLPotential('ani2x')

Our internal version has some other updates that make it faster, but it is not open source yet. In the meantime, the OpenMM team is building NNPOps for ANI and SchNet; you can track the progress here: "Add example of using NNPOps with openmm-torch?"

Edit: BTW, our conda-forge package includes the latest public build with cuaev; you can install it directly with:

conda install -c conda-forge torchani

not-matt commented 2 years ago

Fantastic! Thank you for looking into this and getting back to me so quickly.

kryczko commented 2 years ago

I am still getting the same error I showed above when using an ANI model within PyTorch Lightning. Any ideas on how to fix it?
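
One generic workaround to try while waiting for an upstream fix: after constructing the model, re-register any plain tensor attributes as non-persistent buffers so that Lightning's device move carries them along. The helper promote_tensor_attributes below is hypothetical (not a torchani or Lightning API) and is shown on a toy module; it cannot help with state that is not a torch.Tensor, such as objects created by a compiled extension, so it may not be sufficient when the CUDA extension is enabled:

```python
import torch
from torch import nn


def promote_tensor_attributes(module: nn.Module) -> list:
    """Re-register plain tensor attributes as non-persistent buffers.

    Hypothetical helper. Non-persistent buffers follow .to()/.cuda()
    but are excluded from state_dict(), so checkpoint keys are unchanged.
    Returns the dotted names of the attributes that were promoted.
    """
    promoted = []
    for prefix, sub in module.named_modules():
        # Plain tensor attributes live in the instance __dict__; parameters,
        # buffers and submodules are kept in separate registries instead.
        for name, value in list(vars(sub).items()):
            if isinstance(value, torch.Tensor) and not isinstance(value, nn.Parameter):
                delattr(sub, name)
                sub.register_buffer(name, value, persistent=False)
                promoted.append(f"{prefix}.{name}" if prefix else name)
    return promoted


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("as_buffer", torch.tensor(5.2))
        self.as_attribute = torch.tensor(3.5)  # would be skipped by .to()


model = nn.Sequential(Toy())
print(promote_tensor_attributes(model))  # ['0.as_attribute']
model = model.double()
print(model[0].as_attribute.dtype)       # torch.float64
print(list(model.state_dict()))          # ['0.as_buffer']
```

In a LightningModule, the idea would be to call promote_tensor_attributes(self) once at the end of __init__, before Lightning moves the module to the GPU.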