aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License
459 stars 126 forks source link

The conversion of the ANI-2x model to TorchScript fails with PyTorch 1.9 #598

Closed raimis closed 2 years ago

raimis commented 2 years ago

The conversion of the ANI-2x model to TorchScript fails with PyTorch 1.9:

import torch
import torchani

print(torch.__version__)
print(torchani.__version__)

torch.jit.script(torchani.models.ANI2x())
$ conda install -c conda-forge pytorch=1.9 torchani
$ python debug.py 
1.9.0
2.2
Traceback (most recent call last):
  File "debug_torchani.py", line 7, in <module>
    torch.jit.script(torchani.models.ANI2x())
  File "/shared/raimis/opt/miniconda/envs/tmp_torchani/lib/python3.8/site-packages/torch/jit/_script.py", line 1096, in script
    return torch.jit._recursive.create_script_module(
  File "/shared/raimis/opt/miniconda/envs/tmp_torchani/lib/python3.8/site-packages/torch/jit/_recursive.py", line 412, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/shared/raimis/opt/miniconda/envs/tmp_torchani/lib/python3.8/site-packages/torch/jit/_recursive.py", line 474, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/shared/raimis/opt/miniconda/envs/tmp_torchani/lib/python3.8/site-packages/torch/jit/_script.py", line 497, in _construct
    init_fn(script_module)
  File "/shared/raimis/opt/miniconda/envs/tmp_torchani/lib/python3.8/site-packages/torch/jit/_recursive.py", line 437, in init_fn
    cpp_module.setattr(name, orig_value)
RuntimeError: Could not cast attribute 'rev_species' to type Dict[Tensor, int]: Unable to cast Python instance to C++ type (compile in debug mode for details)

But it works with PyTorch 1.8:

$ conda install -c conda-forge pytorch=1.8 torchani
$ python debug.py
1.8.0
2.2
yueyericardo commented 2 years ago

Hi Raimondas,

Thanks for the report! This issue has been fixed by @IgnacioJPickering on our internal code, but haven't reflected on the public one.

@IgnacioJPickering can we merge it here?

Richard

yueyericardo commented 2 years ago

ping @IgnacioJPickering

IgnacioJPickering commented 2 years ago

ohh yes, thanks for the heads up! I will merge this tomorrow, I hadn't noticed this stopped working in the public repo.

On Thu, 14 Oct 2021, 22:23 Jinze (Richard) Xue, @.***> wrote:

ping @IgnacioJPickering https://github.com/IgnacioJPickering

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aiqm/torchani/issues/598#issuecomment-943932556, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIFDKQXGCWCJK5BPJN2UZZTUG6GAVANCNFSM5FOSUROQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

yueyericardo commented 2 years ago

Hi, sorry for the confusion. The problem here is actually because the conda-forge torchani is too old (which is not maintained by us), the code in the master branch is already fixed.

yueyericardo commented 2 years ago

Hi, we will maintain the conda-forge torchani from now on, and will let you know once it is done.

yueyericardo commented 2 years ago

Will be ready once https://github.com/conda-forge/torchani-feedstock/pull/2 got merged.

yueyericardo commented 2 years ago

@raimis For conda-forge torchani, we would like to support only latest pytorch and cuda, but to be compatible with openmm, it currently is building with the following matrix. So let me know if we could remove some versions of pytorch, for example 1.8.0 or 1.9.0.

pytorch

and cuda

raimis commented 2 years ago

Thanks for your effort @yueyericardo. If you support at least the two latest versions, I would be good for enough for OpenMM-Torch compatability:

PyTorch: 1.9.1, 1.10.0

CUDA: 11.1, 11.2

yueyericardo commented 2 years ago

Hi Raimondas, torchani 2.2.1 is available at conda-forge now. It currently support all matrix I mentioned in last comment. The version you previously used on conda-forge is pretty old, let me know if this new version works with nnpops.

raimis commented 2 years ago

OK! I'll try.

raimis commented 2 years ago

@yueyericardo I tried TorchANI 2.2.1. The conversion to TorchScript has been fixed, but still there is an issue with serialisation:

import torch
import torchani

print(torch.__version__)
print(torchani.__version__)

torch.jit.script(torchani.models.ANI2x()).save('model.pt')
1.10.0
2.2.1
Traceback (most recent call last):
  File "/home/raimis/NNPOps.git/debug.py", line 7, in <module>
    torch.jit.script(torchani.models.ANI2x()).save('model.pt')
  File "/home/raimis/opt/miniconda/envs/nnpops/lib/python3.9/site-packages/torch/jit/_script.py", line 686, in save
    return self._c.save(str(f), **kwargs)
RuntimeError: Cannot serialize custom bound C++ class __torch__.torch.classes.cuaev.CuaevComputer. Please define serialization methods via def_pickle() for this class.
raimis commented 2 years ago

@yueyericardo thanks for your effort! The issue is solved and the solution integrated into NNPOps (https://github.com/openmm/NNPOps/releases/tag/v0.1)