aiqm / torchani

Accurate Neural Network Potential on PyTorch
https://aiqm.github.io/torchani/
MIT License

Using ANI+CUAEV in a TorchScript-compiled model from Libtorch #645

Closed by lmuellender 1 month ago

lmuellender commented 1 month ago

Hi! I'm currently working on an interface for using arbitrary neural network potentials as force fields in GROMACS, with TorchANI, specifically the ANI-2x model, as a test case. My issue is the following: I've written a short wrapper around an ANI2x model and saved it using TorchScript, loosely following this tutorial. However, when I try to load the compiled model in C++/LibTorch via torch::jit::load(), I get this error:

Unknown type name '__torch__.torch.classes.cuaev.CuaevComputer':
  File "code/__torch__/torchani/aev.py", line 15
  training : bool
  _is_full_backward_hook : NoneType
  cuaev_computer : __torch__.torch.classes.cuaev.CuaevComputer
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  cuaev_enabled : bool
  Rcr : Final[float] = 5.0999999999999996

If I understand correctly, this error occurs because CuaevComputer is registered with PyTorch as a new custom class, which my LibTorch installation doesn't know about, so deserialization fails. To work around this, I'm currently forcing CUAEV off when saving the model (by setting has_cuaev = False in torchani.aev). That is not ideal, because I lose the performance boost it brings. I would appreciate it if you could point out what I'm doing wrong, or how to make my LibTorch installation aware of the CUAEV class. Also, for a TorchScript-compiled model to be fully portable, adding a custom class to PyTorch should probably be avoided altogether.
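One way to check which custom classes a saved model depends on, before handing it to LibTorch, is to scan the serialized code inside the archive. The helper below, `required_custom_classes`, is hypothetical (it is not part of TorchANI or PyTorch). It assumes that the `.pt` file is the zip archive written by `torch.jit.save` and that the serialized sources are stored under a `code/` directory, as the path `code/__torch__/torchani/aev.py` in the error suggests.

```python
import re
import zipfile
from typing import BinaryIO, Set, Union

# Custom classes appear in serialized TorchScript code as
# __torch__.torch.classes.<namespace>.<ClassName>, as in the error above.
CUSTOM_CLASS_RE = re.compile(r"__torch__\.torch\.classes\.(\w+)\.(\w+)")


def required_custom_classes(archive: Union[str, BinaryIO]) -> Set[str]:
    """Return 'namespace.ClassName' for every custom class referenced by
    the serialized code in a TorchScript archive (hypothetical helper)."""
    found: Set[str] = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Assumption: serialized sources live under a 'code/' directory
            # and end in '.py'; pickles and debug files are skipped.
            if "code/" not in name or not name.endswith(".py"):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for namespace, cls in CUSTOM_CLASS_RE.findall(text):
                found.add(f"{namespace}.{cls}")
    return found
```

For a model saved with CUAEV enabled, this should report `cuaev.CuaevComputer`. Every class it lists has to be registered in the loading process, for example by loading the library that defines it, before torch::jit::load() is called.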

I'm using TorchANI v2.2.4, PyTorch 2.1.2 (I had trouble getting things to work with 2.2.2/nightly), LibTorch 2.1.1 and CUDA 11.8. (LibTorch 2.1.2 + CUDA 12.1 results in a CUBLAS_STATUS_NOT_INITIALIZED error when running backprop through the model, which seems to be a bug in torch.)

lmuellender commented 1 month ago

After taking another look at how to extend PyTorch with custom operators, I fixed this issue by compiling the cuaev class into a shared library and linking against it in the CMake project.

yueyericardo commented 1 month ago

Hi Lukas, sorry for the late reply. Thanks for integrating TorchANI into GROMACS, and congrats on fixing the cuaev compilation issue. Here is another example of building cuaev together with other libraries (though it might come too late): https://github.com/yueyericardo/cuaev_cpp

lmuellender commented 1 month ago

Hi Jinze, thank you for the reference! It's still very helpful.