choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License
9 stars 4 forks source link

TensorNet RSF #158

Closed MarshallYan closed 1 week ago

MarshallYan commented 2 weeks ago

Description

Implementation of radial symmetry functions for TensorNet.

Todos

Questions

Status

MarshallYan commented 2 weeks ago

On my local Fedora machine, if I don't install the current modelforge with pip install ./ in directory "/home/shuaiy/workdir/modelforge", from modelforge.potential.utils import CosineCutoff wouldn't work with error being: >>> from modelforge.potential.utils import CosineCutoff Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/shuaiy/workdir/modelforge/modelforge/__init__.py", line 7, in <module> from ._version import __version__ ModuleNotFoundError: No module named 'modelforge._version'

However, after pip install ./, the error disappears.

Does this have anything to do with the error occurs in testing?

MarshallYan commented 2 weeks ago

What is the error with environment installation on macOS?

ianmkenney commented 2 weeks ago

@MarshallYan It looks to me that the ray-core build osx-arm64/ray-core-2.12.0-py311h15aa61a_0 is marked as broken in the package files, which is the build currently used in the last CI pass in main. Looking at the latest CI pass, it appears that the environment was cached. I assume if the environment cache is removed in the main CI, you'd see the same error.

It might be related, but to get the tests passing locally on osx-arm64, I needed to use the PyPI builds for all ray components. That's a separate issue that probably needs more consideration from @wiederm.

Update: Trying to install that build locally in fresh environment gives a similar error

ikenney@gleipnir tmp % mamba install ray-core=2.12.0=py311h15aa61a_0

Looking for: ['ray-core==2.12.0=py311h15aa61a_0']

conda-forge/osx-arm64                                       Using cache
conda-forge/noarch                                          Using cache
Could not solve for environment specs
The following package could not be installed
└─ ray-core ==2.12.0 py311h15aa61a_0 does not exist (perhaps a typo or a missing channel).
wiederm commented 2 weeks ago

thanks @ianmkenney for figuring out this error! I am on it!

wiederm commented 2 weeks ago

@MarshallYan , the issue you mentioned here is normal --- you need to install the package so that its namespace is exposed in your current working environment.

wiederm commented 2 weeks ago

@MarshallYan , you mentioned the lower limit of the cosine cutoff function. Can you please expand on this? How does this impact the cutoff behavior? Does the lower limit define the beginning of the influence of the Cutoff function?

wiederm commented 2 weeks ago

Match units (angstrom in TensorNet; nanometer in modelforge)

We decided to keep nanometer everywhere internally, and all public APIs accept values in units. That means you can set, e.g., the cutoff with 5 * unit.angstrom, but internally, we will remove the unit and convert it to nanometer.

MarshallYan commented 2 weeks ago

@MarshallYan , you mentioned the lower limit of the cosine cutoff function. Can you please expand on this? How does this impact the cutoff behavior? Does the lower limit define the beginning of the influence of the Cutoff function?

In TensorNet, if the lower limit of cutoff function is set to anything larger than 0:

cutoffs = 0.5 * (torch.cos(math.pi * (2 * (distances - self.cutoff_lower) / (self.cutoff_upper - self.cutoff_lower) + 1.0)) + 1.0)

Then,

    # remove contributions below the cutoff radius
    cutoffs = cutoffs * (distances < self.cutoff_upper)
    cutoffs = cutoffs * (distances > self.cutoff_lower)
MarshallYan commented 1 week ago

@wiederm I made the radial symmetry function work by carrying angstrom as the unit of everything just as what TensorNet does. I can techniqually hide that by doing the unit transformation inside class functions but does that make sense? What is our way of using openff.units? Right now it seems to me that this is an extra step because I still need to *10 or /10.