atomistic-machine-learning / schnetpack

SchNetPack - Deep Neural Networks for Atomistic Systems

cutoff #203

Closed neojie closed 4 years ago

neojie commented 4 years ago

There are several different ways to set cutoffs in SchNetPack. For example, when we build data, we can invoke AtomsData('water_en.db', available_properties=['energy','forces'], environment_provider=AseEnvironmentProvider(5)). When we load data, we can set the cutoff with data = MD17('../mydata.db', environment_provider=AseEnvironmentProvider(5)). Also, we can set a cutoff in

schnet = spk.representation.SchNet(
    n_atom_basis=n_features,
    n_filters=n_features,
    n_gaussians=25,
    n_interactions=3,
    cutoff=5.,
    cutoff_network=spk.nn.cutoff.CosineCutoff
)

I just wonder how these three different ways (especially the last two) of setting the cutoff affect training. I ask because I thought setting a cutoff would accelerate training. However, I found that training slows down by ~20% if I use data = MD17('../mydata.db', environment_provider=AseEnvironmentProvider(5)), although I also found that the corresponding convergence is much faster.

mgastegger commented 4 years ago

The first two ways are basically identical, since the MD17 dataset makes use of AtomsData. So there are two instances where a cutoff can be given:

  1. via an environment provider when generating a dataset. Here the environment provider is used to collect the neighbors of each atom. If a cutoff is given, all neighbors beyond the cutoff are neglected. This means the network never sees these neighbors and no interactions are computed for them. This can save quite a few computations for large/periodic systems.
  2. via the cutoff passed to the model. In this case, after all pairwise interactions indicated in the neighbor list generated by the environment provider are computed, a cutoff function is applied to these interactions, forcing them to decay to zero at the cutoff.

In general, one wants to introduce a cutoff to save computational time. The idea is that interactions with atoms past a certain distance from the central atom become quite small and can be neglected. This can also lead to better generalization, and by modeling local atomic environments the potentials become transferable. In our setup, this selection would be done via 1. However, if neighbors past a certain point are simply ignored, there will be discontinuities in the potential energy surface. Because of this, the second type of cutoff is introduced, which smoothly decays the interactions to 0.

To make things more complicated, there are two environment providers in SchNetPack, the SimpleEnvironmentProvider (default) and the AseEnvironmentProvider. The former just generates environments containing all neighbors without regard to the cutoff, the latter uses dedicated neighbor-list algorithms. How well they perform depends on the size of the molecule. For small molecules, the simple provider is faster, as it incurs little overhead to construct the environment. For large molecules, the cost incurred by using the more elaborate ASE provider is offset by not having to evaluate all interactions, making it the better choice.
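To make the difference concrete, here is a minimal sketch comparing the two providers on a single molecule (assuming the 0.3.x API, where providers expose get_environment(atoms) returning neighbor indices and cell offsets):

from ase.build import molecule
from schnetpack.environment import SimpleEnvironmentProvider, AseEnvironmentProvider

atoms = molecule('C6H6')  # any small test structure

# SimpleEnvironmentProvider: every atom is listed as a neighbor of every other
# atom, regardless of distance; no cutoff is applied at this stage.
simple_idx, simple_offsets = SimpleEnvironmentProvider().get_environment(atoms)

# AseEnvironmentProvider: uses ASE's neighbor-list machinery and keeps only
# neighbors within the cutoff (5 Angstrom here); missing entries are padded with -1.
ase_idx, ase_offsets = AseEnvironmentProvider(5.0).get_environment(atoms)

print(simple_idx.shape, ase_idx.shape)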

In your case, it could be that the systems are too small to profit from using the ASE provider with a cutoff. However, I am not sure what the reason for the improvement in convergence is. The only thing I could imagine is that no cutoff was set when building the model. Then the ASE provider would also serve as the cutoff for the interactions, making the model more local and improving convergence. Unfortunately, using only the provider cutoff, the model would suffer from discontinuities.

In general: use the simple provider for small molecules, and the ASE provider for larger ones (never tested this, but I would guess starting around 40 to 50 atoms) and for periodic boundary conditions. A cutoff for the interactions should be used in all cases. And never set the provider cutoff smaller than the model cutoff.
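Putting these recommendations together, a minimal sketch of a consistent setup (the 5 Angstrom value and the network sizes are only example choices; the point is that the provider cutoff is not smaller than the model cutoff):

import schnetpack as spk
from schnetpack import AtomsData
from schnetpack.environment import AseEnvironmentProvider

cutoff = 5.0  # Angstrom, used for both the provider and the model

# Dataset: the ASE provider only collects neighbors within the cutoff,
# so interactions beyond it are never computed.
data = AtomsData('mydata.db', environment_provider=AseEnvironmentProvider(cutoff))

# Representation: same cutoff, plus a smooth cutoff function so the remaining
# interactions decay continuously to zero instead of being truncated abruptly.
schnet = spk.representation.SchNet(
    n_atom_basis=128,
    n_filters=128,
    n_gaussians=25,
    n_interactions=3,
    cutoff=cutoff,
    cutoff_network=spk.nn.cutoff.CosineCutoff,
)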

neojie commented 4 years ago

Thank you for your prompt reply. 1) Setting the cutoff via an environment provider when generating a dataset: is there any way of saving the calculated information in the database? For example,

from ase.build import molecule
from schnetpack import AtomsData
from schnetpack.environment import AseEnvironmentProvider
import numpy as np
from ase.calculators.emt import EMT
water = molecule('H2O')
water.set_calculator(EMT())
atoms = [water]
property_list = []
for at in atoms:

    energy = np.array([at.get_potential_energy()], dtype=np.float32)
    forces = np.array([at.get_forces()], dtype=np.float32)
    property_list.append(
        {'energy': energy,'forces':forces}
    )

db = AtomsData('water.db', available_properties=['energy', 'forces'],
               environment_provider=AseEnvironmentProvider(1))
db.add_systems(atoms, property_list)

db[0]['_neighbors']

We would get a neighbor list as

tensor([[ 1,  2],
        [ 0, -1],
        [ 0, -1]])

However, when we load the data by

from schnetpack.datasets import MD17
db_loaded = MD17('water.db')
db_loaded[0]['_neighbors']

we get the neighbor list as

tensor([[1, 2],
        [0, 2],
        [0, 1]])

I understand that this is because the default environment provider is the SimpleEnvironmentProvider. In order to recover the data, we have to set

db_loaded = MD17('water.db',environment_provider=AseEnvironmentProvider(1))

But in this case, we actually re-calculate the neighbors using the AseEnvironmentProvider. I just wonder if there is any way to store the neighbor list info in the dataset so that when we load it, we can directly get all the stored info rather than calculating it again.

2) My system is actually pretty big, containing 160 atoms. If I have set the cutoff when loading the data via db_loaded = MD17('water.db', environment_provider=AseEnvironmentProvider(1)), then, according to your answer, we do not have to set a cutoff when building the representation, because the network never sees the neighbors beyond the cutoff anyway. Right?

Thank you. Jie

mgastegger commented 4 years ago
  1. Currently, there is no way to store the computed neighbor lists in SchNetPack. We found that this quickly uses a lot of disk space, and due to the I/O overhead it is usually actually faster to compute the neighbors on the fly. But you could always extend your database to store the neighbor lists as additional properties (see the sketch at the end of this comment).
  2. I actually recommend setting the representation cutoff all the time, since otherwise there would be discontinuities in the potential energy surface. It is true that the central atom never sees the atoms beyond the cutoff, but if an atom leaves the cutoff during a simulation, for example, it is suddenly gone. This leads to jumps in the predicted properties. The representation cutoff makes the atoms fade in/out in a continuous manner. SchNetPack should use a CosineCutoff by default.

PS: I do not know whether this is meant to be a realistic example or whether the unit is Angstrom, but I would recommend a cutoff larger than 1 for water.
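Regarding point 1, a rough sketch of what storing precomputed neighbor indices as an extra property could look like. This is not a built-in SchNetPack mechanism: the property name precomputed_neighbors is made up, the integer indices are cast to float so they fit the generic property storage, and the loading side would still have to be handled by custom code (SchNetPack keeps recomputing _neighbors with whatever environment provider is configured):

import numpy as np
from ase.build import molecule
from schnetpack import AtomsData
from schnetpack.environment import AseEnvironmentProvider

provider = AseEnvironmentProvider(5.0)
water = molecule('H2O')

# Precompute the neighbor indices once with the ASE provider.
nbh_idx, offsets = provider.get_environment(water)

# Store them alongside a (dummy) energy as an additional property.
db = AtomsData('water_with_nbh.db',
               available_properties=['energy', 'precomputed_neighbors'])
db.add_systems([water],
               [{'energy': np.array([0.0], dtype=np.float32),  # placeholder value
                 'precomputed_neighbors': nbh_idx.astype(np.float32)}])

# On loading, the stored array comes back as a float tensor and has to be
# cast back to integer indices by hand.
stored_nbh = db[0]['precomputed_neighbors'].long()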

neojie commented 4 years ago

Thank you. The cutoff of 1 A was just for the example.

simonbatzner commented 4 years ago

So why would I ever set the environment cutoff larger than the representation cutoff? If its only purpose is to compute the neighbor lists that a convolution runs over, but the representation cutoff then zeros out any features beyond it, how would the atoms beyond that contribute?

Another, related question: even if both the data cutoff and the representation cutoff are, say, 5 A, since I have multiple interaction blocks, this would multiply, right? So if both my cutoffs are 5 A and I have 3 interaction blocks, then information in the graph would travel over 15 A in total, because the atom features get updated?

mgastegger commented 4 years ago

To your first question: for training purposes it makes no sense, but it can be useful in molecular dynamics simulations. If the cutoff in the environment provider is set, e.g., 1 A larger than in the representation, then the neighbor list only has to be recomputed when atoms move out of this buffer zone. This can lead to some speedups, especially for materials.

To the second question: the n_interactions * cutoff formula is in principle right, however it only gives you the distance at which the interactions definitively decay to 0. The overall behavior is actually a repeated convolution of the cutoff function and looks somewhat like this (for a 5 A cutoff and the cosine form): [figure: repeatedly convolved cosine cutoff for a 5 A cutoff]. Although e.g. 3 interactions would give contributions up to 15 A, they decay around 10 A and become very small afterwards. As you can see, this happens more quickly the more interactions there are.
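Since the figure is not reproduced here, a rough numerical sketch of the same idea: repeatedly convolving the cosine cutoff with itself and checking where the result becomes negligible (the grid spacing and the 1% threshold are arbitrary choices for illustration):

import numpy as np

r_c = 5.0                                    # cutoff in Angstrom, as in the discussion
dr = 0.01
r = np.arange(0.0, r_c, dr)
f = 0.5 * (np.cos(np.pi * r / r_c) + 1.0)    # cosine cutoff on [0, r_c)

g = f.copy()
for n in (2, 3):
    # One more convolution corresponds to stacking one more interaction block
    # in this rough picture; the support grows to n * r_c.
    g = np.convolve(g, f) * dr
    r_ext = np.arange(len(g)) * dr
    reach = r_ext[g > 0.01 * g.max()].max()  # where it drops below 1% of the peak
    print(f"{n} interactions: support up to {n * r_c:.0f} A, "
          f"below 1% of the peak beyond ~{reach:.1f} A")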

simonbatzner commented 4 years ago

Thank you for your answer - so in running MD simulations, this would be like the ghost radius in LAMMPS, that makes sense.

Regarding the cutoff sum - do I understand correctly that this stems from the form of the cosine cutoff, which starts decaying right at a distance of 0 and reaches zero at the cutoff distance:

cutoffs = 0.5 * (torch.cos(distances * np.pi / self.cutoff) + 1.0)

If so, then it seems like this could easily be fixed by using a shifted cosine that gives a value of 1 up to a certain radius r_c and only starts the cosine decay for values r > r_c beyond that.
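For reference, a sketch of what such a shifted variant could look like (plain PyTorch, not an existing SchNetPack class; the function and the onset radius r_onset are made up for illustration):

import numpy as np
import torch

def shifted_cosine_cutoff(distances, r_onset, r_cutoff):
    # 1 for distances below r_onset, cosine decay between r_onset and r_cutoff,
    # exactly 0 beyond r_cutoff.
    x = (distances - r_onset) / (r_cutoff - r_onset)
    x = torch.clamp(x, 0.0, 1.0)
    return 0.5 * (torch.cos(x * np.pi) + 1.0)

Setting r_onset = 0 recovers the cosine form quoted above for distances inside the cutoff.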

mgastegger commented 4 years ago

Yes, this is due to the form of the cutoff. We are not sure if this is behavior that should be fixed; at least for molecules it makes sense to focus on the local environment. What this tells you is that, given a 5 Angstrom cutoff, each atom effectively experiences an environment of ~12-14 A.