choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License

Make ANI NNP flexible #98

Open chrisiacovella opened 2 months ago

chrisiacovella commented 2 months ago

Right now, if we try to train on a dataset with the ANI-2x NNP, we are limited to the 7 atomic species in the original implementation (H, C, O, N, F, S, Cl).

Copying and pasting from issue #96:

@wiederm wrote:

The ANI architecture is unique. Each element-specific neural network only considers atoms of its own element; therefore, adding a new element requires the careful construction of a balanced dataset. There is no 'transfer' learning like with other architectures (e.g., SchNet), in which each neural network layer is trained on every atom and the element information is passed through the network with the environment representation.

ANI-2x has been developed for a specific set of elements (H, C, O, N, S, F, and Cl), and we shouldn't add additional elements using the ANI-2x naming (the only difference between ANI-1x and ANI-2x is the number of elements). Since each new element adds a new multilayer perceptron, the number of parameters that need to be trained increases significantly.

Having said all of this, I think it is worth adding a phosphorus atomic network, and we might want to think about an element embedding instead of element-specific neural networks. But we can't call this ANI-2x.

Changing the number of elements will of course change the model, but our goal is not necessarily just to retrain ANI-2x; it is to use the same approach with different datasets. We therefore probably need the code to dynamically identify which elements are present in a dataset and generate the atomic_network entry for each element found. This could of course be problematic with too many atomic species (in terms of training speed), but if our goal is to compare different potential approaches on otherwise matching data, it might be necessary.
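To make the idea concrete, here is a rough sketch of what "dynamically identify the elements and build one network per element" could look like, along with how quickly the parameter count grows. It is plain Python with no modelforge or PyTorch calls. All function names, the hidden-layer widths, and the radial/angular term counts (16 and 32) are illustrative assumptions, not the current modelforge implementation.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical hidden-layer widths shared by every element (an assumption;
# a real ANI model may use different widths per element).
HIDDEN_LAYERS = (160, 128, 96)


def find_elements(molecules: Iterable[Sequence[int]]) -> List[int]:
    """Return the sorted unique atomic numbers found across all molecules."""
    return sorted({z for atomic_numbers in molecules for z in atomic_numbers})


def aev_length(n_elements: int, n_radial: int = 16, n_angular: int = 32) -> int:
    """Length of an ANI-style atomic environment vector.

    Radial terms are computed per neighbor element and angular terms per
    unordered pair of neighbor elements, so the network input also grows
    with the number of elements. The defaults are illustrative assumptions.
    """
    n_pairs = n_elements * (n_elements + 1) // 2
    return n_elements * n_radial + n_pairs * n_angular


def build_atomic_network_specs(
    elements: Sequence[int], hidden_layers: Sequence[int] = HIDDEN_LAYERS
) -> Dict[int, List[int]]:
    """Map each atomic number to the layer widths of its own network."""
    n_in = aev_length(len(elements))
    return {z: [n_in, *hidden_layers, 1] for z in elements}


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def total_parameters(specs: Dict[int, List[int]]) -> int:
    """Total trainable parameters over all element-specific networks."""
    return sum(count_parameters(sizes) for sizes in specs.values())


# Toy dataset: each molecule is a list of atomic numbers.
dataset = [[8, 1, 1], [6, 1, 1, 1, 1], [15, 8, 8, 8, 8]]
elements = find_elements(dataset)  # [1, 6, 8, 15]
specs = build_atomic_network_specs(elements)
print(elements, total_parameters(specs))
```

Under these assumptions, adding an element costs more than one extra network: the input width of every existing network grows as well, which is the training-speed concern raised above. In an actual implementation, the per-element specs would presumably be turned into trainable modules keyed by atomic number.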

Anyway, this is something we will need to discuss in more detail later.