Embedding spin multiplicity and charge based on SpookyNet implementation

atomistic-machine-learning / schnetpack

SchNetPack - Deep Neural Networks for Atomistic Systems

Other

792 stars 215 forks source link

Embedding spin multiplicity and charge based on SpookyNet implementation #608

Closed epens94 closed 8 months ago

epens94 commented 9 months ago

Type of PR

[ x ] Feature

Description Right now only singlet neutral charged molecules can be clearly distinguished by the model representations (SchNet, PainNN etc) Main addition is the new class "ElectronicEmbedding". The implementation follows reference https://doi.org/10.1038/s41467-021-27504-0 (SpookyNet) The new class can be invoked for SchNet and PaiNN representations (SO3 is not yet implemented)

Validation of new Feature The feature was tested on two model systems Ag3+/Ag3- and singlet/triplet CH2. Training was done on energies and forces. The databases were the same as used in the reference. PES calculation were done on GFN2-xtb level of theory. With spin_charge embedding deviations from r (bond length between Ag-Ag or C-H) and $\alpha$ (angle between 2,0,1) are below 0.1 (see figure 1 and 2). The RMSD for the minimum energy structures are below 10e-4 Angstrom.

figure_2 figure_1

When spin respective charge are embedded the qualitative correct PES can be obtained (see figure PES-carben PES-ag3

Additional Notes In SpookyNet reference a complex nuclear embedding method based on the atomic electronic configuration was used. Invoking this nuclear embedding method results in slower model convergence (approx. ten times slower). The final error metric (MAE) is roughly 50% better than the simple nuclear embedding scheme. For both systems the MAEs on the test sets were 10-25x lower (Ag3+/Ag3-), 50-120x lower (singlet/triplet CH2) as plain chemical accuracy (1kcal/mol).

stefaanhessmann commented 8 months ago

Thanks for the PR, @epens94! I checked the code and have some general suggestions for refactoring the code to better fit the current style:

merge the current embedding files (nuclear, charge, spin) to a single file nn/embedding.py.
move electron configurations.py to nn/embeddings.py
move ShiftedSoftplus to nn/activations.py and replace the function shifted_softplus(). Please also check for any references in spk to this function. Add an argument trainable=False, such that it has the same behavior as as the old function by default and still allows to be trainable.
move the remaining classes from nn/residual_blocks.py to nn/blocks.py
instead of passing str arguments for activation, use Callable arguments like e.g. in the PaiNN class. I will try to mark it in the PR

Since I have no access to your forked repo, I can not do these changes myself :)

epens94 commented 8 months ago

I think I implemented all of your suggestions. Since there have been a lot of minor changes, I would prefer to rerun the experiments on Ag3+ clusters and carbens. Just to be on the sure side, that the changes did not break anything unintentionally. I will post the results asap. It would be great if you could especially check the correct implementation of the activation function part (ShiftedSoftplus)

epens94 commented 8 months ago

ShiftedSoftplus with trainable True results in NaN values. I havent figured out why yet. Rerunning the experiments on Ag3 clusters and carbens with trainable set to False for gives me the same results as shown in the above figures

stefaanhessmann commented 8 months ago

Hi @epens94 , thanks for fixing all the comments! The PR looks fine to me now. Just 2 more questions:

Did you test, if the issue with the NaN-values has been solved?
I just checked the So3Net class and it seems like the new representations are not implemented there. If there is no specific reason to exclude So3Net, I suggest we might add the option for the new embedding layers there.

Apart from that, I think we are ready to merge :)

epens94 commented 8 months ago

@Stefaanhess Yes the issue with the NaN values has been solved. I will add the electronic embedding for the so3net today (which should be quick) Afterwards, I will test everything (electronic embedding layer, nuclear embedding layer, trainable activation function) again, just to make sure that everything works as expected.