CompPhysVienna / n2p2

n2p2 - A Neural Network Potential Package
https://compphysvienna.github.io/n2p2/
GNU General Public License v3.0

Final "global_activation_short" seems to have no effect #137

Open patrick-austin opened 2 years ago

patrick-austin commented 2 years ago

Describe the bug
When training two networks that differ only in the final "global_activation_short" setting, the resulting weights, errors, etc. are identical. When changing the activation functions of the preceding layers, the networks train and perform differently (as I'd expect).

To Reproduce
Steps to reproduce the behavior:

  1. Which git commit version is used? v2.1.4-1-gb45aafc, also observed on v2.0.3

  2. What are the compilation options (compiler, flags)?

    Symmetry function groups     : enabled
    Symmetry function cache      : enabled
    Timing function available    : available
    Asymmetric polynomial SFs    : available
    SF low neighbor number check : enabled
    SF derivative memory layout  : reduced
    MPI explicitly disabled      : no

  3. Which application is affected? nnp-train

  4. What are the actual settings (please provide a minimal example if possible)?

    global_hidden_layers_short         2
    global_nodes_short                 20 20
    global_activation_short            s  s  l  

    and

    global_hidden_layers_short         2
    global_nodes_short                 20 20
    global_activation_short            s  s  t

    All other settings are identical between the two runs, and most are unchanged from the example input.

  5. Please provide the error message or describe the crash behavior. There is no crash or error, nor any warning to indicate that the activation functions had been configured incorrectly.

Expected behavior
Changing the final activation function should affect the trained network, just as changing the preceding activation functions does.

Additional context
This may not be a bug and I may just be configuring the activation functions / number of hidden layers incorrectly, but based on the documentation and examples I can't see what (if anything) I'm doing wrong. (A minimal toy sketch of the architecture I mean is included below.)
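For clarity, here is a minimal NumPy toy sketch (my own illustration, not n2p2 code) of the architecture these settings describe: two hidden layers of 20 nodes plus a single output neuron, with the three tokens of global_activation_short applied layer by layer. The input size, the random weights, and the token-to-function mapping (l = linear, t = tanh, s = logistic) are my assumptions based on the documentation; the point is only that the two configurations above should produce different outputs, because the last token acts on the output neuron.

```python
import numpy as np

# Toy feed-forward model of the short-range network described above
# (NOT n2p2 source code; layer sizes and weights are made up):
#   global_nodes_short      20 20
#   global_activation_short s  s  t     (or "s s l")
# Assumed token mapping: l = linear/identity, t = tanh, s = logistic
ACTIVATIONS = {
    "l": lambda x: x,
    "t": np.tanh,
    "s": lambda x: 1.0 / (1.0 + np.exp(-x)),
}

def atomic_energy(sym_funcs, weights, biases, tokens):
    """One feed-forward pass; the last token acts on the single output neuron."""
    y = sym_funcs
    for W, b, token in zip(weights, biases, tokens):
        y = ACTIVATIONS[token](W @ y + b)
    return y.item()

rng = np.random.default_rng(0)
sizes = [30, 20, 20, 1]   # e.g. 30 symmetry functions -> 20 -> 20 -> 1 energy
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
g = rng.normal(size=sizes[0])

print(atomic_energy(g, weights, biases, "ssl"))  # unbounded output
print(atomic_energy(g, weights, biases, "sst"))  # squashed into (-1, 1)
```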

singraber commented 2 years ago

Hello!

Thank you for reporting this, there should indeed be a training difference if different activation functions are selected. I will try to reproduce this and report back...

Best, Andreas

singraber commented 2 years ago

You were right... the last argument of global_activation_short was silently ignored in the code, and the corresponding activation function was always set to the identity. I have changed this in the current master (see commit here).

However, I generally DON'T recommend setting anything other than the linear activation function for the output neuron, because all the other options will limit the range of the output energies! Hence, the NNP cannot be trained to reproduce very large positive or negative potential energies. I also don't think anything can be gained from setting it to a nonlinear activation, because the previous layers already provide enough "non-linearity". This was probably my original motivation for hard-coding a linear activation in the final output neuron. But you are right that a valid user setting should not be silently ignored. I may add a little warning message in future versions...
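To make the range argument concrete, here is a tiny illustration (again just a toy, not n2p2 code): whatever pre-activation value the output neuron receives, tanh confines the result to (-1, 1), whereas the identity passes it through unchanged, so only a linear output can represent arbitrarily large positive or negative energies.

```python
import numpy as np

# Pre-activation values of the output neuron (arbitrary examples):
z = np.array([-100.0, -5.0, 0.0, 5.0, 100.0])

print("identity:", z)           # -100.0, -5.0, 0.0, 5.0, 100.0   (unbounded)
print("tanh    :", np.tanh(z))  # ~ -1.0, -0.9999, 0.0, 0.9999, 1.0 (saturates)
```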

Thanks again for spotting and reporting this issue!

All the best, Andreas

patrick-austin commented 2 years ago

Thanks for looking into this. My original motivation for changing the final activation function was in fact to try to limit the range of predictions (sometimes the network outputs forces orders of magnitude larger than anything in the reference data), and I thought using tanh as the final activation function might help prevent this.