CompPhysVienna / n2p2

n2p2 - A Neural Network Potential Package
https://compphysvienna.github.io/n2p2/
GNU General Public License v3.0
223 stars 82 forks source link

On the convergence of nnp training #166

Open okunoyukihiro2 opened 2 years ago

okunoyukihiro2 commented 2 years ago

Dear n2p2 developers and users.

I'm now making the neural network potential for amorphous oxide by n2p2.

The data number is 8986 which contain about 100 atoms with 3 element species. The calculation was proceeded but its convergence is very slow and oscillated, We continued until the epoch number reached 100, but did not reach convergence (an energy difference less than 10-4 ).

In such case, how should we handle the convergence problem? Can I deal with this by taking the parameters of the Kalman filter? Or is this phenomenon often due to inadequate symmetry function basis taking?

If I can get some suggestions, I'm very happy.

Sincerely.

My input setting is as below, ###############################################################################

DATA SET NORMALIZATION

###############################################################################

This section was automatically added by nnp-norm.

mean_energy -6.7390861800828272E+00 conv_energy 8.8624677147759523E-01 conv_length 1.9278673431912423E+00 ############################################################################### ###############################################################################

GENERAL NNP SETTINGS

###############################################################################

These keywords are (almost) always required.

number_of_elements 3 # Number of elements. elements A B C # Specification of elements. atom_energy A -0.00368935 # Free atom reference energy (Li). atom_energy B -0.01081012 # Free atom reference energy (O). atom_energy C -0.02143689 # Free atom reference energy (O). cutoff_type 2 # Cutoff type (optional argument: shift parameter alpha). scale_symmetry_functions # Scale all symmetry functions with min/max values.

scale_symmetry_functions_sigma # Scale all symmetry functions with sigma.

scale_min_short 0.0 # Minimum value for scaling. scale_max_short 1.0 # Maximum value for scaling. center_symmetry_functions # Center all symmetry functions, i.e. subtract mean value. global_hidden_layers_short 2 # Number of hidden layers. global_nodes_short 25 25 # Number of nodes in each hidden layer. global_activation_short t t l # Activation function for each hidden layer and output layer.

normalize_nodes # Normalize input of nodes.

###############################################################################

ADDITIONAL SETTINGS FOR DATASET TOOLS

###############################################################################

These keywords are used only by some tools handling data sets:

nnp-comp2, nnp-scaling, nnp-dataset, nnp-train.

use_short_forces # Use forces. random_seed 1234567 # Random number generator seed.

###############################################################################

ADDITIONAL SETTINGS FOR TRAINING

###############################################################################

These keywords are solely used for training with nnp-train.

epochs 100 # Number of training epochs. updater_type 1 # Weight update method (0 = Gradient Descent, 1 = Kalman fi\ lter). parallel_mode 0 # Training parallelization used (0 = Parallel (rank 0 updat\ e), 1 = Parallel (all update)). jacobian_mode 1 # Jacobian computation mode (0 = Summation to single gradie\ nt, 1 = Per-task summed gradient, 2 = Full Jacobian). update_strategy 0 # Update strategy (0 = Combined, 1 = Per-element). selection_mode 2 # Update candidate selection mode (0 = Random, 1 = Sort, 2 \ = Threshold). task_batch_size_energy 1 # Number of energy update candidates prepared per task for \ each update (0 = Entire training set). task_batch_size_force 1 # Number of force update candidates prepared per task for e\ ach update (0 = Entire training set). memorize_symfunc_results # Keep symmetry function results in memory. test_fraction 0.1 # Fraction of structures kept for testing. force_weight 10.0 # Weight of force updates relative to energy updates. short_energy_fraction 1.000 # Fraction of energy updates per epoch. short_force_fraction 0.0041 # Fraction of force updates per epoch. short_energy_error_threshold 0.00 # RMSE threshold for energy update candidates. short_force_error_threshold 1.00 # RMSE threshold for force update candidates. rmse_threshold_trials 3 # Maximum number of RMSE threshold trials.

use_old_weights_short # Restart fitting with old weight parameters.

weights_min -1.0 # Minimum value for initial random weights. weights_max 1.0 # Maximum value for initial random weights.

precondition_weights # Precondition weights with initial energies.

nguyen_widrow_weights_short # Initialize neural network weights according to Nguyen-Wi\

drow scheme. main_error_metric RMSEpa # Main error metric for screen output (RMSEpa/RMSE/MAEpa/MA\ E). write_trainpoints 1 # Write energy comparison every this many epochs. write_trainforces 1 # Write force comparison every this many epochs. write_weights_epoch 1 # Write weights every this many epochs. write_neuronstats 1 # Write neuron statistics every this many epochs. write_trainlog # Write training log file. ####################

GRADIENT DESCENT

####################

This section is only used if "updater_type" is "0".

gradient_type 1 # Gradient descent type (0 = Fixed step size, 1 = Adam). gradient_eta 1.0E-5 # Fixed step size gradient descent parameter eta. gradient_adam_eta 1.0E-3 # Adam parameter eta. gradient_adam_beta1 0.9 # Adam parameter beta1. gradient_adam_beta2 0.999 # Adam parameter beta2. gradient_adam_epsilon 1.0E-8 # Adam parameter epsilon. ############################

KALMAN FILTER (STANDARD)

############################

This section is only used if "updater_type" is "1".

kalman_type 0 # Kalman filter type (0 = Standard, 1 = Fading memory). kalman_epsilon 1.0E-2 # General Kalman filter parameter epsilon (sigmoidal: 0.01,\ linear: 0.001). kalman_q0 0.01 # General Kalman filter parameter q0 ("large"). kalman_qtau 2.302 # General Kalman filter parameter qtau (2.302 => 1 order of\ magnitude per epoch). kalman_qmin 1.0E-6 # General Kalman filter parameter qmin (typ. 1.0E-6). kalman_eta 0.01 # Standard Kalman filter parameter eta (0.001-1.0). kalman_etatau 2.302 # Standard Kalman filter parameter etatau (2.302 => 1 order\ of magnitude per epoch). kalman_etamax 1.0 # Standard Kalman filter parameter etamax (1.0+). #################################

KALMAN FILTER (FADING MEMORY)

#################################

This section is only used if "updater_type" is "1".

The settings here enable an alternative Kalman filter variant and are NOT RECOMMENDED!

kalman_type 1 # Kalman filter type (0 = Standard, 1 = Fading memory).

kalman_epsilon 1.0E-1 # General Kalman filter parameter epsilon (sigmoidal: 0.01\

, linear: 0.001).

kalman_q0 0.00 # General Kalman filter parameter q0 ("large").

kalman_qtau 2.302 # General Kalman filter parameter qtau (2.302 => 1 order o\

f magnitude per epoch).

kalman_qmin 0.0E-6 # General Kalman filter parameter qmin (typ. 1.0E-6).

kalman_lambda_short 0.98000 # Fading memory Kalman filter parameter lambda (forgetting \ factor 0.95-0.99). kalman_nue_short 0.9987 # Fading memory Kalman filter parameter nu (0.99-0.9995).