barkincavdaroglu / Link-Prediction-Mesh-Network

PyTorch Implementation of a Deep Learning Model for Temporal Link Prediction in MANETs

Fix vanishing gradient problem in attention mechanism and recurrent layer #6

Closed · barkincavdaroglu closed this issue 1 year ago

barkincavdaroglu commented 1 year ago

The gradient becomes 0 during backpropagation for the attention mechanism.

[Screenshot, 2022-12-29 19:15:22]

Update:

The problem looks like it originates from the LSTM/GRU layer. The gradients of this layer become zero very quickly, and since the attention layer sits behind the recurrent layer, the attention layer's gradients are not properly updated.

[Plot: training gradient 2.0-norm of model.lstm.weight_hh_l0 per epoch]
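
As a quick diagnostic, per-parameter gradient norms can be logged right after backpropagation to confirm which layers collapse toward zero. A minimal sketch is below; the helper name and the idea of calling it between `loss.backward()` and `optimizer.step()` are assumptions, not code from this repo.

```python
import torch

def log_grad_norms(model: torch.nn.Module, epoch: int) -> None:
    # Print the L2 norm of every parameter's gradient; call after loss.backward()
    # to compare e.g. lstm.weight_hh_l0 against the attention weights.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"epoch {epoch} | {name}: grad 2-norm = {param.grad.norm(2).item():.3e}")

# Usage inside the training loop:
# loss.backward()
# log_grad_norms(model, epoch)
# optimizer.step()
```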

Applying a PowerTransformer to the dataset and decreasing the learning_rate of RMSprop seem to have helped the gradients a little.

[Screenshot, 2022-12-31 14:28:32]
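
For reference, a minimal sketch of those two changes, assuming scikit-learn's PowerTransformer and a feature matrix of shape (num_samples, num_features); the variable names and the exact learning rate are illustrative placeholders, not the repo's actual configuration.

```python
import torch
from sklearn.preprocessing import PowerTransformer

# 'features' is a placeholder for the graph features stacked as (num_samples, num_features);
# fit the transformer on training data only to avoid leakage into validation/test splits.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
features = pt.fit_transform(features)

# Smaller RMSprop step size; 1e-4 here is illustrative, not the repo's tuned value.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
```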

Increasing num_epochs to 50 and setting the gradient clip value to 5 resolved the issue.

[Screenshot, 2022-12-31 14:48:28]
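
A rough sketch of how the clipping fits into the training loop, assuming norm-based clipping via `torch.nn.utils.clip_grad_norm_` (value-based clipping with `clip_grad_value_` would be the alternative); the loader, loss function, and other names are placeholders.

```python
import torch

num_epochs = 50
clip_value = 5.0  # threshold from above; norm-based clipping is an assumption

for epoch in range(num_epochs):
    for inputs, targets in train_loader:  # train_loader, model, loss_fn, optimizer are placeholders
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # Clip the global gradient norm before the optimizer step to keep updates stable.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_value)
        optimizer.step()
```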

Decreasing the clip value further could be beneficial (will experiment during hyperparameter tuning).