g-simmons / 289G_NLP_project_FQ2020


DAG-LSTM cell state exploding in a single forward pass, clipping may be limiting gradient flow? #17

Closed · g-simmons closed 3 years ago

g-simmons commented 3 years ago

Was getting nan/inf issues while trying to get the model to backprop, and realized the cell state for the DAG-LSTM was growing to inf. This makes sense, since cell.forward() happens many times per sentence. f_j is always positive, i is always positive, and c_hat isn't necessarily positive since it comes out of a tanh, but I imagine the summation term tends to dominate in the fifth equation below.

[image: DAG-LSTM gate and cell-state update equations]
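The screenshot itself isn't recoverable, but going by the symbols referenced in the text (i, f_j, c_hat, and a summation in the fifth equation), the update is presumably the Child-Sum Tree-LSTM form adapted to DAG predecessors. Sketching it for reference, with notation assumed rather than taken from the image:

$$
\begin{aligned}
i &= \sigma\Big(W^{(i)} x + \textstyle\sum_j U^{(i)} h_j + b^{(i)}\Big) \\
f_j &= \sigma\big(W^{(f)} x + U^{(f)} h_j + b^{(f)}\big) \\
o &= \sigma\Big(W^{(o)} x + \textstyle\sum_j U^{(o)} h_j + b^{(o)}\Big) \\
\hat{c} &= \tanh\Big(W^{(c)} x + \textstyle\sum_j U^{(c)} h_j + b^{(c)}\Big) \\
c &= i \odot \hat{c} + \textstyle\sum_j f_j \odot c_j \\
h &= o \odot \tanh(c)
\end{aligned}
$$

The fifth line is the cell-state update: since each $f_j \in (0,1)$ but the sum runs over all predecessors, a node with several predecessors can still have an effective gain well above 1.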

Solved this for now by clipping the cell state https://github.com/g-simmons/289G_NLP_project_FQ2020/blob/fa49e4a4b9450d7861f5ed3dc32b915dca2f328d/py/daglstmcell.py#L84
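A toy illustration (not the repo's actual code) of why the magnitude can blow up and why clamping bounds it: with k predecessors the update is roughly c_new = i * c_hat + sum_j f_j * c_j, so even with every f_j < 1, k terms summed can multiply the state by more than 1 at each step. The constants below are made up for the sketch.

```python
def cell_update(c_preds, i=0.9, c_hat=0.5, f=0.8):
    """One DAG-LSTM-style scalar cell update over a list of predecessor states."""
    return i * c_hat + sum(f * c for c in c_preds)

def run(depth, n_preds, clip=None):
    """Chain `depth` updates where every node has `n_preds` predecessors
    carrying the same state; optionally clamp after each update."""
    c = 1.0
    for _ in range(depth):
        c = cell_update([c] * n_preds)
        if clip is not None:
            c = max(-clip, min(clip, c))  # elementwise clamp, like the fix
    return c

print(run(30, n_preds=3))             # grows by ~2.4x per step -> explodes
print(run(30, n_preds=3, clip=10.0))  # pinned at the clip value
```

With 3 predecessors and f = 0.8 the per-step gain is 3 × 0.8 = 2.4, so the unclipped state explodes geometrically, which matches the inf seen in the forward pass.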

I have a feeling this may not be an ideal solution: the TensorBoard visualizations show tangible weight updates across steps for the output linear layer, but the earlier layers look pretty much unchanged. Maybe the clipping is limiting gradient flow.
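That suspicion is consistent with how a hard clamp behaves: outside the clip range its derivative is 0, so any parameter that only influences the loss through a saturated cell state receives zero gradient. A minimal numeric check (plain Python, not the repo's code), using a finite difference as a stand-in for autograd:

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Hard clip, analogous to clamping the cell state."""
    return max(lo, min(hi, x))

def grad_through_clamp(x, eps=1e-6):
    """Numerical derivative of clamp at x (central difference)."""
    return (clamp(x + eps) - clamp(x - eps)) / (2 * eps)

print(grad_through_clamp(0.5))  # inside the range: gradient passes (~1.0)
print(grad_through_clamp(5.0))  # saturated: gradient is exactly 0
```

If the cell state is routinely hitting the clip bounds, everything upstream of it sees zero gradient through those units, which would explain the frozen early layers.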

g-simmons commented 3 years ago

Not an issue after the new DAG-LSTM implementation.