Was getting nan/inf issues while trying to get the model to backprop; realized that the cell state for the DAG-LSTM was growing to inf. This makes sense, since cell.forward() is called many times per sentence. The forget gates fj are always positive, the input gate i is always positive, and c_hat is not necessarily positive since it comes out of a tanh, but I'm imagining the summation term tends to dominate in the cell-state update.
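For reference, the cell-state update I mean is roughly the Child-Sum Tree-LSTM style update below (a sketch; notation and indexing are mine, the exact form in daglstmcell.py may differ):

```latex
c_j = i_j \odot \hat{c}_j \;+\; \sum_{k \in \mathrm{pred}(j)} f_{jk} \odot c_k
```

i_j and f_jk are sigmoid outputs in (0, 1) and c_hat is in (-1, 1), so each term is bounded on its own; but the sum over predecessors, compounded as cell states feed into downstream cells across a sentence, can keep growing if the forget gates stay near 1.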
Solved this for now by clipping the cell state: https://github.com/g-simmons/289G_NLP_project_FQ2020/blob/fa49e4a4b9450d7861f5ed3dc32b915dca2f328d/py/daglstmcell.py#L84
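The clipping is basically a clamp on the cell state before it gets passed downstream; something like the sketch below (the actual clip bound and variable names at that line in daglstmcell.py may differ):

```python
import torch

# Hypothetical clip bound; the value used in daglstmcell.py may differ.
CELL_STATE_CLIP = 50.0

def clip_cell_state(c: torch.Tensor, clip: float = CELL_STATE_CLIP) -> torch.Tensor:
    # Clamp the cell state elementwise so the repeated summation over
    # predecessor cells can't drive it to +/- inf.
    return torch.clamp(c, min=-clip, max=clip)
```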
I have a feeling this may not be an ideal solution: the TensorBoard visualizations show tangible weight updates across steps for the output linear layer, but the earlier layers look pretty much unchanged. Maybe the clipping is limiting gradient flow; torch.clamp has zero gradient wherever the input falls outside the clip range, so once a cell state saturates at the boundary, no gradient passes back through it.
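One way to check this would be to log per-parameter gradient norms to TensorBoard and see whether the earlier layers are getting any gradient at all. A rough sketch, assuming a standard torch.utils.tensorboard setup (the writer and where it lives are made up here):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # hypothetical; the training script may already have one

def log_grad_norms(model: torch.nn.Module, step: int) -> None:
    # Call after loss.backward() but before optimizer.step(): records the
    # gradient norm of every parameter so the earlier layers can be
    # compared against the output linear layer.
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_scalar(f"grad_norm/{name}", param.grad.norm().item(), step)
```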