clab/dynet

NaN or Inf detected when computing log_softmax #1634


eahogue commented 3 years ago

Immediately when I start training a model, I get "NaN or Inf detected" when this line runs:

logloss = log_softmax(f_i, valid_frames)

Note this is with immediate_compute and check_validity turned on. If they aren't enabled, the error still seems to happen, just a little later in the process.
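For reference, here is a minimal sketch of this setup, assuming the Python bindings, where immediate_compute and check_validity are keyword arguments to renew_cg and the restriction is passed as restrict. The input values are made up for illustration:

```python
import dynet as dy

# Enable eager evaluation and validity checking on the fresh graph
# (assuming these are accepted as kwargs to renew_cg in the Python bindings).
dy.renew_cg(immediate_compute=True, check_validity=True)

# Made-up stand-ins for the real model outputs in the issue.
f_i = dy.inputVector([0.5, -1.2, 3.0, 0.1, -0.7])
valid_frames = [0, 3]  # restriction: only these indices are valid

# Per the docs, all entries outside the restriction are set to -inf
# before normalizing, which the validity check may then flag as non-finite.
logloss = dy.log_softmax(f_i, restrict=valid_frames)
print(logloss.npvalue())
```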

In the most recent run, the values being passed to log_softmax are:

f_i = expression 1630/2
valid_frames = [204, 28]

Can someone help me understand why this input produces either Inf or NaN? I've looked through the existing issues, and the cause seems to be different each time.
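One way to narrow this down is to check whether the input itself already contains non-finite values before log_softmax is applied. A sketch, using f_i and valid_frames as above (npvalue() forces evaluation of the expression):

```python
import numpy as np

# Force evaluation of the input expression and scan for non-finite values.
vals = f_i.npvalue()
print("NaN in f_i:", np.isnan(vals).any())
print("Inf in f_i:", np.isinf(vals).any())

# Also confirm every restriction index is in range for the input dimension,
# since an out-of-range index in the restriction is another possible culprit.
print("indices in range:", all(0 <= i < vals.shape[0] for i in valid_frames))
```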

Here is an example of what log_softmax returns (not the same run as above though):

logloss = expression 3429/2

Thanks!

ekayen commented 3 years ago

I ran into a similar problem when using immediate_compute and check_validity. I can fix it by simply removing the restr argument when I call log_softmax(). The documentation for log_softmax() says that "All elements not included in restriction are set to negative infinity." I suspect that when immediate_compute and/or check_validity are on, the -inf values introduced by the restr argument are being caught and flagged as problems. If that is actually what's going on, then I think this is a bug.
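If that is indeed the cause, one possible workaround is to compute the restricted log-softmax manually over only the valid entries, so no explicit -inf ever enters the graph. This is a sketch under the same assumptions as above, not an official fix; note it returns log-probabilities only for the valid indices, not a full-size vector:

```python
import dynet as dy

def restricted_log_softmax(x, valid):
    # Gather the valid entries as scalar expressions.
    picked = [dy.pick(x, i) for i in valid]
    # Normalizer computed over the valid entries only.
    lse = dy.logsumexp(picked)
    # Vector of length len(valid): log-probs for the valid indices,
    # with no -inf placeholders for check_validity to flag.
    return dy.concatenate([p - lse for p in picked])
```

The caller then has to map gold labels to their positions within valid when picking out the loss term.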

Like you, even with check_validity off, I am still getting NaN errors later on; I suspect those are coming from another source, which I have yet to pin down.