fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"
MIT License

# 6.3.6 and 6.3.7: weird loss values during training #127

Open Juusotak opened 4 years ago

Juusotak commented 4 years ago

I ran the code from sections 6.3.6 and 6.3.7 on my PC. The training and validation loss values differ vastly from the values shown here on GitHub and in the book. When I fit the recurrent network with dropout in section 6.3.6, the loss value of the first epoch was, for example, 80528032321365409792.0000. In section 6.3.7, Keras displayed only "nan" as the training and validation loss. Is anybody else facing the same issue? Any suggestions as to what is causing it?
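One possible culprit for loss values of this magnitude is input data that was never properly normalized, or that contains corrupt values: newer downloads of the Jena climate CSV reportedly contain -9999.0 sentinel values in the wind-velocity columns, and if those survive into `float_data`, the MSE explodes. A minimal sketch of cleaning and normalizing, using a small synthetic array in place of the real `float_data` (the column values here are made up):

```python
import numpy as np

# Hypothetical stand-in for the book's float_data array (the real Jena
# CSV has ~420k rows); one row carries the erroneous -9999.0 sentinel.
float_data = np.array([[0.5, 2.0],
                       [1.5, -9999.0],   # corrupt sentinel value
                       [2.5, 4.0],
                       [3.5, 6.0]], dtype="float32")

# Replace the sentinel before computing statistics; otherwise the
# mean/std (and hence every loss value) are dominated by it.
float_data[float_data == -9999.0] = 0.0

# Normalize the same way the book does: subtract the mean and divide
# by the standard deviation (in the book, computed on the training rows).
mean = float_data.mean(axis=0)
float_data -= mean
std = float_data.std(axis=0)
float_data /= std
```

If the loss is astronomically large on epoch 1, printing `float_data.min()` and `float_data.max()` before training is a quick way to check whether this step was skipped or corrupted.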

the-drunk-coder commented 4 years ago

Hmm, I'm also seeing weird results, on example 6.3.4, "A basic machine learning approach", running the notebook on Google Colab.

Training takes significantly longer than expected, and the loss values rise steadily.

[screenshot: steadily rising loss curve]

the-drunk-coder commented 4 years ago

I then switched the Colab runtime type from "None" to "GPU", and now the results look like this: [screenshot: loss curve on GPU]. Better, but still very different from the book.

@Juusotak, are you using a GPU?

Juusotak commented 4 years ago

@the-drunk-coder yes, I'm using the GPU version of Keras. The loss and val_loss values look like this throughout the whole fitting process.

[screenshot: loss and val_loss values during fitting]

I'm running the code in Spyder 3.6, and I assume GPU computation is working properly, since I can see my GPU in the following output.

[screenshot: device list showing the GPU]
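For anyone else doing this sanity check: assuming a TensorFlow 2.x backend, one way to confirm that TensorFlow actually sees the GPU is the following; an empty list means training silently falls back to the CPU.

```python
import tensorflow as tf

# List the physical GPUs visible to TensorFlow. If this prints [],
# Keras will run on the CPU without any warning.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)
```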

nickeduncan commented 4 years ago

Also having the same issue as @Juusotak: my graph for listing 6.38 looks the same as @the-drunk-coder's, and my figure for listing 6.39 looks like this: [screenshot: listing 6.39 loss curves]

Listing 6.40 has the same issue @Juusotak reported, with extremely large loss values: [screenshot: extremely large loss values]

Listing 6.41 has loss and val_loss values returning NaN: [screenshot: NaN loss values]
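In case it helps anyone debugging the NaNs: one common mitigation (not something the book uses) is gradient clipping, which rescales any gradient whose norm is too large so that a single exploding update cannot turn the loss into NaN. A sketch with a toy model standing in for the listing-6.41 network; the layer sizes here are made up, and only the optimizer argument is the point:

```python
from tensorflow import keras

# Toy stand-in for the stacked-GRU model of listing 6.41. clipnorm=1.0
# rescales any gradient whose L2 norm exceeds 1.0 before the update.
model = keras.models.Sequential([
    keras.layers.GRU(8, input_shape=(None, 14)),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.RMSprop(clipnorm=1.0), loss="mae")
```

Clipping hides the symptom rather than curing the cause (bad input data would still produce bad predictions), but it can tell you whether the NaNs come from exploding gradients or from the data itself.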

hanyun2019 commented 4 years ago

Hi @nickeduncan,

I'm seeing the same problem you described: "Listing 6.41 has loss and val_loss values returning NaN"

Have you found the solution for this problem? Thanks.

Best, Haowen

chriswininger commented 3 years ago

Yeah, sadly I've been seeing the same with this example: sometimes really high loss values, other times NaN.

[screenshot: NaN loss during training]

All the other examples have worked flawlessly; I'm not sure what's up with this one.
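A small thing that may save time while this is unresolved: Keras ships a `TerminateOnNaN` callback that aborts `fit()` on the first batch whose loss becomes NaN, so a broken run fails fast instead of grinding through all epochs:

```python
from tensorflow import keras

# Stop training as soon as any batch produces a NaN loss.
callbacks = [keras.callbacks.TerminateOnNaN()]

# Pass it alongside the book's other fit() arguments, e.g.:
# model.fit(train_gen, steps_per_epoch=500, epochs=40, callbacks=callbacks)
```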

fangxuehouwuming commented 2 years ago

> I executed the codes in sections #6.3.7 and #6.3.6 in my PC. The loss values during the training and validation seem to differ vastly from the values presented here in GitHub or in the book. [...]

Hi, I'm hitting this problem too. Have you solved it?