alexanderkroner / saliency

Contextual Encoder-Decoder Network for Visual Saliency Prediction [Neural Networks 2020]
MIT License
172 stars 46 forks

Why is the training result NaN? #26

Closed yyyyet closed 3 weeks ago

yyyyet commented 1 year ago

I have made no code changes, and the library versions comply with the requirements. This is the result of my training: [image] Only the first picture was correct when I tested 10 pictures: [image] Thanks in advance!

alexanderkroner commented 1 year ago

Hi, I tested the code again but wasn't able to reproduce this issue. Are all the package requirements satisfied?

yyyyet commented 1 year ago

I installed everything according to the requirements you provided. I used to have a computer that ran the program successfully, but I have switched to a new computer and it no longer works. Could it be because the 3070 can't use CUDA 10.0? There are no errors or warnings while it runs, though. Right now it only succeeds when testing with a single image, for example: python main.py test -d salicon -p salicon.jpg These are all the libraries I use: [image] [image]

alexanderkroner commented 1 year ago

Indeed, the package versions are the same as the ones I tested the code with. Could it be that this issue is relevant for you?

yyyyet commented 1 year ago

Well, I don't know. I'll try again. Thank you.

isksjsksk commented 6 months ago

> Indeed, the package versions are the same as the ones I tested the code with. Could it be that this issue is relevant for you?

I had the same problem with an RTX 4090 and this helps a lot!

Vaishnavi-Na commented 3 weeks ago

Hi there! I'm having a similar issue with the training result being NaN and only the first picture generating a result. I'm currently using a Windows computer, and I believe Nvidia-TensorFlow is for Linux machines. Do you have any suggestions for this? Thank you so much!

achilatiao commented 3 weeks ago

Same NaN here. Running on the CPU is the same speed as on the GPU.

achilatiao commented 3 weeks ago

Same NaN problem. Just delete the model you trained first, and it will be solved.

Vaishnavi-Na commented 3 weeks ago

> Same NaN problem. Just delete the model you trained first, and it will be solved.

What do you mean by deleting the model I trained first?

achilatiao commented 3 weeks ago

nan nan nan nan nan nan nan nan nan nan nan nan nan 1.415068 1.325778 1.373514 nan

achilatiao commented 3 weeks ago

Seems like there are still bugs.

achilatiao commented 3 weeks ago

I used only 100 pictures for these three results: 1.415068, 1.325778, 1.373514. The others used 10000 pictures and returned NaN.

Then I deleted all the .result and weight files and started the training again. Here is the output:

Epoch 01/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.283717 (0:00:12) Valid loss: 1.149827 (0:00:01) Best model!
Epoch 02/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.346064 (0:00:15) Valid loss: 1.299754 (0:00:01)
Epoch 03/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.339179 (0:00:05) Valid loss: 2.202006 (0:00:01)
Epoch 04/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.382328 (0:00:05) Valid loss: 2.739692 (0:00:01)
Epoch 05/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.381477 (0:00:05) Valid loss: 1.158773 (0:00:01)
Epoch 06/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.343635 (0:00:05) Valid loss: 1.213211 (0:00:01)
Epoch 07/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.419941 (0:00:05) Valid loss: 1.211412 (0:00:01)
Epoch 08/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.429422 (0:00:05) Valid loss: 1.240724 (0:00:01)
Epoch 09/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.385073 (0:00:05) Valid loss: 1.238499 (0:00:01)
Epoch 10/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.423644 (0:00:05) Valid loss: 1.060307 (0:00:01) Best model!
Epoch 11/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:18) Valid loss: nan (0:00:01)
Epoch 12/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 13/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 14/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 15/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 16/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 17/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 18/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 19/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 20/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:05) Valid loss: nan (0:00:01)
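
To pinpoint where a run like this one diverges, the log can be scanned for the first epoch that reports a NaN loss. The following is only a minimal sketch and not part of the repository: `EPOCH_PATTERN` and `first_nan_epoch` are hypothetical names, and the regular expression assumes the exact log format printed above.

```python
import math
import re

# Assumed format: "Epoch NN/MM ... Train loss: X ... Valid loss: Y"
# DOTALL lets one epoch entry span several lines if the log wraps.
EPOCH_PATTERN = re.compile(
    r"Epoch (\d+)/\d+.*?Train loss: (\S+).*?Valid loss: (\S+)",
    re.DOTALL,
)


def first_nan_epoch(log_text):
    """Return the first epoch whose train or valid loss is NaN, else None."""
    for match in EPOCH_PATTERN.finditer(log_text):
        epoch = int(match.group(1))
        train_loss = float(match.group(2))  # float("nan") parses to NaN
        valid_loss = float(match.group(3))
        if math.isnan(train_loss) or math.isnan(valid_loss):
            return epoch
    return None


# Two entries copied from the output above.
sample = (
    "Epoch 10/20 [====================] 100/100 (ETA: 0:00:00) "
    "Train loss: 1.423644 (0:00:05) Valid loss: 1.060307 (0:00:01) Best model!\n"
    "Epoch 11/20 [====================] 100/100 (ETA: 0:00:00) "
    "Train loss: nan (0:00:18) Valid loss: nan (0:00:01)\n"
)
print(first_nan_epoch(sample))  # 11
```

Knowing the first failing epoch (here epoch 11, right after the second "Best model!") makes it easier to compare runs across machines or dataset sizes.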