yyyyet closed this issue 3 weeks ago
Hi, I tested the code again but wasn't able to reproduce this issue. Are all the package requirements satisfied?
I installed everything according to the requirements you provided. My previous computer ran the program successfully, but after switching to a new computer it no longer works. Could it be because the 3070 can't use CUDA 10.0? There are no errors or warnings while it runs, though. Right now it only succeeds when testing with a single image, for example: python main.py test -d salicon-p salicon.jpg These are all the libraries I use:
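A note on the CUDA 10.0 question: the RTX 3070 is an Ampere card with compute capability 8.6, which CUDA toolkits target natively only from 11.1 onward, so a CUDA 10.0 build predates it. Whether that mismatch is what causes the silent failure here is not confirmed in this thread. The sketch below only encodes that version check; the table is a small excerpt and the helper name cuda_supports_gpu is illustrative, not part of this repository.

```python
# Minimum CUDA toolkit release that natively targets each compute capability.
# Excerpt only; extend it for other GPUs as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. RTX 20xx
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3070
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4090
}


def cuda_supports_gpu(cuda_version, capability):
    """Return True if the CUDA toolkit natively targets this capability.

    Both arguments are (major, minor) tuples, so plain tuple comparison
    gives the right ordering.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None:
        raise KeyError(f"unknown compute capability: {capability}")
    return cuda_version >= required


if __name__ == "__main__":
    print(cuda_supports_gpu((10, 0), (8, 6)))  # RTX 3070 with CUDA 10.0
    print(cuda_supports_gpu((11, 8), (8, 9)))  # RTX 4090 with CUDA 11.8
```

If the check comes back False for your card, a framework build made for a newer CUDA release is the first thing to try.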
Indeed, the package versions are the same as the ones I tested the code with. Could it be that this issue is relevant for you?
Well, I don't know. I'll try again. Thank you.
I had the same problem with RTX4090 and this helps a lot!
Hi there! I'm having a similar issue: the training result is NaN and only the first picture generates a result. I'm currently using a Windows computer, and I believe Nvidia-TensorFlow is for Linux machines. Do you have any suggestions for this? Thank you so much!
Same NaN here. Also, training on the CPU runs at the same speed as on the GPU.
Same NaN problem. Just delete the model you trained first, and it will be solved.
What do you mean by deleting the model I trained first?
nan nan nan nan nan nan nan nan nan nan nan nan nan 1.415068 1.325778 1.373514 nan
Seems like there are still bugs.
The three runs that produced values (1.415068, 1.325778, 1.373514) used only 100 pictures; the others used 10000 and returned nan.
Then I deleted all the .result files and weights and started the training again. Here is the output:
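For anyone repeating the "delete the old model and retrain" step, here is a minimal sketch of doing it with a script. The directory names in DEFAULT_PATTERNS are hypothetical, since the thread does not say where the results and weights are stored; adjust them to your checkout. It defaults to a dry run so you can see what would be removed first.

```python
from pathlib import Path

# Hypothetical locations; the thread does not name the actual folders.
DEFAULT_PATTERNS = ("results/**/*", "weights/**/*")


def clear_stale_artifacts(root, patterns=DEFAULT_PATTERNS, dry_run=True):
    """Collect (and optionally delete) files under root matching the patterns.

    Returns the list of matching file paths. Nothing is deleted unless
    dry_run is False. Directories themselves are left in place.
    """
    root = Path(root)
    matched = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                if not dry_run:
                    path.unlink()
                matched.append(path)
    return matched


if __name__ == "__main__":
    # List what would be removed from the current directory.
    for path in clear_stale_artifacts("."):
        print("would delete:", path)
```

Call it with dry_run=False once the listed files are the ones you expect.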
Epoch 01/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.283717 (0:00:12) Valid loss: 1.149827 (0:00:01) Best model!
Epoch 02/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.346064 (0:00:15) Valid loss: 1.299754 (0:00:01)
Epoch 03/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.339179 (0:00:05) Valid loss: 2.202006 (0:00:01)
Epoch 04/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.382328 (0:00:05) Valid loss: 2.739692 (0:00:01)
Epoch 05/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.381477 (0:00:05) Valid loss: 1.158773 (0:00:01)
Epoch 06/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.343635 (0:00:05) Valid loss: 1.213211 (0:00:01)
Epoch 07/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.419941 (0:00:05) Valid loss: 1.211412 (0:00:01)
Epoch 08/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.429422 (0:00:05) Valid loss: 1.240724 (0:00:01)
Epoch 09/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.385073 (0:00:05) Valid loss: 1.238499 (0:00:01)
Epoch 10/20 [====================] 100/100 (ETA: 0:00:00) Train loss: 1.423644 (0:00:05) Valid loss: 1.060307 (0:00:01) Best model!
Epoch 11/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:18) Valid loss: nan (0:00:01)
Epoch 12/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 13/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 14/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 15/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 16/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 17/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 18/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 19/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
Epoch 20/20 [====================] 100/100 (ETA: 0:00:00) Train loss: nan (0:00:04) Valid loss: nan (0:00:01)
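When comparing runs like this one, it can help to pull the first failing epoch out of the log automatically rather than scanning by eye. The sketch below assumes the log format shown in this thread ("Epoch NN/MM ... Train loss: X ... Valid loss: Y"); the function name first_nan_epoch is illustrative and not part of the repository.

```python
import math
import re

# Matches one epoch record in the log format shown in this thread.
EPOCH_RE = re.compile(
    r"Epoch (\d+)/\d+.*?Train loss: (\S+).*?Valid loss: (\S+)"
)


def first_nan_epoch(log_text):
    """Return the first epoch whose train or valid loss is NaN, else None."""
    for epoch, train, valid in EPOCH_RE.findall(log_text):
        # float("nan") parses fine, so isnan works on the raw tokens.
        if math.isnan(float(train)) or math.isnan(float(valid)):
            return int(epoch)
    return None


if __name__ == "__main__":
    sample = (
        "Epoch 10/20 [====] 100/100 (ETA: 0:00:00) Train loss: 1.423644 "
        "(0:00:05) Valid loss: 1.060307 (0:00:01) Best model! "
        "Epoch 11/20 [====] 100/100 (ETA: 0:00:00) Train loss: nan "
        "(0:00:18) Valid loss: nan (0:00:01)"
    )
    print(first_nan_epoch(sample))
```

It works whether the epochs are on separate lines or run together on one line, as long as each record keeps its epoch number and both losses on the same line.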
No code changes have been made, and the library versions match the requirements. This is the result of my training. When I tested 10 pictures, only the first one came out correct. Thanks for your help!