getsanjeev closed this issue 5 years ago.
@bennycheung Could you please take some time to look at this?
Unfortunately, with such limited information, I cannot tell what is causing your problem. Are you training with your own dataset and your own configuration files? This could be the exploding gradient problem. https://machinelearningmastery.com/exploding-gradients-in-neural-networks/
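For readers unfamiliar with the term: a common mitigation for exploding gradients is to rescale the whole gradient vector whenever its L2 norm exceeds a threshold. The sketch below shows that idea in plain Python on a flat list of floats; `clip_by_global_norm` is a hypothetical helper for illustration, not a Darknet option.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their combined L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). Non-finite gradients are
    rejected because rescaling cannot turn NaN or inf into a usable value.
    """
    if not all(math.isfinite(g) for g in grads):
        raise ValueError("non-finite gradient detected")
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm  # already small enough, leave untouched
    scale = max_norm / norm
    return [g * scale for g in grads], norm


clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
```

Frameworks such as PyTorch expose the same idea as `torch.nn.utils.clip_grad_norm_`. Clipping cannot repair values that are already NaN, which is why the sketch rejects them instead of rescaling.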
I am working with your data and have followed the README.
In the script you provide, are the classes numbered from 0 (not 1), as Darknet expects?
I am running it on a Linux machine with a GTX 1060 (6 GB of graphics-card RAM). The training starts with nan values.
@bennycheung I see you have already taken care of this class issue: for class 1, the label file contains 0. So that should not be the problem.
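Darknet expects one label file per image, with one line per object in the form `<class_id> <x_center> <y_center> <width> <height>`, where class IDs start at 0 and the box values are normalized to [0, 1]. A quick way to rule out label problems like the one discussed here is to scan every label file. This is a minimal sketch; the helper names and the `labels` directory in the usage note are placeholders, not part of the repository.

```python
from pathlib import Path


def validate_label_lines(lines, num_classes):
    """Return (line_number, problem) tuples for one YOLO/Darknet label file."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append((lineno, f"expected 5 fields, got {len(fields)}"))
            continue
        try:
            class_id = int(fields[0])
            box = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((lineno, f"class id {class_id} outside 0..{num_classes - 1}"))
        if any(not 0.0 <= v <= 1.0 for v in box):
            problems.append((lineno, "box values not normalized to [0, 1]"))
    return problems


def validate_label_dir(label_dir, num_classes):
    """Map each *.txt file name under label_dir to its list of problems."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        problems = validate_label_lines(path.read_text().splitlines(), num_classes)
        if problems:
            report[path.name] = problems
    return report
```

Calling `validate_label_dir("labels", num_classes=1)` returns an empty dict when every file looks well formed. For example, the single line `1 0.5 0.5 0.2 0.3` with `num_classes=1` is reported as `class id 1 outside 0..0`.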
Thanks for the additional info! Did you let it run a little longer, and does the nan value go away? Another possibility is that your graphics card does not have enough RAM. You may need to reduce the batch size so that the network does not run out of memory.
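Background on the batch-size knob: in a Darknet `.cfg` file, the `[net]` section sets `batch` (images per weight update) and `subdivisions`, and each forward/backward pass on the GPU processes `batch / subdivisions` images. Increasing `subdivisions` therefore lowers per-pass memory without changing `batch`. A small sketch that reads those two values and computes the per-pass count; the function names are hypothetical, and requiring an exact divisor is this sketch's own choice.

```python
def read_net_batch(cfg_text):
    """Extract (batch, subdivisions) from the [net] section of a Darknet .cfg.

    Raises KeyError if either value is missing from that section.
    """
    values = {}
    in_net = False
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            in_net = line in ("[net]", "[network]")  # later sections are ignored
            continue
        if in_net and "=" in line:
            key, value = (s.strip() for s in line.split("=", 1))
            if key in ("batch", "subdivisions"):
                values[key] = int(value)
    return values["batch"], values["subdivisions"]


def images_per_pass(batch, subdivisions):
    """Images processed per forward/backward pass: batch / subdivisions."""
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError("subdivisions must be a positive divisor of batch")
    return batch // subdivisions
```

For example, `batch=16` with `subdivisions=4` gives 4 images per pass.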
Yes, I tried a batch size of 16 with 4 subdivisions and still got the same result. Should I let it run for a long time? There might also be a GPU-related issue: I don't think it is using GPU memory. Let me check. Thanks!
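On the GPU question: Darknet only runs on the GPU when it was compiled with `GPU=1` (and usually `CUDNN=1`) near the top of its Makefile, and the stock Darknet Makefile ships with these set to 0. While training, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` shows whether GPU memory is actually in use. The sketch below checks both; the helper names are hypothetical, and the Makefile parsing only handles simple `NAME=value` lines.

```python
import re
import subprocess

ASSIGN_RE = re.compile(r"^\s*([A-Z_]+)\s*=\s*(\S+)\s*$")


def makefile_flags(makefile_text, names=("GPU", "CUDNN", "OPENCV")):
    """Collect simple NAME=value assignments (e.g. GPU=1) from a Makefile."""
    flags = {}
    for line in makefile_text.splitlines():
        m = ASSIGN_RE.match(line)
        if m and m.group(1) in names:
            flags[m.group(1)] = m.group(2)  # a later plain assignment overrides an earlier one
    return flags


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi noheader,nounits output into (used_MiB, total_MiB) rows."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(v) for v in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi (must be on PATH) and return per-GPU memory usage."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

If `makefile_flags(...)` reports `GPU` as `'0'`, training runs on the CPU; editing the flag and rebuilding is the usual fix. If it is `'1'` but `query_gpu_memory()` shows almost no memory in use during training, the build or CUDA setup is worth rechecking.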
mask_scale: Using default '1.000000'
Loading weights from darknet19_448.conv.23...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
544
Loaded: 0.228742 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.434100, Avg Recall: -nan, count: 0
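A note on reading the `Region Avg IOU` line in the log: if I read Darknet's region layer correctly, the IOU, Class, Obj and Avg Recall averages are divided by per-batch counts of ground-truth boxes (`count`), while No Obj is averaged over all grid cells. So `count: 0` with `-nan` elsewhere and a finite No Obj usually means the loaded images had no boxes, often because the label files were not found, rather than diverging weights. A minimal Python sketch that parses such a line and applies this rule of thumb; the regular expression and the diagnosis strings are assumptions, not part of Darknet.

```python
import math
import re

# Matches Darknet's region-layer summary line; spacing before "count" varies.
REGION_RE = re.compile(
    r"Region Avg IOU: (?P<iou>[^,\s]+), Class: (?P<cls>[^,\s]+), "
    r"Obj: (?P<obj>[^,\s]+), No Obj: (?P<noobj>[^,\s]+), "
    r"Avg Recall: (?P<recall>[^,\s]+),\s+count: (?P<count>\d+)"
)


def parse_region_line(line):
    """Return the region-layer stats as a dict, or None if the line does not match."""
    m = REGION_RE.search(line)
    if m is None:
        return None
    # float() accepts "-nan" and returns NaN
    stats = {k: float(v) for k, v in m.groupdict().items() if k != "count"}
    stats["count"] = int(m.group("count"))
    return stats


def diagnose(stats):
    """Rough rule of thumb for one region-layer line (an assumption, not Darknet logic)."""
    nan_keys = [k for k, v in stats.items() if k != "count" and math.isnan(v)]
    if not nan_keys:
        return "ok"
    if stats["count"] == 0:
        return "no ground-truth boxes in this batch; check label files and paths"
    return "NaN with boxes present; training may be diverging"
```

Running this over every `Region Avg IOU` line of a training log, rather than a single one, gives a better picture, since an occasional image without objects can also produce `count: 0`.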