Closed jianjiandandande closed 4 years ago
After about an hour of training, the total loss becomes nan or -nan, and the IOU and GIOU are equal to zero:
Tensor Cores are disabled until the first 3000 iterations are reached.
Loaded: 0.000045 seconds
2296: -nan, -nan avg loss, 0.002610 rate, 2.130433 seconds, 18368 images, 242.507999 hours left
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 8, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 14, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = nan, total_loss = nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
Has anyone encountered this and knows how to solve it?
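To find out when the loss first turned into nan, it can help to scan a saved training log for the per-iteration summary lines (the ones of the form `2296: -nan, -nan avg loss, 0.002610 rate, ...`). Below is a minimal sketch in Python; the function names `parse_summary` and `first_nan_iteration` are made up for illustration, and the regular expression assumes the log format shown in the output above.

```python
import math
import re

# Matches Darknet iteration summary lines such as:
#   "2296: -nan, -nan avg loss, 0.002610 rate, 2.130433 seconds, ..."
# The format is assumed from the log output pasted in this thread.
SUMMARY_RE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+) avg loss,\s*(\S+) rate")


def parse_summary(line):
    """Return (iteration, loss, avg_loss, rate), or None for non-summary lines."""
    match = SUMMARY_RE.match(line)
    if match is None:
        return None
    iteration = int(match.group(1))
    # Python's float() accepts both "nan" and "-nan".
    loss, avg_loss, rate = (float(match.group(i)) for i in (2, 3, 4))
    return iteration, loss, avg_loss, rate


def first_nan_iteration(lines):
    """Return the first iteration number whose loss is NaN, or None."""
    for line in lines:
        parsed = parse_summary(line)
        if parsed is not None and math.isnan(parsed[1]):
            return parsed[0]
    return None
```

For example, calling `first_nan_iteration(open("train.log"))` on a saved log (the file name is hypothetical) returns the iteration to look at; the learning rate on that summary line shows whether the divergence happened during burn-in or at the full rate.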
Attach your cfg-file. Do you use the latest Darknet version?
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
I experienced the same, see my issue: https://github.com/AlexeyAB/darknet/issues/5796 I updated the darknet around 2 weeks ago. I am at 480b3eccb19ac65508e49fb3116ccda65551bb6e
I was able to work around it by using a lower learning rate (from 0.00261 to 0.0001), but I don't think that is a good idea.
Thank you so much. I found my error, and now the program is working fine:
Tensor Cores are used.
MJPEG-stream sent.
Loaded: 0.000028 seconds
(next mAP calculation at 26000 iterations)
20480: 2.514878, 1.917477 avg loss, 0.000010 rate, 5.004909 seconds, 327680 images, 36.478363 hours left
known client: 94, sent = 181611, must be sent outlen = 181611
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000002, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000155, iou_loss = 0.000000, total_loss = 0.000155
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000022, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000489, iou_loss = 0.000000, total_loss = 0.000489
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.732398, GIOU: 0.727750), Class: 0.999454, Obj: 0.022924, No Obj: 0.000345, .5R: 1.000000, .75R: 0.000000, count: 1, class_loss = 0.978147, iou_loss = 0.204143, total_loss = 1.182290
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000046, iou_loss = 0.000000, total_loss = 0.000046
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.886560, GIOU: 0.885944), Class: 0.998898, Obj: 0.697704, No Obj: 0.000872, .5R: 1.000000, .75R: 1.000000, count: 7, class_loss = 1.195309, iou_loss = 34.324028, total_loss = 35.519337
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.846497, GIOU: 0.843830), Class: 0.999632, Obj: 0.664541, No Obj: 0.003064, .5R: 1.000000, .75R: 1.000000, count: 8, class_loss = 1.307590, iou_loss = 12.483640, total_loss = 13.791230
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000000, iou_loss = 0.000000, total_loss = 0.000000
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.830864, GIOU: 0.830864), Class: 0.999720, Obj: 0.814925, No Obj: 0.000211, .5R: 1.000000, .75R: 1.000000, count: 1, class_loss = 0.091153, iou_loss = 4.306364, total_loss = 4.397517
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.802422, GIOU: 0.794730), Class: 0.999229, Obj: 0.958119, No Obj: 0.001257, .5R: 1.000000, .75R: 1.000000, count: 2, class_loss = 0.473928, iou_loss = 1.875301, total_loss = 2.349229
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000001, iou_loss = 0.000000, total_loss = 0.000001
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000007, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000016, iou_loss = 0.000000, total_loss = 0.000016
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.723415, GIOU: 0.723415), Class: 0.999647, Obj: 0.136280, No Obj: 0.000853, .5R: 1.000000, .75R: 0.000000, count: 2, class_loss = 1.512709, iou_loss = 0.768783, total_loss = 2.281492
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000004, iou_loss = 0.000000, total_loss = 0.000004
What was the error and the solution? My issue is still unanswered.
I am training on my own data, and there was a problem in my configuration file yolo-obj.cfg. After I fixed it, training works fine.
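The post does not say which setting in yolo-obj.cfg was wrong. One check that the Darknet README asks for on custom datasets is that the `filters` value of the `[convolutional]` layer directly before each `[yolo]` layer equals `(classes + 5) * 3` (more generally, times the number of entries in `mask`). The sketch below, with the hypothetical helper names `parse_cfg` and `check_yolo_filters`, tests only that one rule; it is not necessarily what was fixed here.

```python
def parse_cfg(text):
    """Parse Darknet cfg text into a list of (section_name, options_dict)."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_filters(text):
    """Return (section_index, actual_filters, expected_filters) per mismatch."""
    problems = []
    last_conv = None
    for index, (name, opts) in enumerate(parse_cfg(text)):
        if name == "convolutional":
            last_conv = opts
        elif name == "yolo" and last_conv is not None:
            classes = int(opts["classes"])
            masks = len(opts["mask"].split(","))
            expected = (classes + 5) * masks
            actual = int(last_conv["filters"])
            if actual != expected:
                problems.append((index, actual, expected))
    return problems
```

An empty list from `check_yolo_filters(open("yolo-obj.cfg").read())` means this particular rule holds; other settings (learning rate, anchors, label files) would still need to be checked separately.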
As far as I can see, 480b3eccb19ac65508e49fb3116ccda65551bb6e is from 8 Aug 2019, which isn't 2 weeks ago :)
Sorry. This one: 08bc0c9373158da6c42f11b1359ca2c017cef1b5 Can we continue on my issue?
@jianjiandandande I have run into the same problem as you. How can I resolve it?
some info:
DEBUG=1
CUDA-version: 10000 (10010), cuDNN: 7.5.0, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 3.2.0d
yolo-obj-phone
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 1
My question:
Tensor Cores are disabled until the first 3000 iterations are reached.
1: 3755.799805, 3755.799805 avg loss, 0.000000 rate, 17.314848 seconds, 64 images, -1.000000 hours left
Loaded: 0.000023 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.283766, GIOU: 0.089142), Class: 0.617166, Obj: 0.411237, No Obj: 0.457992, .5R: 0.000000, .75R: 0.000000, count: 4, class_loss = 8404.928711, iou_loss = 11.506836, total_loss = 8416.435547
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.341452, GIOU: 0.329254), Class: 0.526056, Obj: 0.524379, No Obj: 0.485005, .5R: 0.200000, .75R: 0.000000, count: 5, class_loss = 2323.561035, iou_loss = 12.141846, total_loss = 2335.702881
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.473806, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 565.948242, iou_loss = 0.000000, total_loss = 565.948242
total_bbox = 416, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.352242, GIOU: 0.145135), Class: 0.502178, Obj: 0.376156, No Obj: 0.457373, .5R: 0.250000, .75R: 0.000000, count: 4, class_loss = 8386.804688, iou_loss = 14.726562, total_loss = 8401.531250
A few iterations later, nan appears:
Tensor Cores are disabled until the first 3000 iterations are reached.
13: -nan, -nan avg loss, 0.000000 rate, 12.039471 seconds, 832 images, 456.033195 hours left
Loaded: 0.000060 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
total_bbox = 4244, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
total_bbox = 4245, rewritten_bbox = 0.000000 %
Train my custom dataset with yolov4, the total loss is very large, the GIOU and the IOU are very small