AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Train my custom dataset with yolov4, the loss is very large, the GIOU and the IOU are very small #5798

Closed jianjiandandande closed 4 years ago

jianjiandandande commented 4 years ago

When I train on my custom dataset with YOLOv4, the total loss is very large, and the GIOU and IOU are very small:

[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
Total BFLOPS 128.459 
avg_outputs = 1068395 
 Allocate additional workspace_size = 59.72 MB 
Loading weights from ../yolov4.conv.137...0
yolov4
net.optimized_memory = 0 
mini_batch = 1, batch = 8, time_steps = 1, train = 1 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
nms_kind: greedynms (1), beta = 0.600000 
Done! Loaded 137 layers from weights-file 
 Create 6 permanent cpu-threads 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.425788, GIOU: 0.320237), Class: 0.552688, Obj: 0.528173, No Obj: 0.478972, .5R: 0.250000, .75R: 0.000000, count: 8, class_loss = 9253.235352, iou_loss = 45.365234, total_loss = 9298.600586 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.361636, GIOU: 0.240765), Class: 0.454532, Obj: 0.485311, No Obj: 0.482756, .5R: 0.214286, .75R: 0.000000, count: 14, class_loss = 2605.862793, iou_loss = 18.572266, total_loss = 2624.435059 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.420738, GIOU: 0.400683), Class: 0.438267, Obj: 0.450818, No Obj: 0.474730, .5R: 0.666667, .75R: 0.000000, count: 3, class_loss = 640.989746, iou_loss = 0.659851, total_loss = 641.649597 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.424423, GIOU: 0.362076), Class: 0.440136, Obj: 0.530715, No Obj: 0.478456, .5R: 0.333333, .75R: 0.000000, count: 9, class_loss = 9266.964844, iou_loss = 62.067383, total_loss = 9329.032227 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.389297, GIOU: 0.359481), Class: 0.445149, Obj: 0.472654, No Obj: 0.481627, .5R: 0.250000, .75R: 0.000000, count: 12, class_loss = 2562.528076, iou_loss = 18.902588, total_loss = 2581.430664 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.494276, GIOU: 0.449161), Class: 0.504789, Obj: 0.576330, No Obj: 0.474601, .5R: 0.666667, .75R: 0.000000, count: 3, class_loss = 635.924622, iou_loss = 0.962830, total_loss = 636.887451 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.479590, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 9129.406250, iou_loss = 0.000000, total_loss = 9129.406250 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.482930, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 2320.638428, iou_loss = 0.000000, total_loss = 2320.638428 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.420100, GIOU: 0.345474), Class: 0.539285, Obj: 0.500229, No Obj: 0.474810, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = 615.674194, iou_loss = 0.144592, total_loss = 615.818787 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.119087, GIOU: -0.105577), Class: 0.329427, Obj: 0.461046, No Obj: 0.479669, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 9132.197266, iou_loss = 0.265625, total_loss = 9132.462891 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.370581, GIOU: 0.181333), Class: 0.422291, Obj: 0.470932, No Obj: 0.483206, .5R: 0.250000, .75R: 0.250000, count: 4, class_loss = 2402.878662, iou_loss = 4.118408, total_loss = 2406.997070 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.483585, GIOU: 0.470041), Class: 0.473558, Obj: 0.509166, No Obj: 0.476478, .5R: 0.250000, .75R: 0.000000, count: 4, class_loss = 660.991577, iou_loss = 0.777771, total_loss = 661.769348 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.433049, GIOU: 0.376566), Class: 0.438415, Obj: 0.524753, No Obj: 0.479196, .5R: 0.333333, .75R: 0.000000, count: 3, class_loss = 9214.897461, iou_loss = 31.991211, total_loss = 9246.888672 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.103155, GIOU: 0.103155), Class: 0.260479, Obj: 0.433206, No Obj: 0.481584, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 2334.124512, iou_loss = 0.158447, total_loss = 2334.282959 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.383599, GIOU: 0.320343), Class: 0.453084, Obj: 0.453337, No Obj: 0.474065, .5R: 0.333333, .75R: 0.000000, count: 3, class_loss = 637.607483, iou_loss = 0.431580, total_loss = 638.039062 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.571060, GIOU: 0.470602), Class: 0.464704, Obj: 0.525286, No Obj: 0.479938, .5R: 0.500000, .75R: 0.500000, count: 2, class_loss = 9211.096680, iou_loss = 18.393555, total_loss = 9229.490234 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.464048, GIOU: 0.403799), Class: 0.411765, Obj: 0.553470, No Obj: 0.481672, .5R: 0.400000, .75R: 0.200000, count: 5, class_loss = 2408.546387, iou_loss = 6.772705, total_loss = 2415.319092 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.508498, GIOU: 0.451606), Class: 0.435614, Obj: 0.501434, No Obj: 0.472517, .5R: 0.500000, .75R: 0.000000, count: 2, class_loss = 612.748962, iou_loss = 0.451965, total_loss = 613.200928 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.477773, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 9035.582031, iou_loss = 0.000000, total_loss = 9035.582031 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.263849, GIOU: 0.154530), Class: 0.431232, Obj: 0.610503, No Obj: 0.482690, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 2336.520264, iou_loss = 0.067627, total_loss = 2336.587891 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.216258, GIOU: -0.153946), Class: 0.547545, Obj: 0.488689, No Obj: 0.474241, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = 634.727722, iou_loss = 0.063477, total_loss = 634.791199 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.479870, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 9119.888672, iou_loss = 0.000000, total_loss = 9119.888672 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.377818, GIOU: 0.270238), Class: 0.449616, Obj: 0.515882, No Obj: 0.483828, .5R: 0.272727, .75R: 0.000000, count: 11, class_loss = 2544.320068, iou_loss = 11.854736, total_loss = 2556.174805 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.463991, GIOU: 0.412744), Class: 0.544002, Obj: 0.478568, No Obj: 0.476024, .5R: 0.500000, .75R: 0.125000, count: 8, class_loss = 741.389771, iou_loss = 1.664795, total_loss = 743.054565 
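To see which term drives the large totals in a log like this one, the per-region lines can be averaged. This is only a rough sketch: the regular expression is an assumption based on the line format shown in this log, and `mean_losses_by_region` is a hypothetical helper, not part of Darknet.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log lines in this thread:
# "... Region 139 Avg (...), ..., class_loss = X, iou_loss = Y, total_loss = Z"
REGION_LINE = re.compile(
    r"Region (\d+) Avg .*?class_loss = (\S+), iou_loss = (\S+), total_loss = (\S+)"
)

def mean_losses_by_region(log_text):
    """Return {region: (mean class_loss, mean iou_loss, mean total_loss)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for match in REGION_LINE.finditer(log_text):
        region = int(match.group(1))
        # float() also accepts "nan" and "-nan", so diverged lines do not crash.
        values = [float(match.group(i)) for i in (2, 3, 4)]
        for k, value in enumerate(values):
            sums[region][k] += value
        counts[region] += 1
    return {r: tuple(s / counts[r] for s in sums[r]) for r in sums}
```

Calling `mean_losses_by_region(open("train.log").read())` (the file name is a placeholder) gives one tuple per `[yolo]` region. In the lines pasted here, `class_loss` accounts for almost all of `total_loss` in every region.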
jianjiandandande commented 4 years ago

An hour later, the total loss is nan or -nan, and the IOU and GIOU are equal to zero:

 Tensor Cores are disabled until the first 3000 iterations are reached.
Loaded: 0.000045 seconds

 2296: -nan, -nan avg loss, 0.002610 rate, 2.130433 seconds, 18368 images, 242.507999 hours left
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 8, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 14, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 5, class_loss = -nan, iou_loss = nan, total_loss = nan 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan

Has anyone encountered this and knows how to solve it?
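When the loss turns into nan partway through a run, it can help to find the first iteration at which that happened, so the cfg and data around that point can be inspected. A minimal sketch, assuming the iteration summary lines look like the ` 2296: -nan, -nan avg loss, ...` line in the log above; `first_nan_iteration` is a hypothetical helper, not a Darknet tool.

```python
import math
import re

# Assumed summary-line format: "<iteration>: <loss>, <avg loss> avg loss, <rate> rate, ..."
ITER_LINE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+) avg loss,")

def _is_nan(token):
    """True if the token parses as a float NaN ("nan", "-nan", ...)."""
    try:
        return math.isnan(float(token))
    except ValueError:
        return False

def first_nan_iteration(log_text):
    """Return the first iteration whose loss or avg loss is NaN, or None."""
    for line in log_text.splitlines():
        match = ITER_LINE.match(line)
        if match and (_is_nan(match.group(2)) or _is_nan(match.group(3))):
            return int(match.group(1))
    return None
```

The function scans lines in file order, so it reports the earliest NaN summary line it sees rather than the smallest iteration number.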

AlexeyAB commented 4 years ago

Attach your cfg-file. Do you use the latest Darknet version?

benkelaci commented 4 years ago

I experienced the same, see my issue: https://github.com/AlexeyAB/darknet/issues/5796 I updated the darknet around 2 weeks ago. I am at 480b3eccb19ac65508e49fb3116ccda65551bb6e

I was able to work around it by using a lower learning rate (from 0.00261 to 0.0001), but I don't think that is a good idea.

jianjiandandande commented 4 years ago

Thank you so much. I found my error, and the program is now working fine:

 Tensor Cores are used.
 MJPEG-stream sent. 
Loaded: 0.000028 seconds

 (next mAP calculation at 26000 iterations) 
 20480: 2.514878, 1.917477 avg loss, 0.000010 rate, 5.004909 seconds, 327680 images, 36.478363 hours left
known client: 94, sent = 181611, must be sent outlen = 181611
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000002, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000155, iou_loss = 0.000000, total_loss = 0.000155 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000022, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000489, iou_loss = 0.000000, total_loss = 0.000489 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.732398, GIOU: 0.727750), Class: 0.999454, Obj: 0.022924, No Obj: 0.000345, .5R: 1.000000, .75R: 0.000000, count: 1, class_loss = 0.978147, iou_loss = 0.204143, total_loss = 1.182290 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000046, iou_loss = 0.000000, total_loss = 0.000046 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.886560, GIOU: 0.885944), Class: 0.998898, Obj: 0.697704, No Obj: 0.000872, .5R: 1.000000, .75R: 1.000000, count: 7, class_loss = 1.195309, iou_loss = 34.324028, total_loss = 35.519337 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.846497, GIOU: 0.843830), Class: 0.999632, Obj: 0.664541, No Obj: 0.003064, .5R: 1.000000, .75R: 1.000000, count: 8, class_loss = 1.307590, iou_loss = 12.483640, total_loss = 13.791230 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000000, iou_loss = 0.000000, total_loss = 0.000000 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.830864, GIOU: 0.830864), Class: 0.999720, Obj: 0.814925, No Obj: 0.000211, .5R: 1.000000, .75R: 1.000000, count: 1, class_loss = 0.091153, iou_loss = 4.306364, total_loss = 4.397517 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.802422, GIOU: 0.794730), Class: 0.999229, Obj: 0.958119, No Obj: 0.001257, .5R: 1.000000, .75R: 1.000000, count: 2, class_loss = 0.473928, iou_loss = 1.875301, total_loss = 2.349229 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000001, iou_loss = 0.000000, total_loss = 0.000001 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000007, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000016, iou_loss = 0.000000, total_loss = 0.000016 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.723415, GIOU: 0.723415), Class: 0.999647, Obj: 0.136280, No Obj: 0.000853, .5R: 1.000000, .75R: 0.000000, count: 2, class_loss = 1.512709, iou_loss = 0.768783, total_loss = 2.281492 
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000004, iou_loss = 0.000000, total_loss = 0.000004 
benkelaci commented 4 years ago

What was the error and the solution? My issue is still unanswered.

jianjiandandande commented 4 years ago

> What was the error and the solution? My issue is still unanswered.

I was training on my own data, and there was a problem in my configuration file yolo-obj.cfg. After I changed it, training works fine.
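The comment does not say what the cfg problem was. One mistake that is easy to check for, offered purely as an illustration and not as the poster's actual error, is a mismatch between `classes` in a `[yolo]` section and `filters` in the `[convolutional]` section just before it. The Darknet README gives the rule as `filters=(classes + 5)x3`; the sketch below generalizes the 3 to the number of entries in `mask`, which is an assumption. `parse_cfg` and `check_yolo_filters` are hypothetical helpers that handle only simple `key=value` cfg files.

```python
def parse_cfg(text):
    """Split Darknet-style cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections

def check_yolo_filters(sections):
    """Return (section_index, actual_filters, expected_filters) for each mismatch."""
    problems = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        n_masks = len(opts["mask"].split(","))
        expected = (int(opts["classes"]) + 5) * n_masks
        # Look back for the nearest [convolutional] section before this [yolo].
        for j in range(i - 1, -1, -1):
            if sections[j][0] == "convolutional":
                actual = int(sections[j][1].get("filters", -1))
                if actual != expected:
                    problems.append((j, actual, expected))
                break
    return problems
```

For example, `check_yolo_filters(parse_cfg(open("yolo-obj.cfg").read()))` returns an empty list when every `[yolo]` layer is consistent with the convolutional layer feeding it.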

AlexeyAB commented 4 years ago

> I experienced the same, see my issue: https://github.com/AlexeyAB/darknet/issues/5796 I updated the darknet around 2 weeks ago. I am at 480b3eccb19ac65508e49fb3116ccda65551bb6e

As far as I can see, 480b3eccb19ac65508e49fb3116ccda65551bb6e is from 8 Aug 2019, which isn't 2 weeks ago )

benkelaci commented 4 years ago

Sorry. This one: 08bc0c9373158da6c42f11b1359ca2c017cef1b5 Can we continue on my issue?

xiaobumiDM commented 4 years ago

@jianjiandandande I have the same problem as you. How can I resolve it?

Some info:

DEBUG=1
CUDA-version: 10000 (10010), cuDNN: 7.5.0, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 3.2.0d
yolo-obj-phone
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 1

My question:

Tensor Cores are disabled until the first 3000 iterations are reached.

1: 3755.799805, 3755.799805 avg loss, 0.000000 rate, 17.314848 seconds, 64 images, -1.000000 hours left
Loaded: 0.000023 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.283766, GIOU: 0.089142), Class: 0.617166, Obj: 0.411237, No Obj: 0.457992, .5R: 0.000000, .75R: 0.000000, count: 4, class_loss = 8404.928711, iou_loss = 11.506836, total_loss = 8416.435547
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.341452, GIOU: 0.329254), Class: 0.526056, Obj: 0.524379, No Obj: 0.485005, .5R: 0.200000, .75R: 0.000000, count: 5, class_loss = 2323.561035, iou_loss = 12.141846, total_loss = 2335.702881
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.473806, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 565.948242, iou_loss = 0.000000, total_loss = 565.948242
total_bbox = 416, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.352242, GIOU: 0.145135), Class: 0.502178, Obj: 0.376156, No Obj: 0.457373, .5R: 0.250000, .75R: 0.000000, count: 4, class_loss = 8386.804688, iou_loss = 14.726562, total_loss = 8401.531250

After that, nan appears:

Tensor Cores are disabled until the first 3000 iterations are reached.

13: -nan, -nan avg loss, 0.000000 rate, 12.039471 seconds, 832 images, 456.033195 hours left
Loaded: 0.000060 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
total_bbox = 4244, rewritten_bbox = 0.000000 %
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, total_loss = -nan
total_bbox = 4245, rewritten_bbox = 0.000000 %