tingyangsh opened this issue 4 years ago
Do you use same learning rate and pre-trained weights?
Yes, the same training parameters and data are used. But I did not use pre-trained weights for either of them.
The original Gaussian YOLOv3 uses a very low learning rate: https://github.com/jwchoi384/Gaussian_YOLOv3/blob/master/cfg/Gaussian_yolov3_BDD.cfg#L18
I have also not managed to train Gaussian YOLO successfully without pre-trained weights (it usually hits NaN within 200 steps).
Gaussian YOLO may run into NaN if the training set contains empty annotation files: https://github.com/AlexeyAB/darknet/issues/4455#issuecomment-564333775
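A minimal sketch in Python for spotting empty annotation files before training. It assumes the labels are darknet/YOLO-format `*.txt` files somewhere under one directory; `find_empty_label_files` is a hypothetical helper for illustration, not a darknet tool.

```python
from pathlib import Path
from typing import List


def find_empty_label_files(label_dir: str) -> List[Path]:
    """Return YOLO label files (*.txt) under label_dir that contain no boxes.

    Hypothetical helper: assumes one label file per image, as darknet expects.
    """
    empty = []
    for path in sorted(Path(label_dir).rglob("*.txt")):
        # A file holding only whitespace or newlines has no annotations in it.
        if not path.read_text().strip():
            empty.append(path)
    return empty
```

For example, `find_empty_label_files("data/labels")` (the path is a placeholder) lists the files with no boxes, which can then be inspected before training.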
@tingyangsh Hello, I ran into the same problem when training a model with the Gaussian YOLO layer on the newest darknet repo. I get a similar message: "Warning: in txt-labels class_id=193 >= classes=5 in cfg-file. In txt-labels class_id should be [from 0 to 4] truth.x = 0.000951, truth.y = 148378222592.000000, truth.w = 18492160923342970278921038200832.000000, truth.h = 15513.363281, class_id = 193". But if I train the same model on the same computer with the repo I downloaded on 2020.02.20, everything is OK. Is this a bug or something else? @WongKinYiu @AlexeyAB My GPU is a GeForce RTX 2080 Ti, the CUDA version is 10.1.243 and the cuDNN version is 7.6.2. Thanks.
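One way to rule out the label files themselves is to scan them for the condition the warning reports (`class_id >= classes`) plus coordinates outside [0, 1]. A minimal Python sketch, assuming the usual darknet label layout `<class_id> <x_center> <y_center> <width> <height>` with normalized values; `validate_label_lines` is a hypothetical name, not part of darknet.

```python
from typing import List, Tuple


def validate_label_lines(text: str, classes: int) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for each malformed line of a YOLO label file.

    Assumes each non-empty line is: class_id x_center y_center width height,
    with class_id in [0, classes) and the four coordinates normalized to [0, 1].
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines carry no box
        if len(fields) != 5:
            problems.append((lineno, f"expected 5 fields, got {len(fields)}"))
            continue
        try:
            class_id = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= class_id < classes:
            problems.append((lineno, f"class_id={class_id} outside [0, {classes - 1}]"))
        # NaN fails both comparisons, so it is flagged here as well.
        if any(not 0.0 <= c <= 1.0 for c in coords):
            problems.append((lineno, "coordinate outside [0, 1]"))
    return problems
```

If every label file passes a check like this while the newer build still prints values such as `class_id=193`, that would suggest the corrupted values are not coming from the label text itself.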
Update: Today I ran some experiments on this training problem. I downloaded the darknet_yolo_v4_pre release (2020.05.15) from https://github.com/AlexeyAB/darknet/releases/tag/darknet_yolo_v4_pre and trained the model with the Gaussian_Yolo layer. Everything is OK: no NaN and the mAP value is normal. I also compared "gaussian_yolo_layer.c" between the 20200515 release and the newest repo, and there is little difference between them. I replaced the new "gaussian_yolo_layer.c" with the 20200515 version, ran make clean and make, and trained again, but the problem still exists. So I infer that the cause of this problem was introduced after 20200515 and is not in "gaussian_yolo_layer.c". Is training a model with the Gaussian_Yolo layer on the newest repo working normally for you? Thank you very much! @WongKinYiu @AlexeyAB
Hi lq0104, I have the same problem as you. Could you share the repo you downloaded on 2020.02.20? I am struggling to get Gaussian YOLO working. My e-mail is: zhang_wenchao1@163.com. Thank you very much!
When I use the "darknet_yolo_v4_pre" repo you shared to train Gaussian YOLO, it does not use the GPU. Did you run into the same thing? Thank you.
Part of the code in Gaussian-test.cfg:

```ini
[convolutional]
size=1
stride=1
pad=1
filters=42
activation=linear

[Gaussian_yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=5
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
iou_thresh=0.213
uc_normalizer=0.01
iou_normalizer=0.01
cls_normalizer=1.0
#iou_loss=ciou
scale_x_y = 1.2
random=1
```
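The `filters` value of the `[convolutional]` layer in front of `[Gaussian_yolo]` can be checked with a little arithmetic. This sketch assumes each anchor in `mask` predicts 4 box coordinates, 4 coordinate uncertainties, 1 objectness score and one score per class, i.e. `filters = (classes + 9) * len(mask)`; `gaussian_yolo_filters` is a hypothetical helper.

```python
from typing import Sequence


def gaussian_yolo_filters(classes: int, mask: Sequence[int]) -> int:
    """Output channels needed before a [Gaussian_yolo] layer (assumed layout).

    Per anchor: 4 coordinates + 4 uncertainties + 1 objectness + `classes` scores.
    """
    per_anchor = 4 + 4 + 1 + classes
    return per_anchor * len(mask)


# Values from the Gaussian-test.cfg excerpt: classes=5, mask = 0,1,2
print(gaussian_yolo_filters(classes=5, mask=[0, 1, 2]))  # prints 42
```

With `classes=5` and three masked anchors this gives (5 + 9) * 3 = 42, matching `filters=42` in the excerpt.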
My training command is:

```
./darknet detector train cfg/voc_hrsc5.data cfg/Gaussian-test.cfg
```

Some of the error messages are as follows:

```
Region 23 Avg IOU: 0.000000, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, uc_loss = -nan, total_loss = -nan
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0, class_loss = 0.00, iou_loss = 0.00, uc_loss = 0.00 , total_loss = 0.00
Region 23 Avg IOU: 0.000000, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = -nan, iou_loss = -nan, uc_loss = -nan, total_loss = -nan
Tensor Cores are disabled until the first 3000 iterations are reached.
72: -nan, -nan avg loss, 0.000000 rate, 0.353376 seconds, 4608 images, 105.484449 hours left Loaded: 0.000025 seconds
Warning: in txt-labels class_id=240567840 >= classes=5 in cfg-file. In txt-labels class_id should be [from 0 to 4] truth.x = 0.000000, truth.y = 0.000003, truth.w = 0.000001, truth.h = 274706242392578588672.000000, class_id = 240567840
Warning: in txt-labels class_id=193 >= classes=5 in cfg-file. In txt-labels class_id should be [from 0 to 4] truth.x = 0.000951, truth.y = 148378222592.000000, truth.w = 18492160923342970278921038200832.000000, truth.h = 15513.363281, class_id = 193
```