AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

About the Gaussian_yolo_layer #6614

Open JinCho23 opened 4 years ago

JinCho23 commented 4 years ago

I saw on the issue board that some people are struggling to use the [Gaussian_Yolo] layer: https://github.com/AlexeyAB/darknet/issues/6476 In my case, I get NaN or a very large loss (depending on cfg tuning; ciou/iou) and training fails, although the same model trains fine with the [Yolo] layer. So I compared "gaussian_yolo_layer.c" with "yolo_layer.c" and found the two differences below. Are they bugs or not?

1) truth_size
https://github.com/AlexeyAB/darknet/blob/master/src/gaussian_yolo_layer.c#L53-L54 => max_boxes (4+1)
https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L45-L46 => max_boxes (4+2)
If I change (4+1) in gaussian_yolo_layer.c to (4+2) (lines 54, 467, 468, 499, 521, 526, 533, 567, and 600), training appears to become more stable. After changing these constants I no longer see NaN, and training has gone well so far (it has not finished yet).

2) Scaling factor in the backward pass
https://github.com/AlexeyAB/darknet/blob/master/src/gaussian_yolo_layer.c#L896 => l.delta_normalizer
https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L959 => state.net.loss_scale * l.delta_normalizer
Why doesn't gaussian_yolo_layer.c include "state.net.loss_scale"?

hsm4703 commented 4 years ago

When I train gaussian_yolov4 and gaussian_yolov4-tiny, they run and reach a good mAP on my own dataset (not COCO), but the Gaussian models show a high current avg loss, and sometimes the current avg loss becomes NaN. What can I change to reduce the current avg loss? Has your training with truth_size changed to max_boxes (4+2) finished yet? Is the result better than with max_boxes (4+1)?

tuteming commented 4 years ago

Hi JinCho23, do you have a complete procedure that works? Thanks.

JinCho23 commented 4 years ago

Yes. As I wrote above, after changing those lines I was finally able to train the model successfully. On my custom dataset I got a slightly lower mAP (~0.x) but fewer false positives.

tuteming commented 4 years ago

Hi JinCho23, so in your case the truth_size in gaussian_yolo_layer must be changed from (4+1) to (4+2). Does gaussian_yolo_layer also need to be changed to "state.net.loss_scale * l.delta_normalizer"? You also said you got a slightly lower mAP (~0.x). About how much lower? Thanks.

JinCho23 commented 4 years ago

On my dataset, the initial mAP was 94.8 with Yolo, and I got 94.2 with gaussian_yolo. I also added loss_scale to gaussian_yolo. I'm not sure this is the correct way to stabilize gaussian_yolo training.

FYI, I still sometimes see NaN when I use different network architectures. I think the training failures are also related to "max_delta" in the cfg file.

wenchao1993 commented 4 years ago

Hi, could you share your gaussian_yolo cfg files? Thank you very much. @JinCho23

WilburZjh commented 4 years ago

Hi @hsm4703, may I ask where you found gaussian_yolov4 and gaussian_yolov4-tiny?

hsm4703 commented 4 years ago

It is a .cfg file I created myself; when I train with it I get good performance.

WilburZjh commented 4 years ago

Hi @hsm4703, thanks for replying! I still don't quite follow. Do you mean that you created your own gaussian_yolov4? Could you share the cfg with me? I am struggling to get my gaussian yolov4 to work...

hsm4703 commented 4 years ago

OK, I sent it to your e-mail.

WilburZjh commented 4 years ago

Thank you so much! My email address is in my profile.

JinCho23 commented 4 years ago

@hsm4703, @WilburZjh: I'm sorry, but I can't share my cfg file because it is the property of my company. The Gaussian yolo parts of my cfg are essentially the same as those in the cfg files uploaded to this repository.

I believe you should change the "(4+1)" value in the Gaussian yolo layer to "(4+2)". Please check this: https://github.com/AlexeyAB/darknet/blob/master/src/data.c#L455-L460 The darknet data loader writes 6 ground-truth values per box into the truth array, whether or not the layer is Gaussian.

WilburZjh commented 4 years ago

Hi @JinCho23, may I ask which weights file you use to train your gaussian yolo?

JinCho23 commented 4 years ago

I use my own custom model, so there are no pretrained weights. If you are using multiple GPUs, "warm-up training" is useful: following the guideline in this repository, train the first 1000 iterations on a single GPU, then use the result as the initial weights for multi-GPU training. I have done this before, and it definitely helps stabilize training.

I usually pretrain my model on the MS COCO dataset. In my case, this helps reduce false positives.

wenchao1993 commented 4 years ago

Hi hsm4703, could you share your gaussian_yolo cfg file? I am trying to get it working. My email is zhang_wenchao1@163.com. Thank you very much.

lsydd commented 2 years ago

Thanks a lot. Darknet works well after I changed the code in gaussian_yolo_layer.c to match yolo_layer.c.

lsydd commented 2 years ago

I found that the EMA trick can't be used in a network with a gaussian_yolo layer. This seems to be another bug.