AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Worse results after repository update #5823

Open gilmartinspinheiro opened 4 years ago

gilmartinspinheiro commented 4 years ago

Hi @AlexeyAB ,

I have two different repositories, one older than the other (I cannot say exactly how old, but it predates the YOLOv4 update).

I was not getting acceptable results with yolov4, even though I was using the same dataset as a previous, successful yolov3 training run.

Next, I tried to reproduce that successful training run from the old repository on the updated one. I copied the yolov3 config used in the past and trained with it in the new repository, keeping the same dataset, but I could not achieve the results I had obtained before.

Is there anything besides the config file and the dataset that could cause this difference?

Thanks in advance

AlexeyAB commented 4 years ago

What version of Darknet do you use currently, what date? Show chart.png with Loss and mAP for all cases. What mAP did you get previously and now?

gilmartinspinheiro commented 4 years ago

For logistic and timing reasons, the training runs were done without validation. I will repeat both and try to provide full details in a few days, when the new runs are complete. Thank you for your time!

gilmartinspinheiro commented 4 years ago

Hi again @AlexeyAB,

Sorry for the late reply, but I have been running some tests on my custom dataset and they took some time.

So, I have tested 3 different repositories:

  1. The most recent version of darknet (let's call it Alexey_new).
  2. An old version of this repo: https://github.com/AlexeyAB/darknet/tree/a7a2e1bb4b0efa55ac2af91358e8c8d2d20076a7 . This version does not have the mAP chart feature, so I won't be able to provide that (let's call it Alexey_old).
  3. An old pjreddie repo. I cannot pin down its date, but it is surely close to pjreddie's latest commit. This version does not have the mAP chart feature either (let's call it pjreddie_old).

All of the following results were obtained with the same training data and test data (1122 test images). I will present the results of the default yolov3 config file for the 3 repos, as well as results from the default yolov4-custom config. All training runs used an excessively large max_batches value to reproduce the original training conditions, since the first run was accidentally configured that way.
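For comparison with the oversized value used here, the AlexeyAB/darknet README gives a rule of thumb for custom training: `max_batches` = `classes*2000`, but not less than the number of training images and not less than 6000, with `steps` at 80% and 90% of `max_batches`. A minimal Python sketch of that rule; the helper name `recommended_schedule` is hypothetical, and the 12 classes / ~25k images come from later in this thread.

```python
from __future__ import annotations


def recommended_schedule(classes: int, num_train_images: int) -> tuple[int, tuple[int, int]]:
    """max_batches/steps rule of thumb for custom training (per the darknet README)."""
    # max_batches = classes * 2000, but not less than the number of
    # training images and not less than 6000.
    max_batches = max(classes * 2000, num_train_images, 6000)
    # steps at 80% and 90% of max_batches, computed with integer arithmetic.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return max_batches, steps


if __name__ == "__main__":
    # 12 classes and roughly 25k training images, as mentioned in this thread.
    max_batches, (step1, step2) = recommended_schedule(classes=12, num_train_images=25_000)
    print(f"max_batches={max_batches}")
    print(f"steps={step1},{step2}")
```

Keeping the schedule identical across the yolov3 and yolov4 runs matters as much as the exact numbers, since the learning-rate drops at `steps` depend on it.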

For Alexey_new with yolov3:

The charts are the following (training had to be stopped and restarted later, so there are two charts): [chart_1] [chart_2]

For Alexey_old yolov3:

For pjreddie_old yolov3:

For Alexey_new yolov4:

The chart is the following: [chart_yolov4-custom]

gilmartinspinheiro commented 4 years ago

yolov4.txt

gilmartinspinheiro commented 4 years ago

yolov3.txt

AlexeyAB commented 4 years ago

For Alexey_new with yolov3: For Alexey_old yolov3: For pjreddie_old yolov3: For Alexey_new yolov4:

gilmartinspinheiro commented 4 years ago

No. What I did was use the same config across different repos, and I obtained different results in each of them. Obviously, for yolov4 the only repo I could use was the most recent one.

1122 testing images.

Sorry, but I am not allowed to share any images from my dataset. However, I have already run the training with -show_imgs and checked the bounding boxes.

That is not the case. The valid and train .txt files are correct. obj.data:

  classes = 12
  train = weights/train_name/train.txt
  validation = weights/train_name/val.txt
  names = weights/train_name/obj.names
  backup = weights/train_name/backup/

Actually, I noticed that I had an error in the .data file: I wrote "validation=" instead of "valid=", so the validation set was defaulting to the training set. That said, the error existed in both cases, for yolov3 and v4. This can be seen in the last image of this comment.
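A misspelled key like `validation=` is easy to miss because, as described above, darknet silently fell back to the training list. A minimal Python sketch of a pre-flight check for a .data file; the key set is an assumption built from the obj.data shown above (with `valid` spelled correctly), so keys that darknet does accept but that are not in this set would also be reported as unrecognized.

```python
from __future__ import annotations

# Keys this sketch expects: an assumption based on the obj.data above,
# with the correct "valid" key. darknet itself accepts other keys too.
EXPECTED_KEYS = {"classes", "train", "valid", "names", "backup"}


def parse_data_file(text: str) -> dict[str, str]:
    """Parse darknet-style 'key = value' lines, skipping blanks and # comments."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        options[key.strip()] = value.strip()
    return options


def check_data_file(text: str) -> list[str]:
    """Return warnings for missing or unrecognized keys, sorted by key."""
    present = set(parse_data_file(text))
    warnings = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - present)]
    warnings += [f"unrecognized key: {key}" for key in sorted(present - EXPECTED_KEYS)]
    return warnings


if __name__ == "__main__":
    obj_data = """classes = 12
train = weights/train_name/train.txt
validation = weights/train_name/val.txt
names = weights/train_name/obj.names
backup = weights/train_name/backup/
"""
    for warning in check_data_file(obj_data):
        print(warning)
```

Run against the obj.data above, it reports both the missing `valid` key and the unrecognized `validation` key.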

With the most recent repo: [calc_anch]

I also had around 40 invalid labels (negative values; they are discarded, right?) out of 25k training images, which I guess is not particularly problematic. [calc_anch0]
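The ~40 invalid labels can be located before training. Darknet label files hold one object per line, `<object-class> <x_center> <y_center> <width> <height>`, with the four box values relative to the image size. A minimal Python sketch that flags lines breaking that format (coordinates below 0 or above 1, out-of-range class ids, zero-size boxes); it does not answer what darknet itself does with such lines. Point it at a directory containing only label files, since list files such as train.txt also match `*.txt`.

```python
from __future__ import annotations

import sys
from pathlib import Path


def find_bad_labels(label_text: str, num_classes: int) -> list[tuple[int, str]]:
    """Return (line_number, reason) for lines violating the YOLO label format.

    Each line is assumed to be: class_id x_center y_center width height,
    with the four box values normalized to [0, 1].
    """
    problems = []
    for lineno, raw in enumerate(label_text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append((lineno, f"expected 5 fields, got {len(fields)}"))
            continue
        try:
            class_id = int(fields[0])
            x, y, w, h = (float(v) for v in fields[1:])
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((lineno, f"class_id {class_id} out of range"))
        if any(v < 0 or v > 1 for v in (x, y, w, h)):
            problems.append((lineno, "coordinate outside [0, 1]"))
        elif w == 0 or h == 0:
            problems.append((lineno, "zero-size box"))
    return problems


def scan_labels(label_dir: str | Path, num_classes: int) -> None:
    """Print every suspicious line of every .txt file under label_dir."""
    for path in sorted(Path(label_dir).rglob("*.txt")):
        for lineno, reason in find_bad_labels(path.read_text(), num_classes):
            print(f"{path}:{lineno}: {reason}")


if __name__ == "__main__" and len(sys.argv) > 1:
    scan_labels(sys.argv[1], num_classes=12)  # 12 classes, as in obj.data above
```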

With the AlexeyAB_old repo: [clusters_screenshot_08 06 2020]
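The calc_anchors screenshots compare clustered anchor sets and their average IoU. For illustration only, here is a Python sketch of k-means on box sizes with distance 1 - IoU, the approach introduced with YOLO9000; it is not darknet's calc_anchors code, and initialization and details may differ. The input is assumed to be (width, height) pairs in pixels at the network input size (for example, normalized label w/h multiplied by the cfg width/height).

```python
from __future__ import annotations

import random


def iou_wh(a: tuple[float, float], b: tuple[float, float]) -> float:
    """IoU of two boxes given only as (width, height), aligned at the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(
    boxes: list[tuple[float, float]], k: int, iters: int = 100, seed: int = 0
) -> tuple[list[tuple[float, float]], float]:
    """Cluster (w, h) pairs with distance 1 - IoU; return anchors and mean best IoU."""
    if len(boxes) < k:
        raise ValueError("need at least k boxes")
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU (smallest 1 - IoU).
        clusters: list[list[tuple[float, float]]] = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # Move each centroid to the mean (w, h) of its cluster; keep empty ones in place.
        new_centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sort by area, matching the small-to-large order of the default cfg anchors.
    centroids.sort(key=lambda wh: wh[0] * wh[1])
    mean_iou = sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)
    return centroids, mean_iou


if __name__ == "__main__":
    # Hypothetical box sizes in pixels, two obvious size groups.
    sample = [(10, 10), (11, 10), (10, 11), (100, 100), (101, 99), (99, 101)]
    anchors, avg_iou = kmeans_anchors(sample, k=2)
    print("anchors =", ", ".join(f"{round(w)},{round(h)}" for w, h in anchors))
    print(f"avg IoU = {avg_iou:.3f}")
```

A higher average IoU means the anchors fit the dataset's box shapes better, which is the number worth comparing between two anchor sets.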

It is not only the script: the lower performance can also be seen by inspecting the yolov4 and v3 predictions.

So I guess that might actually be the reason for the poor yolov4 results?

I cannot increase the batch size due to memory limitations. Since yolov4 uses more memory, I had to reduce the batch size.
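In darknet's `[net]` section, `batch` is the number of images per training iteration and `subdivisions` splits it into smaller mini-batches that are sent to the GPU one at a time, so GPU memory is driven mainly by `batch/subdivisions`. When yolov4 does not fit, raising `subdivisions` (the AlexeyAB README suggests 16, 32 or 64 on out-of-memory errors) is an alternative to lowering `batch` that keeps the per-iteration batch, and therefore the comparison with yolov3, the same. The arithmetic as a small Python sketch; `mini_batch` is a hypothetical helper, not part of darknet.

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images sent to the GPU per forward/backward pass: batch / subdivisions."""
    if batch <= 0 or subdivisions <= 0:
        raise ValueError("batch and subdivisions must be positive")
    if batch % subdivisions:
        # This sketch insists on an exact split to keep the arithmetic obvious.
        raise ValueError("batch should be a multiple of subdivisions")
    return batch // subdivisions


if __name__ == "__main__":
    # Same 64 images per iteration, decreasing GPU load per pass.
    for subdivisions in (16, 32, 64):
        print(f"batch=64 subdivisions={subdivisions} -> {mini_batch(64, subdivisions)} images per pass")
```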

gilmartinspinheiro commented 4 years ago

I found another error in the train file: half of its lines had been deleted. I will train yolov4 again with the correct train file, following your suggestions, and report the results back to you!
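A truncated train list like this one can be caught with a quick consistency check before training. A minimal Python sketch, assuming the README convention that each image's label is a .txt file with the same name in the same directory, and that you know how many entries the list should contain; `read_list`, `check_entries` and the script name in the comment are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def read_list(list_path: str | Path) -> list[str]:
    """Read a darknet-style image list, one path per line, skipping blank lines."""
    return [ln.strip() for ln in Path(list_path).read_text().splitlines() if ln.strip()]


def check_entries(entries: list[str], expected_count: int | None = None) -> list[str]:
    """Report count mismatches, duplicates, missing images and missing labels."""
    problems = []
    if expected_count is not None and len(entries) != expected_count:
        problems.append(f"{len(entries)} entries, expected {expected_count}")
    duplicates = len(entries) - len(set(entries))
    if duplicates:
        problems.append(f"duplicate entries: {duplicates}")
    for entry in entries:
        image = Path(entry)  # relative paths resolve against the current directory
        label = image.with_suffix(".txt")  # label assumed next to the image
        if not image.is_file():
            problems.append(f"missing image: {entry}")
        elif not label.is_file():
            problems.append(f"missing label: {label}")
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # e.g. python check_list.py weights/train_name/train.txt 25000
        expected = int(sys.argv[2]) if len(sys.argv) > 2 else None
        for problem in check_entries(read_list(sys.argv[1]), expected):
            print(problem)
```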

Thank you for your help!

AlexeyAB commented 4 years ago

Yes, it looks like you are using different datasets for training/validation for v3 and v4.

Train and test both v3 and v4 on the same dataset with the same command.

gilmartinspinheiro commented 4 years ago

Hello again @AlexeyAB ,

I ran new trainings with the most recent repository (as of 7 days ago, i.e. after the mosaic fix, I think).

Results from my validation script:

  FP  | FN
  701 | 1871

yolo-v4.txt

Results from my validation script:

  FP | FN
  19 | 54

yolo-v3.txt
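The validation script behind these FP/FN counts is not shown in the thread. One common way to obtain such counts, sketched here purely as an assumption, is greedy one-to-one matching per image: predictions above a confidence threshold (0.25 below, an assumed value) are taken in descending confidence, and each is matched to the unmatched ground-truth box of the same class with the highest IoU, provided it reaches 0.5. Note that FP and FN alone do not give precision or recall; the TP count is needed as well, which is why the sketch returns all three. The function names are hypothetical.

```python
from __future__ import annotations


def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds, gts, iou_thresh: float = 0.5, conf_thresh: float = 0.25):
    """Greedy one-to-one matching for one image.

    preds: list of (class_id, confidence, box); gts: list of (class_id, box).
    Returns (TP, FP, FN).
    """
    matched: set[int] = set()
    tp = fp = 0
    kept = [p for p in preds if p[1] >= conf_thresh]
    for cls, _conf, box in sorted(kept, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thresh
        for j, (gt_cls, gt_box) in enumerate(gts):
            if j in matched or gt_cls != cls:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(gts) - len(matched)
```

Summing the per-image triples over the test set gives totals comparable to the table above, as long as both models are scored with the same thresholds.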

AlexeyAB commented 4 years ago

The problem is that you are training models with different parameters.

gilmartinspinheiro commented 4 years ago

How can I use the same anchors? Aren't they set up differently in v4 and v3? At least, the mask order is different. Can I copy the masks and anchors directly from v3 to v4 without problems?

AlexeyAB commented 4 years ago

Don't change the masks. Use these anchors:

  anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
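Why the masks can stay as they are: every [yolo] layer carries the same full `anchors` list, and `mask` only selects which (width, height) pairs that layer uses. The default yolov3.cfg lists its [yolo] layers coarse to fine (stride 32 first) while yolov4.cfg lists them fine to coarse (stride 8 first), so the masks look reversed, but in both the largest anchors end up on the stride-32 layer. The Python sketch below groups the suggested anchors by each cfg's masks; the helper names are hypothetical, and the mask and stride values are assumed from the default cfgs, so check them against your own files.

```python
from __future__ import annotations


def parse_anchors(text: str) -> list[tuple[int, int]]:
    """Turn 'anchors = 12, 16, 19, 36, ...' into [(12, 16), (19, 36), ...]."""
    values = [int(v) for v in text.split("=", 1)[-1].split(",") if v.strip()]
    if len(values) % 2:
        raise ValueError("anchors must come in (width, height) pairs")
    return list(zip(values[0::2], values[1::2]))


def anchors_per_layer(anchors: list[tuple[int, int]], masks: list[tuple[int, ...]]) -> list[list[tuple[int, int]]]:
    """For each [yolo] layer (in cfg order), list the anchors its mask selects."""
    return [[anchors[i] for i in mask] for mask in masks]


YOLOV4_ANCHORS = "anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401"
# mask values of the three [yolo] layers, in the order they appear in each default cfg
YOLOV4_MASKS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]  # strides 8, 16, 32
YOLOV3_MASKS = [(6, 7, 8), (3, 4, 5), (0, 1, 2)]  # strides 32, 16, 8

if __name__ == "__main__":
    anchors = parse_anchors(YOLOV4_ANCHORS)
    for name, masks in (("yolov4", YOLOV4_MASKS), ("yolov3", YOLOV3_MASKS)):
        for layer, used in enumerate(anchors_per_layer(anchors, masks)):
            print(f"{name} [yolo] #{layer}: {used}")
```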

gilmartinspinheiro commented 4 years ago

Should I use those anchors for v3 too? They are the v4 defaults, but they differ from the v3 defaults. Sorry, but I did not understand what you want me to do regarding the config files.

AlexeyAB commented 4 years ago

The anchors in your yolov3.txt file are different from the default anchors from yolov3.cfg
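A quick way to spot this kind of drift is to compare every `anchors=` line in a cfg against the yolov3.cfg defaults (10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326). A minimal Python sketch; the function names and the script name in the comment are hypothetical, and only whole-line `#` or `;` comments are skipped.

```python
from __future__ import annotations

# Default anchors of yolov3.cfg, as flattened (width, height) pairs.
YOLOV3_DEFAULT = (10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326)


def cfg_anchor_lines(cfg_text: str) -> list[tuple[int, ...]]:
    """Collect the values of every anchors= line, in file order."""
    found = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and whole-line comments
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "anchors":
            found.append(tuple(int(v) for v in value.split(",") if v.strip()))
    return found


def anchor_mismatches(cfg_text: str, expected: tuple[int, ...] = YOLOV3_DEFAULT) -> list[int]:
    """Indices of anchors= lines whose values differ from `expected`."""
    return [i for i, values in enumerate(cfg_anchor_lines(cfg_text)) if values != expected]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # e.g. python check_anchors.py yolov3.txt
        with open(sys.argv[1]) as f:
            text = f.read()
        for index in anchor_mismatches(text):
            print(f"anchors line #{index} differs from the yolov3.cfg defaults")
```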

gilmartinspinheiro commented 4 years ago

@AlexeyAB I am still confused. So, to sum up: you are suggesting I should train yolov3 and v4 with all default parameters for each one, including the default anchors for each one?

Or are you suggesting that I use yolov4 anchors (not changing the masks) on yolov3?