Open gilmartinspinheiro opened 4 years ago
What version of Darknet do you use currently, what date? Show chart.png with Loss and mAP for all cases. What mAP did you get previously and now?
Due to logistical and timing reasons, the training runs were done without validation. I will repeat both and try to provide you with full details in a few days, when the new training runs are complete. Thank you for your time!
Hi again @AlexeyAB,
Sorry for the late reply, but I have been conducting some tests on my custom dataset and they took some time.
So, basically I have tested with 3 different repositories:
All of the following results were obtained with the same train data and test data (1122 imgs). I will present the results for a default yolov3 config file for the 3 repos. I will also show results from a default yolov4-custom config. All training runs were done with an excessively large max batch size to reproduce the original training conditions, since the first run was done that way by accident.
For Alexey_new with yolov3:
The charts are the following (it was necessary to stop the training and restart later, so there are 2 charts):
For Alexey_old yolov3:
For pjreddie_old yolov3:
For Alexey_new yolov4:
The charts are the following:
For Alexey_new with yolov3:
For Alexey_old yolov3:
For pjreddie_old yolov3:
For Alexey_new yolov4:
Did you train on New repo for all 4 cases, and only tested on 3 different repos?
How many test images do you have?
Run training with the -show_imgs flag. Do you see correct bboxes? Can you show 1-2 examples?
Show the content of your obj.data file.
It looks like you trained Yolov3 with valid=train.txt, but trained Yolov4 with valid=test.txt in the obj.data file.
Show anchors and cloud of points by using command:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 576 -height 576 -show
I don't know anything about your testing script, but it is strange that AP50 is lower for v4 than for v3.
There was some issue with mosaic=1 from 1 Jun 2020 to 7 Jun 2020, so if you used this version, try to download the latest Darknet version and train yolov4 again
Also try to set subdivisions=32 or better 16 in cfg-file, and show chart.png
Show screenshots with information like the following:
./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg
CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.2.0
0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF
No. What I did was use the same config across different repos, and I obtained different results in each one of them. Obviously, when using yolov4, the only repo I could use was the more recent one.
1122 testing images.
Sorry, but unfortunately I am not allowed to share any images from my dataset. However, I have already run the training code with -show_imgs and checked the bboxes.
That is not the case. The valid and train txt files are correct. obj.data:
classes = 12
train = weights/train_name/train.txt
validation = weights/train_name/val.txt
names = weights/train_name/obj.names
backup = weights/train_name/backup/
I actually noticed that I had an error in the .data file (I wrote "validation=" instead of "valid=", so the validation dataset was defaulting to the training dataset). With that said, that error existed in both cases, for yolov3 and v4. This can be seen in the last image of this comment.
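A mistake like this is easy to miss because a misspelled key is silently ignored. Below is a minimal sketch of a pre-training check for a .data file. The `KNOWN_KEYS` set and the function names are assumptions made for illustration, not part of Darknet; adjust the key list to whatever your Darknet version actually reads.

```python
# Keys assumed to be meaningful in a Darknet .data file. This list is an
# assumption for illustration; adjust it to match your Darknet version.
KNOWN_KEYS = {"classes", "train", "valid", "names", "backup", "eval"}


def parse_data_file(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def check_data_file(text):
    """Return a list of human-readable warnings for a .data file."""
    options = parse_data_file(text)
    warnings = [f"unknown key '{k}'" for k in sorted(options) if k not in KNOWN_KEYS]
    if "valid" not in options:
        warnings.append("no 'valid' entry: validation may fall back to another list")
    return warnings


# The obj.data content quoted in this thread, with the misspelled key.
example = """classes = 12
train = weights/train_name/train.txt
validation = weights/train_name/val.txt
names = weights/train_name/obj.names
backup = weights/train_name/backup/
"""

for warning in check_data_file(example):
    print("WARNING:", warning)
```

Run on the file above, this flags both the unrecognized `validation` key and the missing `valid` entry; after renaming the key to `valid` it reports nothing.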
With the most recent repo:
I also had around 40 wrong labels (negative values, they are discarded right?) in 25k training images, which I guess is not particularly problematic (?).
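Whether or not Darknet discards such labels, they can be located before training. Here is a rough sketch that scans the text of one label file, assuming the usual YOLO format of `<class_id> <x_center> <y_center> <width> <height>` with the four coordinates normalized to [0, 1]. The function name and the sample data are hypothetical.

```python
def find_bad_label_lines(label_text, num_classes):
    """Return (line_number, reason) pairs for lines that break the YOLO label format.

    Assumes each line is: <class_id> <x_center> <y_center> <width> <height>,
    with the four coordinates normalized to the range [0, 1].
    """
    problems = []
    for lineno, line in enumerate(label_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 5:
            problems.append((lineno, "expected 5 fields"))
            continue
        try:
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= class_id < num_classes:
            problems.append((lineno, "class id out of range"))
        elif any(c < 0.0 or c > 1.0 for c in coords):
            problems.append((lineno, "coordinate outside [0, 1]"))
        elif coords[2] == 0.0 or coords[3] == 0.0:
            problems.append((lineno, "zero width or height"))
    return problems


# Hypothetical label file: line 2 has a negative x, line 3 an invalid class id.
sample = "0 0.5 0.5 0.2 0.3\n3 -0.01 0.4 0.1 0.1\n12 0.5 0.5 0.1 0.1\n"
print(find_bad_label_lines(sample, num_classes=12))
# prints [(2, 'coordinate outside [0, 1]'), (3, 'class id out of range')]
```

Looping this over every label file listed in train.txt gives an exact count of bad lines instead of an estimate.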
With the alexeyAB_old repo:
It is not only the script; the lower performance can also be seen by inspecting the yolov4 and v3 predictions.
So I guess that might actually be the reason for the poor yolov4 results (?)
I cannot increase the batch size due to memory limitations. Since yolov4 occupies more memory, I had to reduce the batch size.
I found another error in the train file: half of its lines had been deleted. I will train Yolov4 again with the correct train file and your suggestions, and report the result back to you!
Thank you for your help!
Yes, it looks like you are using different datasets for training/validation for v3 and v4.
Train and test both v3 and v4 on the same dataset with the same command.
Hello again @AlexeyAB ,
So, I ran new trainings with the most recent repository (as of 7 days ago, which I guess is after the mosaic fix).
The results are still far better for yolo V3 than yolo v4.
Also, the yolo V3 results are now closer to the results of the old repo (which I called pjreddie_old), although they are still worse, and the config was now changed for optimal performance. That is, I used more data augmentation, trained anchors, and a smaller max iteration number.
The cfg's will be attached for you to check them, as well as the charts.
How can I train anchors for yolov4? Are the process and the precautions the same as before?
yolo V4 Chart:
For my validation script results:
FP | FN
701 | 1871
For my validation script results:
FP | FN
19 | 54
The problem is that you are training models with different parameters.
Is this accuracy on training or validation dataset?
Train both models with the same anchors, and the same random=0 param, and subdivisions=32 for yolov4.cfg
Show tp, fp, fn for both models using the ./darknet detector map ... command, for both the training and validation datasets.
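To see which of these parameters actually differ between two cfg files, something like the following sketch can help. It is only an illustration: the function names are made up, the parsing is deliberately simple (it skips section headers and comment lines), and the two cfg snippets are toy examples rather than real configs.

```python
def collect_params(cfg_text, keys):
    """Collect every value of the given keys from a Darknet-style cfg.

    Sections such as [yolo] repeat, so each key maps to a list of values
    in file order rather than to a single value.
    """
    found = {key: [] for key in keys}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;[" or "=" not in line:
            continue  # skip blanks, comments, and section headers
        key, value = (part.strip() for part in line.split("=", 1))
        if key in found:
            # Normalize spacing so "12, 16" and "12,16" compare equal.
            found[key].append(",".join(v.strip() for v in value.split(",")))
    return found


def diff_params(cfg_a, cfg_b, keys=("subdivisions", "random", "anchors")):
    """Return {key: (values_in_a, values_in_b)} for keys whose values differ."""
    a = collect_params(cfg_a, keys)
    b = collect_params(cfg_b, keys)
    return {k: (a[k], b[k]) for k in keys if a[k] != b[k]}


# Toy cfg fragments, not real configs.
cfg_a = "[net]\nsubdivisions=16\n[yolo]\nanchors = 10,13, 16,30\nrandom=1\n"
cfg_b = "[net]\nsubdivisions=32\n[yolo]\nanchors = 10,13,16,30\nrandom=0\n"

print(diff_params(cfg_a, cfg_b))
# prints {'subdivisions': (['16'], ['32']), 'random': (['1'], ['0'])}
```

In practice you would read the two files with `open(path).read()` and pass the contents in; an empty result means the listed parameters match in every section.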
Is this accuracy on training or validation dataset? Validation dataset.
Train both models with the same anchors, and the same random=0 param, and subdivisions=32 for yolov4.cfg
How can I use the same anchors? Are they not set up differently in v4 and v3? At least, mask order is different. Can I copy the masks and anchors directly from v3 to v4 without a problem?
Don't change masks.
Use anchors
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
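For context on why the masks can stay untouched: each [yolo] layer's mask is a list of indices into the shared anchors line, so the same anchors line can be used in both cfg files while each file keeps its own mask order. A small sketch of that lookup follows. The mask orders below are what I recall from the stock yolov4.cfg and yolov3.cfg, so treat them as assumptions and check your own files; the function names are hypothetical.

```python
def parse_anchors(anchors_line):
    """Turn 'w0, h0, w1, h1, ...' into a list of (width, height) pairs."""
    values = [int(v) for v in anchors_line.split(",")]
    if len(values) % 2:
        raise ValueError("anchors must contain an even number of values")
    return list(zip(values[0::2], values[1::2]))


def anchors_per_layer(anchors_line, masks):
    """For each [yolo] layer's mask, list the anchor pairs that layer uses."""
    pairs = parse_anchors(anchors_line)
    return [[pairs[i] for i in mask] for mask in masks]


# The anchors line suggested in this thread.
ANCHORS = "12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401"

# Mask order per [yolo] layer, in file order (assumed from the stock cfg files).
V4_STYLE_MASKS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
V3_STYLE_MASKS = [(6, 7, 8), (3, 4, 5), (0, 1, 2)]

for name, masks in (("v4-style", V4_STYLE_MASKS), ("v3-style", V3_STYLE_MASKS)):
    for layer, used in enumerate(anchors_per_layer(ANCHORS, masks)):
        print(name, "yolo layer", layer, "->", used)
```

With these assumed masks, the first [yolo] layer in the v4-style file picks the three smallest anchors, while the first [yolo] layer in the v3-style file picks the three largest; the anchors line itself is identical in both.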
Should I use those anchors on v3 also? Those are the default on V4, but they are different from V3. Sorry, but I did not understand what you want me to do regarding the config files.
The anchors in your yolov3.txt file are different from the default anchors from yolov3.cfg
@AlexeyAB I am still confused. So, to sum up: You are suggesting I should train yolo v3 and v4 with all default parameters for each one? Including default anchors for each one?
Or are you suggesting that I use yolov4 anchors (not changing the masks) on yolov3?
Hi @AlexeyAB ,
I have two different repositories, one is older than the other (cannot specify how old, but prior to the yoloV4 update).
I was not getting acceptable results with yolov4, even though I was using the same dataset that was used for a past successful yolov3 training run.
The next step I tried was to reproduce the successful training done with the old repository, now on the updated one. So I copied the yolov3 config used in the past and trained with it in the new repository, keeping the same dataset. I could not achieve the same results as the ones obtained before.
Is there any difference beyond the config file and the dataset that could potentially cause this difference?
Thanks in advance