WongKinYiu / PyTorch_YOLOv4

PyTorch implementation of YOLOv4
1.86k stars, 585 forks

different result for U3 (master) and U5 #271

Open jaqub-manuel opened 3 years ago

jaqub-manuel commented 3 years ago

Hey dear @WongKinYiu

I get very different results on the u3 (master) branch and the u5 branch with exactly the same model (for example, yolov4-pacsp-s-mish.cfg with its weights on u3, and yolov4s-mish.yaml with its weights on u5). F1 is 65 for YOLOv4 u3 and 85 for u5. I also edited the required places in the cfg file: classes = 80 (changed to 1) and filters = 255 (changed to 18). What do you think the problem is? Maybe the u3 master branch does not do transfer learning (I do not have a .pt file, but I load .weights). By the way, I have 1000 images for training and 300 for testing, with a single class. Many thanks in advance.
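
For reference, the 255 and 18 values follow from the usual YOLO head arithmetic: each anchor predicts 4 box coordinates, 1 objectness score and one score per class. The sketch below assumes 3 anchors per detection scale, which is the common setting in the YOLOv4 cfg files; the function name is only illustrative.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters of the conv layer directly before a [yolo] layer.

    Assumes each anchor predicts 4 box values + 1 objectness + num_classes scores.
    """
    return (num_classes + 5) * anchors_per_scale


print(yolo_head_filters(80))  # COCO, 80 classes
print(yolo_head_filters(1))   # single-class custom dataset
```

With 80 classes this gives 255, and with 1 class it gives 18, which matches the edits described above.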

WongKinYiu commented 3 years ago

For the master branch, if you would like to use a .weights file as pretrained weights, the code should be modified.

Change this line so that it handles both .pt and .weights: https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/train.py#L60

For .weights, you should use the following code to load the weights:

    model = Darknet(opt.cfg).to(device)
    load_darknet_weights(model, weights[0])

You may also need to add cutoff here: https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/models/models.py#L400-L403
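
The distinction between the two weight formats can be sketched as a small dispatch on the file name. This is only an illustration, not the code in train.py: `weights_kind` is a hypothetical helper, and treating every non-.pt file as a Darknet file is an assumption (it covers names such as yolov4.weights and yolov4.conv.137).

```python
def weights_kind(path):
    """Classify a --weights argument by file name only (hypothetical helper)."""
    if not path:
        return "none"     # no pretrained weights: train from scratch
    if path.endswith(".pt"):
        return "pytorch"  # PyTorch checkpoint, loaded via the existing .pt path
    # Assumption: anything else is a Darknet binary file, which would be
    # passed to load_darknet_weights(model, path) as in the snippet above.
    return "darknet"


for name in ["yolov4.weights", "yolov4.conv.137", "best.pt", ""]:
    print(repr(name), "->", weights_kind(name))
```

In a training script, the "darknet" branch is where the `Darknet(opt.cfg)` and `load_darknet_weights` calls shown above would go.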

jaqub-manuel commented 3 years ago

@WongKinYiu, thanks for the reply. I will change it, but I am confused about cutoff. How should I set it (for models s, m, l and x)?

WongKinYiu commented 3 years ago

You could set cutoff = index of the last conv layer before the first yolo layer.
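
One way to find that index without counting by hand is to scan the section headers of the .cfg file. The sketch below is a standalone helper, not part of the repository; it assumes layers are numbered from 0 with the [net] section excluded, so please check the result against the layer indices printed by the model summary before relying on it.

```python
def section_types(cfg_text):
    """Return the section names of a Darknet .cfg in order, skipping [net]."""
    types = []
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip()
            if name != "net":  # assumption: [net] is not counted as a layer
                types.append(name)
    return types


def cutoff_before_first_yolo(cfg_text):
    """Index of the last [convolutional] section before the first [yolo]."""
    types = section_types(cfg_text)
    first_yolo = types.index("yolo")  # raises ValueError if there is no [yolo]
    for i in range(first_yolo - 1, -1, -1):
        if types[i] == "convolutional":
            return i
    raise ValueError("no convolutional layer before the first yolo layer")


# Tiny made-up cfg, only to show the indexing.
SAMPLE_CFG = """
[net]
width=416

[convolutional]
filters=32

[convolutional]
filters=64

[convolutional]
filters=18

[yolo]
classes=1
"""

print(cutoff_before_first_yolo(SAMPLE_CFG))
```

For a real file, read it with `open("cfg/yolov4.cfg").read()` and pass the text in. The same function applies to the s, m, l and x variants, since only the number of sections before the first [yolo] changes.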

jaqub-manuel commented 3 years ago

Dear @WongKinYiu, I did what you said, but it didn't work. First I changed line 60 in train.py and got an error; then I changed it as you said, and the models still give errors. Could you please help me? Without transfer learning the performance of the model is very low. Thanks in advance.

WongKinYiu commented 3 years ago

    model = Darknet(opt.cfg).to(device)
    load_darknet_weights(model, weights[0])

jaqub-manuel commented 3 years ago

Dear @WongKinYiu, sorry for the inconvenience. It is still not working. I have tried both:

    python train.py --weights yolov4.weights --cfg cfg/yolov4.cfg
    python train.py --weights yolov4.conv.137 --cfg cfg/yolov4.cfg

Thanks for answers.

WongKinYiu commented 3 years ago

    model = Darknet(opt.cfg).to(device)
    load_darknet_weights(model, weights)

jaqub-manuel commented 3 years ago

Now it looks like it works. Let me train on the whole dataset and report results for both u5 and the master branch. By the way, at first it gave an error like this:

    Model Summary: 327 layers, 6.39377e+07 parameters, 6.39377e+07 gradients, 141.4 GFLOPS
    Traceback (most recent call last):
      File "train.py", line 442, in <module>
        train(hyp, opt, device, tb_writer)
      File "train.py", line 65, in train
        if pretrained:
    NameError: name 'pretrained' is not defined

Then I restored the pretrained line and it works; I hope that is right. So the final modified code looks like this:

    pretrained = weights.endswith('.pt')
    model = Darknet(opt.cfg).to(device)
    load_darknet_weights(model, weights)

Thanks dear @WongKinYiu

WongKinYiu commented 3 years ago

Temporarily using pretrained = True is better.

jaqub-manuel commented 3 years ago

Dear, @WongKinYiu

I did it the way you said, and I am now sharing my results. On my custom dataset, mAP for U5 is between 80 and 82. For U3, mAP was between 67 and 70 before your changes and between 72 and 74 after them. I still do not understand why the results of U3 and U5 are so different. (I trained the same model on the same dataset, with the same batch size and all other parameters identical for both branches.)

To investigate this, I trained from scratch on MS COCO for both U5 and U3 (master), then used these weights for the custom dataset, and I am sharing the results. The figures below show the MS COCO results (first is U5, second is U3).

[Figure: results_U5]

[Figure: results]

I used the same model (yolov4-s-mish.yaml and yolov4-pacsp-s-mish.cfg), 150 epochs, and training from scratch for both. For U5 I got the same results as you, but for U3 I got 2 percent lower (maybe I should have trained for more than 150 epochs). Later, I tried these two new MS COCO weights on the custom dataset (.pt weights in U3 and .pt weights in U5), on the same PC and environment, for 200 epochs, training at least 3 times. Again mAP was 80-82 in U5, while in U3 (master) it was 72-74. In other words, there is a difference of roughly 10 points. What do you think might cause this? In the .cfg file I only reduced the class count from 80 to 1 and the filter count from 255 to 18; I have not made any other changes. The first figure is U5 on my custom dataset, and the second is U3 (master) on my custom dataset.

[Figure: u5 results]

Thanks for your help and patience in advance.