ancasag / ensembleObjectDetection


Should we expect to see an increase in performance when ensembling models of the same framework? #24

Open matt-sharp opened 3 years ago

matt-sharp commented 3 years ago

I have trained multiple models using exactly the same YOLO framework and parameters. I have then created an ensemble using each of the methods: affirmative, consensus, and unanimous. I've done this for both 3 and 5 models. I find that the best ensemble F1-score is no better than the best score from the individual models.

Here are my results:

![results](https://user-images.githubusercontent.com/13571774/111391531-b428c600-86ac-11eb-9d3b-e4c74115935b.png)

The YOLO framework applies random data augmentation (saturation, exposure, hue, and mosaic) during training, which contributes to the variation in performance across the individual models. Here is my config file:

[exp9_yolov4.txt](https://github.com/ancasag/ensembleObjectDetection/files/6152655/exp9_yolov4.txt)

I'm using the HRSC2016 dataset which contains the following:

train: 436 images including 1207 samples
valid: 181 images including 541 samples
test: 444 images including 1228 samples

Please can you help me to understand why I don't see an improvement in performance?
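For readers unfamiliar with the three voting strategies mentioned above, the sketch below shows roughly how they decide which boxes to keep. It is a conceptual illustration, not the library's actual implementation; the box format, IoU threshold, and function names are assumptions.

```python
# Conceptual sketch of affirmative / consensus / unanimous voting over the
# detections of several models (not the library's exact code).

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def ensemble(detections_per_model, strategy="consensus", iou_thr=0.5):
    """detections_per_model: one list per model of (box, score, label) tuples."""
    n_models = len(detections_per_model)
    required = {"affirmative": 1,                  # at least one model must agree
                "consensus": n_models // 2 + 1,    # a majority of models
                "unanimous": n_models}[strategy]   # every model
    flat = [(m, box, score, label)
            for m, dets in enumerate(detections_per_model)
            for box, score, label in dets]
    kept, used = [], set()
    for i, (mi, box_i, score_i, label_i) in enumerate(flat):
        if i in used:
            continue
        # Group overlapping detections of the same class across models.
        group = [(mi, box_i, score_i)]
        for j in range(i + 1, len(flat)):
            mj, box_j, score_j, label_j = flat[j]
            if j not in used and label_j == label_i and iou(box_i, box_j) >= iou_thr:
                group.append((mj, box_j, score_j))
                used.add(j)
        if len({m for m, _, _ in group}) >= required:
            best = max(group, key=lambda g: g[2])  # keep the highest-scoring box
            kept.append((best[1], best[2], label_i))
    return kept
```

When the models are highly correlated (e.g. trained from the same configuration), the grouped boxes largely coincide, so the three strategies can behave very similarly to a single model.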

ancasag commented 3 years ago

Hello Matthew,

How are you obtaining the metrics? I guess that for the YOLO models you are using the darknet framework directly. We have noticed that in some cases the predictions produced by YOLO models loaded in OpenCV (as we do in our library) may differ from those obtained in darknet, so you should check that the behaviour is the same; maybe that is what is happening. Try obtaining the F1-score of the models directly from the OpenCV-based predictions the library uses, instead of through the darknet framework, and let us know if that helps.

Best regards, Ángela
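One way to check this suggestion is to run the same weights through OpenCV's DNN module (the backend the library uses) and compare the raw detections with darknet's output on the same image. A minimal sketch, assuming a 416x416 YOLOv4 network; the file paths are placeholders:

```python
import cv2
import numpy as np

CFG, WEIGHTS, IMAGE = "exp9_yolov4.cfg", "exp9_yolov4.weights", "test.jpg"  # placeholders

# Load the darknet config/weights with OpenCV's DNN module.
net = cv2.dnn.readNetFromDarknet(CFG, WEIGHTS)
out_names = net.getUnconnectedOutLayersNames()

img = cv2.imread(IMAGE)
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_names)

# Each detection row is [cx, cy, w, h, objectness, class scores...], normalised.
for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.25:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            print(class_id, round(conf, 3),
                  int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
```

Comparing this output with darknet's detections on the same image should show whether the two backends disagree noticeably in boxes or confidences.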



matt-sharp commented 3 years ago

@ancasag thanks for your reply. Yes, I'm calculating the metrics using the darknet map command, which uses confidence = 0.25 and IoU = 0.5. I see that you use an NMS threshold of 0.3 whereas darknet uses 0.45. Could this be the issue? Sorry, what exactly do you mean by obtaining the F1-score directly instead of using darknet?
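For what it's worth, the NMS threshold alone can change which boxes survive. A tiny illustration with made-up boxes and scores (not real model output); whether this explains the F1 gap would need checking on the actual detections:

```python
import cv2
import numpy as np

# Two overlapping boxes (IoU ~ 0.33) plus one separate box, format [x, y, w, h].
boxes = [[100, 100, 80, 120], [140, 100, 80, 120], [300, 200, 60, 90]]
scores = [0.90, 0.75, 0.60]

for nms_thr in (0.3, 0.45):  # library default vs darknet default
    keep = cv2.dnn.NMSBoxes(boxes, scores, 0.25, nms_thr)
    print(f"nms_threshold={nms_thr}: kept indices {np.array(keep).flatten().tolist()}")
# At 0.3 the second box is suppressed (IoU 0.33 > 0.3); at 0.45 it is kept.
```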

AmokraneMancer commented 3 years ago

@matt-sharp, darknet uses confidence = 0.005 for calculating mAP. See this reply from AlexeyAB.
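A small illustration of why these thresholds matter when comparing F1 scores: the reported value depends directly on the confidence cut-off used to count detections. The function and numbers below are made up purely to show the mechanics; they are not part of darknet or the library.

```python
def f1_at_threshold(detections, n_gt, conf_thr):
    """detections: (confidence, is_true_positive) pairs; n_gt: number of ground-truth boxes."""
    kept = [(c, ok) for c, ok in detections if c >= conf_thr]
    tp = sum(1 for _, ok in kept if ok)
    fp = len(kept) - tp
    fn = n_gt - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if n_gt else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

dets = [(0.90, True), (0.80, True), (0.40, False), (0.30, True), (0.10, False)]
for thr in (0.005, 0.25):
    print(thr, round(f1_at_threshold(dets, n_gt=4, conf_thr=thr), 3))
# -> 0.005: F1 = 0.667, 0.25: F1 = 0.75; the same detections, two different scores.
```

So individual-model and ensemble F1 scores are only comparable if they are computed with the same confidence and NMS settings.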

KAKAROT12419 commented 6 months ago

I am unable to create an ensemble model. Can you help with how to create an ensemble from this GitHub repository? I have pretrained YOLO and RetinaNet models and I want to build an ensemble from them. How do I do that?