explainingai-code / FasterRCNN-PyTorch

This repo implements a simple Faster R-CNN model in PyTorch with all the essential components.

Low mAP when training on aerial images #2

Open DINHQuangDung1999 opened 2 months ago

DINHQuangDung1999 commented 2 months ago

Hi,

First of all, thank you for providing this code library, which has been very helpful for me in learning Faster R-CNN. I tried your code on a custom split of Pascal VOC, and the results looked okay to me. I ran two experiments: a VGG16 backbone trained for 40 epochs and a ResNet101 backbone trained for 10 epochs.

[image: mAP results for the two experiments]

However, when I train on aerial images like DOTA, the mAP is quite low, around 35 mAP@0.5 after 10 epochs of training with a ResNet101 backbone. Moreover, I found that the model usually outputs a lot of meaningless boxes besides the correct ones (a high false positive rate), as in the following images.

[images: sample detections with many spurious boxes]

I wonder if you have ever experienced this situation. Thank you in advance.
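For context, mAP@0.5 counts a detection as a true positive only when its IoU with a ground-truth box exceeds 0.5. A minimal sketch of the IoU computation (the `iou` helper is illustrative, not code from the repo):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 0.333...
```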

explainingai-code commented 2 months ago

Hello @DINHQuangDung1999, thank you for your appreciation. I have never worked with the DOTA dataset, so I am not sure what the exact issue might be. From the paper (https://arxiv.org/pdf/1711.10398) I see that with HBB ground truths, Faster R-CNN gets around 60.46 mAP, so the 35% mAP we are getting seems quite low. I assume you are using the same training/evaluation mechanism as the paper (Section 5.2). I do see you have a dataloader which seems to be doing all of that, but I just wanted to confirm.

Ideally, if we are doing all that and also using the same configuration parameters (like aspect ratios of 0.3/0.5/1/2/4, number of proposals, etc.) from https://github.com/jessemelpolio/Faster_RCNN_for_DOTA/blob/499b32c3893ccd8850e0aca07e5afb952d08943e/experiments/faster_rcnn/cfgs/DOTA.yaml then we should get the same result, or at least close to it.
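To illustrate what those aspect ratios mean for the RPN: each (scale, ratio) pair produces one anchor shape per feature-map location, with the area fixed by the scale and the height/width skewed by the ratio. `make_anchors` is a hypothetical helper, not code from either repo, and the base size and scale values are assumptions:

```python
import math

def make_anchors(base_size=16, scales=(8, 16, 32),
                 aspect_ratios=(0.3, 0.5, 1.0, 2.0, 4.0)):
    """Generate (w, h) anchor shapes: area is preserved per scale,
    and ratio = h / w, following the usual RPN convention."""
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in aspect_ratios:
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((round(w, 1), round(h, 1)))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 3 scales x 5 ratios = 15 anchor shapes per location
```

The extra 0.3 and 4 ratios (compared with VOC's usual 0.5/1/2) exist to cover the very elongated objects common in aerial imagery, so dropping them could plausibly hurt recall on DOTA.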

DINHQuangDung1999 commented 2 months ago

Hi @explainingai-code, thank you for replying!

It seems weird that when I use a VGG16 backbone I get better mAP: 40 mAP after the first epoch and 48 mAP after 3 epochs. I think I need to run more experiments.

DINHQuangDung1999 commented 2 months ago

Hi @explainingai-code,

Can you explain why you set low_score_threshold to 0.7 in the inference code at this line? Is it normal to expect correctly labeled objects to have a score > 0.7, ideally close to 1?

explainingai-code commented 2 months ago

@DINHQuangDung1999, there are two different values for the low score threshold: one for visualizing detections and the other for mAP evaluation. When you run inference on the VOC dataset, a well-trained model should give a high score (> 0.5) to the prediction box that has high IoU with the ground-truth box. But this threshold will differ across datasets, because it depends on the capability of your model and its ability to distinguish between the dataset's classes. So if your dataset has classes that are not easily distinguishable from each other (like small vehicle vs. large vehicle), then after softmax you would not get such high-scoring boxes.
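The two-threshold idea can be sketched as follows; `filter_detections` and the class names are illustrative, not the repo's actual API:

```python
def filter_detections(dets, score_thresh):
    """dets: list of (box, score, label) tuples.
    Keep only detections whose classification score exceeds score_thresh."""
    return [d for d in dets if d[1] > score_thresh]

dets = [((0, 0, 10, 10), 0.95, "plane"),
        ((5, 5, 20, 20), 0.40, "ship"),
        ((1, 1, 4, 4), 0.08, "plane")]

vis = filter_detections(dets, 0.7)    # visualization: only the 0.95 box
evl = filter_detections(dets, 0.05)   # mAP evaluation: all three boxes kept
print(len(vis), len(evl))  # 1 3
```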

The infer method is just there to visualize a few sample detections, similar to this method of the py-faster-rcnn repo, which uses a higher confidence threshold. The paper also shows visualizations with a 0.6 score threshold in Figure 5. Based on a few sample images I found 0.7 worked fine for VOC, so I used that value only for the infer part.

But the score threshold for mAP computation is still a very low one (0.05). The same value is used in the PyTorch Faster R-CNN codebase, as well as for mAP evaluation in the py-faster-rcnn test method, which is what I use in the evaluate_map method in this repo; it is also the default value provided in the config.
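A low evaluation threshold works because AP ranks detections by score, so low-confidence false positives only hurt precision at the high-recall end of the curve rather than the whole metric. A rough VOC-style AP sketch (a rectangular approximation of the precision-recall area; `average_precision` is illustrative, not the repo's evaluate_map):

```python
def average_precision(scored_preds, num_gt):
    """scored_preds: list of (score, is_true_positive) for one class.
    Sorts by score and accumulates precision * recall-step."""
    preds = sorted(scored_preds, key=lambda p: p[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for score, is_tp in preds:
        tp += is_tp
        fp += not is_tp
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

# Two GT boxes; a low-confidence FP between two TPs only dents the tail.
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2))
```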

explainingai-code commented 2 months ago

Just looked at the DOTA repo and found that they use a much lower threshold, even for visualizing: https://github.com/jessemelpolio/Faster_RCNN_for_DOTA/blob/499b32c3893ccd8850e0aca07e5afb952d08943e/faster_rcnn/core/tester.py#L777. You might have to look into their code a bit more to see the exact threshold they use for Faster R-CNN trained and evaluated on horizontal bounding boxes.