KaihuaTang / Scene-Graph-Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”
MIT License
1.06k stars 229 forks

Fine-tuned SGGen model mAP result #204

Open narchitect opened 8 months ago

narchitect commented 8 months ago

Hello everyone,

I hope you can provide some insights on a matter we've been grappling with. We've been working with the pretrained Faster R-CNN model provided in this repository, attempting to fine-tune it for our specific dataset. However, due to the necessity of removing bbox layers when training SGGen, our bbox detection layers end up being trained solely on our dataset, without the benefit of pretrained values. Consequently, our mAP (mean Average Precision) struggles to exceed 10%.

Just to provide some context, our dataset comprises 377 similar images and includes 23 different classes, which, admittedly, doesn't make for an ideal scenario.

As a result, we've observed that the best mAP we could achieve using the SGGen model from this repository is approximately 25%. Given the challenges posed by our less-than-optimal data quality, we believe that achieving an mAP of 12% in fine-tuned models that require bbox detection, like SGGen, is the best we can realistically hope for.

Now, I'd like to reach out to the community to ask if anyone has experience fine-tuning SGGen models and whether they've achieved mAP values higher than 25%. We're particularly interested in understanding if a 10% mAP should be considered acceptable in this context.

Thank you in advance for sharing your insights and experiences. We look forward to your valuable input!

Maelic commented 7 months ago

Hi @narchitect,

The mAP you are referring to here measures the performance of your object detection alone (i.e. bounding box regression + classification). I would suggest switching from Faster R-CNN to another detector that performs better in few-shot settings, which seems to be your case. Faster R-CNN is a pretty old and weak detector at this point, especially for few-shot learning; a more recent detector pretrained on a larger dataset, such as Swin Transformer, DETR, or ViT, will do better. You can then train an SGGen model by freezing the weights of your object detector and replacing the RPN layers, as I explained here. Do not replace the full backbone layers, or you will have to change the feature extractor as well, which is more complex.
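To illustrate why this mAP reflects both box regression and classification, here is a minimal sketch of the check that decides whether a single detection counts as correct. The function names and the dictionary layout are hypothetical, and the 0.5 IoU threshold is only a common convention assumed for this example; the evaluation code in the repository may use different thresholds and matching rules.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, iou_threshold=0.5):
    """A detection is correct only if the class matches AND the box overlaps enough.

    `pred` and `gt` are hypothetical dicts with "label" and "box" keys;
    the 0.5 threshold is an assumption for this sketch.
    """
    same_class = pred["label"] == gt["label"]
    good_box = iou(pred["box"], gt["box"]) >= iou_threshold
    return same_class and good_box
```

A well-localized box with the wrong class, or the right class with a poorly regressed box, both count as misses, which is why a small dataset with many classes can hold the mAP down.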

Lxy811 commented 6 months ago

How do I freeze the weights of the object detector, and how would I implement that in code?

narchitect commented 6 months ago

Thank you so much for your reply! I was also planning to use another object detection model. I hope to get better results soon. Thanks again!

Maelic commented 6 months ago

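How to freeze the weights depends on your detector. A simple first step is to force the detector into eval mode with something like model.rpn.eval() or model.backbone.eval() somewhere before your training loop. Note, however, that eval mode only fixes the behaviour of layers such as batch normalization and dropout; it does not by itself stop gradients from being computed. To actually freeze the weights, also set requires_grad = False on the corresponding parameters, or leave them out of the optimizer.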
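As a rough sketch of what freezing could look like in PyTorch, the snippet below disables gradients for one submodule and puts it in eval mode. `ToyModel`, `backbone`, `relation_head`, and the `freeze` helper are hypothetical stand-ins for illustration, not the classes or attribute names used in this repository; the same pattern would apply to whichever submodules your detector exposes.

```python
import torch
from torch import nn


def freeze(module: nn.Module) -> None:
    """Stop gradient updates for every parameter of `module`."""
    for param in module.parameters():
        param.requires_grad = False
    # eval() fixes BatchNorm statistics and disables Dropout;
    # on its own it does not stop gradients from being computed.
    module.eval()


class ToyModel(nn.Module):
    """Hypothetical stand-in for a detector backbone plus a relation head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
        )
        self.relation_head = nn.Linear(8, 4)

    def forward(self, x):
        feats = self.backbone(x).mean(dim=(2, 3))  # global average pooling
        return self.relation_head(feats)


model = ToyModel()
model.train()           # train() switches every submodule to train mode...
freeze(model.backbone)  # ...so freeze afterwards to keep the backbone in eval mode

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

# One dummy training step on random data.
loss = model(torch.randn(2, 3, 16, 16)).sum()
loss.backward()
optimizer.step()
```

Because `model.train()` resets the mode of all submodules, a training loop that calls it at the start of each epoch would need to call `freeze` (or at least `.eval()` on the frozen parts) again afterwards.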