IDEA-Research / GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
https://arxiv.org/abs/2303.05499
Apache License 2.0

【Feature】MMDetection supports Grounding-DINO inference and fine-tuning #228

Open hhaAndroid opened 10 months ago

hhaAndroid commented 10 months ago

Hi all: MMDetection now supports Grounding-DINO inference and fine-tuning. The mAP we achieved in our reproduction is higher than the official results. We also provide results for retraining the R50 model from scratch, which outperforms the official implementation as well (48.9 vs. 48.1 COCO mAP).

Installation

cd $MMDETROOT

# source installation
pip install -r requirements/multimodal.txt

# or mim installation
mim install mmdet[multimodal]

NOTE

Grounding DINO uses BERT as its language model, which requires access to https://huggingface.co/. If you encounter connection errors due to restricted network access, you can download the required files on a computer with internet access and save them locally. Then set the lang_model_name field in the config to the local path. The following code downloads and saves the files:

from transformers import BertConfig, BertModel
from transformers import AutoTokenizer

config = BertConfig.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", add_pooling_layer=False, config=config)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

config.save_pretrained("your path/bert-base-uncased")
model.save_pretrained("your path/bert-base-uncased")
tokenizer.save_pretrained("your path/bert-base-uncased")

Inference

cd $MMDETROOT

wget https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth

python demo/image_demo.py \
    demo/demo.jpg \
    configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py \
    --weights groundingdino_swint_ogc_mmdet-822d7e9d.pth \
    --texts 'bench . car .'
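
The --texts value above separates class names with " . " and ends with a trailing " .". A minimal sketch of building such a string from a list of names follows; build_text_prompt is a hypothetical helper, not part of MMDetection, and it only mirrors the format of the example command.

```python
def build_text_prompt(class_names):
    """Join class names into the ' . '-separated style of the demo command."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [name.strip() for name in class_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty class name is required")
    return " . ".join(cleaned) + " ."


if __name__ == "__main__":
    # Reproduces the prompt used in the demo command above.
    print(build_text_prompt(["bench", "car"]))
```

The resulting string can be passed as the --texts argument of demo/image_demo.py.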

Results and Models

| Model | Backbone | Style | COCO mAP | Official COCO mAP | Pre-Train Data |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 | O365,GoldG,Cap4M |
| Grounding DINO-T | Swin-T | Finetune | 58.1(+0.9) | 57.2 | O365,GoldG,Cap4M |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 | COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | | COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO |
| Grounding DINO-R50 | R50 | Scratch | 48.9(+0.8) | 48.1 | |
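
The bracketed gains in the table appear to be the MMDetection mAP minus the official mAP. The short check below recomputes them from the numbers listed above; the dictionary layout and the delta helper are only illustrative, and the Swin-B fine-tune row is left out because no official value is given for it.

```python
# (model, style) -> (MMDetection COCO mAP, official COCO mAP), from the table.
results = {
    ("Grounding DINO-T", "Zero-shot"): (48.5, 48.4),
    ("Grounding DINO-T", "Finetune"): (58.1, 57.2),
    ("Grounding DINO-B", "Zero-shot"): (56.9, 56.7),
    ("Grounding DINO-R50", "Scratch"): (48.9, 48.1),
}


def delta(ours, official):
    """Difference in mAP, rounded to one decimal place like the table."""
    return round(ours - official, 1)


if __name__ == "__main__":
    for (model, style), (ours, official) in results.items():
        print(f"{model} ({style}): {delta(ours, official):+.1f}")
```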

For details, see https://github.com/open-mmlab/mmdetection/blob/dev-3.x/configs/grounding_dino/README.md

We also support GLIP inference and fine-tuning.

If you encounter any issues while using it, please feel free to create an issue.

PawaritL commented 9 months ago

@hhaAndroid thank you very much for supporting Grounding DINO finetuning! I just have a few questions:

My goal is to maintain Grounding DINO's versatility in open-set detection while adding a few custom classes.

  1. In this finetuning procedure from the MMDetection docs, it looks like we have to explicitly set the number of classes. Does this mean the finetuned model can no longer do open-set detection, or am I misunderstanding something?
  2. Will the finetuned model still be able to handle Referring Expression Comprehension (REC)? For example, can I still prompt the finetuned model with "the left lion"?
  3. Could you please share any script or code snippets showing how you achieved the finetuning?

Many thanks!

FengheTan9 commented 9 months ago

Maybe the text input of GroundingDINO in mmdet is a fixed category list (not free-form text) 😥

Liquidmasl commented 8 months ago

This is amazing, thank you!

Can those models be used with the base GroundingDINO implementation? The configs look quite different, so I guess not? It would be a bummer to change the implementation at this point.

25icecreamflavors commented 7 months ago

Can I finetune Grounding DINO on a prompt? These objects should already be in the pretraining data, but I would like to add some extra information to get better predictions. Let's say I only want to detect "black cats". The problem is that I have few data samples, so I would like to tune the model slightly with a prompt to make use of its pretrained knowledge.

SoulProficiency commented 5 months ago

Hi, what are the minimum hardware requirements for fine-tuning Grounding DINO on the COCO dataset (default batch size = 32)?