Detection data. Following GLIP, we reformulate the object detection task to a phrase grounding task by concatenating the category names into text prompts. We use COCO, O365, and OpenImage(OI) for our model pretrain. To simulate different text inputs, we randomly sampled category names from all categories in a dataset on the fly during training.
Grounding data. We use the GoldG and RefC data as grounding data. These data can be fed into Grounding DINO directly.
Q1: How are the detection datasets sampled? Is it:
(a) One dataloader handles all datasets, merging their categories and sampling uniformly at random from the merged list, or
(b) Each dataset has its own dataloader and samples only from that dataset's categories (e.g. O365, OI)?
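To make the two options in Q1 concrete, here is a minimal sketch of what I have in mind. It is not taken from the Grounding DINO code: the category lists are hypothetical and heavily truncated, the function names are my own, and the `" . "` separator is only my assumption about how the category names are concatenated into a prompt.

```python
import random

# Hypothetical, truncated category lists; the real datasets have far more classes.
DATASET_CATEGORIES = {
    "COCO": ["person", "bicycle", "car", "dog"],
    "O365": ["person", "sneakers", "chair", "lamp"],
    "OI": ["person", "tortoise", "container", "lamp"],
}


def build_prompt(names):
    """Concatenate category names into a single text prompt (assumed format)."""
    return " . ".join(names) + " ."


def sample_merged(k, rng):
    """Merged option: sample from the union of all datasets' categories."""
    pool = sorted({c for cats in DATASET_CATEGORIES.values() for c in cats})
    return rng.sample(pool, min(k, len(pool)))


def sample_per_dataset(dataset, k, rng):
    """Per-dataset option: sample only from the image's own dataset categories."""
    pool = DATASET_CATEGORIES[dataset]
    return rng.sample(pool, min(k, len(pool)))


rng = random.Random(0)
print(build_prompt(sample_merged(3, rng)))
print(build_prompt(sample_per_dataset("O365", 3, rng)))
```

With the merged option, a prompt for an O365 image could contain names that were never annotated in O365, which is the main reason I am asking which variant was used.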
Q2: The grounding datasets have no label list, so how is the grounding data sampled?
Hi, thanks for your great work!