IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Training on a larger dataset #121

Open mactavish91 opened 1 year ago

mactavish91 commented 1 year ago

Hello author, great job! I found that DINO performs very well on the COCO dataset. But if I want to transfer the model to a dataset with thousands or tens of thousands of categories, such as the LVIS dataset, do I need to adjust the model structure? I don't think a hidden size of 256 and 6 transformer layers are enough for such a dataset.

SlongLiu commented 1 year ago

I have not tested DINO on LVIS, so it is hard for me to provide precise suggestions; we would be grateful if you could share an LVIS result. As for the parameters, a larger hidden size may work better under the same conditions, but we found that adding more decoder layers brought nearly no improvement on COCO. If you want a baseline DINO number, the standard config is a good starting point and works well in most cases. You only need to adjust a few parameters for your custom dataset, following the custom dataset guidance.
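
For reference, here is a minimal sketch of the kind of config overrides the custom dataset guidance describes. The option names (`num_classes`, `dn_labelbook_size`, `hidden_dim`, `dec_layers`) follow the style of the repo's `config/DINO/DINO_4scale.py`; the LVIS-specific values below are illustrative assumptions, not tested settings:

```python
# Hypothetical config overrides for adapting DINO to a large-vocabulary
# dataset such as LVIS. Option names follow config/DINO/DINO_4scale.py;
# the concrete values are untested assumptions, not recommendations.

num_classes = 1204        # LVIS v1 has 1203 categories; set this to max class id + 1
dn_labelbook_size = 1205  # the custom-dataset guidance asks for >= num_classes + 1
hidden_dim = 256          # a larger value (e.g. 384) may help, per the reply above
dec_layers = 6            # more decoder layers gave nearly no gain on COCO
```

If fine-tuning from a COCO checkpoint, the repo's fine-tuning instructions also suggest skipping the classification-head weights (via `--finetune_ignore label_enc.weight class_embed`) since the number of classes changes.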