IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

What is the image input size, 640x640 for COCO set? #87

Open AI-Passionner opened 2 years ago

AI-Passionner commented 2 years ago

I read the paper and checked the code base. The input size does not seem to be specified anywhere for training except in the data augmentation (COCO_transformer.py). So I am wondering what resolution the reported AP (49.4) was obtained at.

I am training on my own dataset and would like to use a resolution of 1024x1024 or close to it. Any suggestions? For example, I might try a fixed size first and then try the data augmentation.

HaoZhang534 commented 2 years ago

@AI-Passionner All the APs shown in the README use the default setting: scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800], max_size = 1333.

AI-Passionner commented 2 years ago

@SuperHenry2333 Thank you very much.

Is this different from classical object detection, where the input size usually has to be specified, like 640x640? In DETR-like models, we just need to specify different resolutions, and the images are then converted to a sequence of sub-images.

In order to train on my own dataset at high resolution, do I just need to modify the scales? For example:
"""
data_aug_scales = [800, 960, 1024]
data_aug_max_size = 1024
data_aug_scales2_resize = [800, 960, 1024]
data_aug_scales2_crop = [800, 1024]
"""

Best,

HaoZhang534 commented 2 years ago

We follow previous DETR-like methods for data augmentation. We do not fix the image size during training. Instead, we randomly select a size from scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800].
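
To make the effect of `scales` and `max_size` concrete, here is a minimal sketch of the resize arithmetic. It is not taken from this repository. It assumes the common DETR-style convention that the selected scale is the target length of the shorter image side, and that `max_size` caps the longer side while preserving aspect ratio. The function names `resized_hw` and `random_resized_hw` are hypothetical.

```python
import random

# Default values quoted in this thread.
SCALES = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
MAX_SIZE = 1333


def resized_hw(height, width, short_side, max_size=None):
    """Return (new_height, new_width) after an aspect-preserving resize.

    Assumed convention: the shorter side becomes `short_side`, unless that
    would push the longer side past `max_size`; in that case the target is
    reduced so the longer side lands at about `max_size`.
    """
    short, long = min(height, width), max(height, width)
    if max_size is not None and long / short * short_side > max_size:
        short_side = int(round(max_size * short / long))
    if height <= width:
        return short_side, int(short_side * width / height)
    return int(short_side * height / width), short_side


def random_resized_hw(height, width, scales=SCALES, max_size=MAX_SIZE, rng=random):
    """Pick one short-side target at random from `scales`, then resize."""
    return resized_hw(height, width, rng.choice(scales), max_size)
```

Under these assumptions, a 480x640 (height x width) image resized with scale 800 and `max_size` 1333 becomes 800x1066, so the input is generally not square. With a scale of 1024 and a maximum size of 1024, a 1080x1920 image would come out at 576x1024, because the cap on the longer side takes priority. If the shorter side should actually reach 1024 on non-square images, the maximum size would have to be raised accordingly.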