facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Regarding the training data composition #405

Closed amundra15 closed 1 month ago

amundra15 commented 2 months ago

It is unclear whether the model was trained on object-centric images (e.g., ImageNet) or on scene-level images. In Sec. 3, the authors mention retrieving the dataset by crawling the web. Does this mean that the dataset contains scenes composed of multiple objects?

Perhaps you could release a small sample dataset to give a general idea.

qasfb commented 1 month ago

I expect the dataset contains some images with multiple objects, as we haven't filtered those out in any way. Releasing a sample dataset is a complex process, so I would recommend not planning around that.