facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0
6.06k stars, 885 forks

Training on uncurated/unbound datasets #231

Open ajkailash opened 1 year ago

ajkailash commented 1 year ago

Can DINO be trained on a real-world dataset where the object of interest is not centered in the image? For example, consider an image of a desk with a laptop, a notepad, a lamp, books, and a pen. Applying the multi-crop strategy here would produce crops that each contain a different object and so share no features. Can such a training signal still help the model learn?
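
To make the concern concrete, here is a rough sketch that samples one global crop and several local crops from an image and reports how much of each local crop falls inside the global crop. It is plain Python and not the repository's augmentation code. The helper names (`sample_crop`, `covered_fraction`, `multi_crop_overlap`) are hypothetical, the sampling only approximates a random-resized-crop, and the scale ranges (0.4 to 1.0 for global crops, 0.05 to 0.4 for local crops) and the count of 8 local crops are assumed values that should be checked against the training script.

```python
import math
import random


def sample_crop(width, height, scale, ratio=(3 / 4, 4 / 3), rng=random):
    """Sample an (x0, y0, x1, y1) box covering a random fraction of the image area.

    Simplified stand-in for a random-resized-crop: the area fraction is drawn
    uniformly from `scale`, the aspect ratio log-uniformly from `ratio`.
    """
    for _ in range(10):
        area = width * height * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(area * aspect))
        h = round(math.sqrt(area / aspect))
        if 0 < w <= width and 0 < h <= height:
            x0 = rng.randint(0, width - w)
            y0 = rng.randint(0, height - h)
            return (x0, y0, x0 + w, y0 + h)
    # Simplification: fall back to the whole image if no valid box was found.
    return (0, 0, width, height)


def covered_fraction(inner, outer):
    """Fraction of the area of `inner` that lies inside `outer`."""
    iw = max(0, min(inner[2], outer[2]) - max(inner[0], outer[0]))
    ih = max(0, min(inner[3], outer[3]) - max(inner[1], outer[1]))
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return iw * ih / inner_area


def multi_crop_overlap(width=640, height=480, n_local=8, seed=0):
    """Sample one global and `n_local` local crops; return per-local coverage."""
    rng = random.Random(seed)
    # Assumed scale ranges; verify against the actual training configuration.
    global_crop = sample_crop(width, height, (0.4, 1.0), rng=rng)
    local_crops = [
        sample_crop(width, height, (0.05, 0.4), rng=rng) for _ in range(n_local)
    ]
    return [covered_fraction(box, global_crop) for box in local_crops]


if __name__ == "__main__":
    for i, frac in enumerate(multi_crop_overlap()):
        print(f"local crop {i}: {frac:.0%} inside the global crop")
```

A coverage of 0 means the local crop and the global crop show disjoint regions of the scene, which is the case I am asking about: for a cluttered desk image, those two views may contain entirely different objects.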