Can DINO be trained on a real-world dataset where the object of interest is not centered in the image? For example, consider an image of a desk with a laptop, a notepad, a lamp, books and a pen. Applying the multi-crop strategy here would produce crops of different objects that share no features. Can such a training signal help the model learn?
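To make the concern concrete, here is a rough Python sketch that samples crop boxes on a hypothetical 1024x768 scene image and measures how much of each local crop is covered by each global crop. It assumes the scale ranges I believe are the defaults in the official DINO code (global crops covering 40% to 100% of the image area, local crops 5% to 40%, 2 global and 8 local crops). The box sampler is a simplified stand-in for `RandomResizedCrop`, not torchvision's exact logic, and the helper names (`sample_crop_box`, `overlap_fraction`, `low_overlap_rate`) are hypothetical. Note that it only measures geometric overlap; two overlapping crops can still be dominated by different objects.

```python
import math
import random


def sample_crop_box(img_w, img_h, scale, ratio=(3 / 4, 4 / 3), rng=random, attempts=10):
    """Sample an (x0, y0, w, h) box whose area fraction lies in `scale` and whose
    aspect ratio is log-uniform in `ratio`. Simplified RandomResizedCrop-style sampler."""
    area = img_w * img_h
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= img_w and 0 < h <= img_h:
            x0 = rng.randint(0, img_w - w)
            y0 = rng.randint(0, img_h - h)
            return (x0, y0, w, h)
    # Simplified fallback (assumption): use the whole image.
    return (0, 0, img_w, img_h)


def overlap_fraction(a, b):
    """Fraction of box a's area that is covered by box b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / (aw * ah)


def low_overlap_rate(img_w=1024, img_h=768, n_local=8, threshold=0.5, trials=2000, seed=0):
    """Fraction of (local crop, global crop) pairs where less than `threshold`
    of the local crop lies inside the global crop."""
    rng = random.Random(seed)
    low, total = 0, 0
    for _ in range(trials):
        global_boxes = [sample_crop_box(img_w, img_h, (0.4, 1.0), rng=rng) for _ in range(2)]
        for _ in range(n_local):
            local = sample_crop_box(img_w, img_h, (0.05, 0.4), rng=rng)
            for g in global_boxes:
                total += 1
                if overlap_fraction(local, g) < threshold:
                    low += 1
    return low / total


if __name__ == "__main__":
    print(f"local/global pairs with <50% overlap: {low_overlap_rate():.2%}")
```

On a centered, object-centric image most of these pairs would still show the same object, but on a cluttered desk scene a low-overlap pair can easily show the lamp in one view and the pen in the other, which is exactly the case the question is about.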