Unstable training with dense scenes

I was training a DINO model on densely populated scenes containing many small targets very close together. The model did not appear to be learning: even when I used the same images in the train/val/test sets, I could not get above 0.05 mAP50. When I use less dense scenes containing the same objects, training works properly.

Could this be caused by the matching algorithm becoming unstable when many small objects are present in a scene? If so, would this be solved by Stable-DINO once that repository is uploaded?
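To make the question concrete, here is a minimal sketch of what "unstable matching" could look like. It is a toy setup, not DINO's actual matcher: it uses Hungarian matching on an L1 box cost only (the class and GIoU cost terms are omitted), and the grid layout, noise level, and helper names (`make_grid_boxes`, `match`, `flip_rate`) are all hypothetical. It measures how often the prediction-to-target assignment changes when predictions are jittered slightly, for a sparse layout versus a dense one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def make_grid_boxes(n_side, spacing, size):
    """Return (n_side**2, 4) boxes in normalized (cx, cy, w, h) on a regular grid."""
    coords = (np.arange(n_side) + 0.5) * spacing
    cx, cy = np.meshgrid(coords, coords)
    wh = np.full((n_side * n_side, 2), size)
    return np.concatenate([cx.reshape(-1, 1), cy.reshape(-1, 1), wh], axis=1)


def match(pred, gt):
    """Hungarian matching on pairwise L1 box cost; returns the gt index per prediction.

    Assumes len(pred) == len(gt), so every prediction receives a target.
    """
    cost = np.abs(pred[:, None, :] - gt[None, :, :]).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    assignment = np.empty(len(pred), dtype=int)
    assignment[rows] = cols
    return assignment


def flip_rate(gt, noise, trials=20, seed=0):
    """Mean fraction of predictions whose matched target changes under small jitter."""
    rng = np.random.default_rng(seed)
    base = gt + rng.normal(0.0, noise, gt.shape)  # hypothetical "predictions"
    reference = match(base, gt)
    flips = []
    for _ in range(trials):
        jittered = base + rng.normal(0.0, noise, gt.shape)
        flips.append(np.mean(match(jittered, gt) != reference))
    return float(np.mean(flips))


# Same number of objects and same prediction noise; only spacing and size differ.
gt_sparse = make_grid_boxes(10, spacing=0.1, size=0.02)
gt_dense = make_grid_boxes(10, spacing=0.01, size=0.002)

sparse_rate = flip_rate(gt_sparse, noise=0.005)
dense_rate = flip_rate(gt_dense, noise=0.005)
print(f"sparse flip rate: {sparse_rate:.3f}")
print(f"dense flip rate:  {dense_rate:.3f}")
```

In this toy setup the dense layout should show a much higher flip rate, because the jitter is comparable to the spacing between targets. Whether the same effect explains the training failure described above would need to be checked against the real matcher and real predictions, for example by logging the assignments between consecutive iterations.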

IDEA-Research / DINO, issue #213