facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Regarding the emergence property #406

Closed: amundra15 closed this issue 1 month ago

amundra15 commented 2 months ago

I would like some understanding of, or intuition for, the emergent behaviour of the model.

As I understand it, the loss simply ensures that the outputs for different transformations of the same input image remain close to each other. This can certainly make the model learn foreground-background separation (as shown in DINOv1). However, DINOv2 exhibits emergent behaviour in which it also learns the semantic meaning of object parts. For example, in Fig. 1 the visualization shows the same colour gradient for the wings of various birds and planes.
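To make the objective paraphrased above concrete, here is a minimal pure-Python sketch of a DINO-style self-distillation loss, assuming the DINOv1 image-level formulation: the teacher's output for one view is centered and sharpened with a low temperature, and the student's output for a different view is trained to match it via cross-entropy. It is not the dinov2 repository's implementation, and it omits components DINOv2 adds (for example, the iBOT masked-patch loss, Sinkhorn-Knopp centering, and the KoLeo regularizer). All function names, the temperatures, the centering momentum, the view layout, and the toy logits are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def dino_style_loss(student_logits: Sequence[float],
                    teacher_logits: Sequence[float],
                    center: Sequence[float],
                    student_temp: float = 0.1,
                    teacher_temp: float = 0.04) -> float:
    """Cross-entropy H(P_teacher, P_student) for one (teacher view, student view) pair.

    The teacher logits are centered (minus `center`) and sharpened by a low
    temperature. In real training only the student gets gradients and the
    teacher is an EMA of the student; neither is modeled here.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = softmax(centered, teacher_temp)
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def multi_view_loss(student_views: Sequence[Sequence[float]],
                    teacher_views: Sequence[Sequence[float]],
                    center: Sequence[float]) -> float:
    """Average loss over all pairs of *different* views.

    Hypothetical layout: the teacher sees only the global crops, which are also
    the first len(teacher_views) entries of student_views; any extra student
    entries are local crops.
    """
    total, pairs = 0.0, 0
    for i, t in enumerate(teacher_views):
        for j, s in enumerate(student_views):
            if i == j:
                continue  # never match a view against itself
            total += dino_style_loss(s, t, center)
            pairs += 1
    return total / pairs


def update_center(center: Sequence[float],
                  teacher_batch: Sequence[Sequence[float]],
                  momentum: float = 0.9) -> List[float]:
    """EMA of the mean teacher logits, used to center the next teacher outputs."""
    n = len(teacher_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]


# Toy usage with made-up 3-dimensional logits (not real model outputs).
center = [0.0, 0.0, 0.0]
teacher = [[2.0, 0.5, -1.0], [1.8, 0.7, -0.9]]                    # 2 global views
student = [[1.5, 0.4, -0.8], [1.9, 0.6, -1.1], [0.2, 0.1, 0.0]]   # 2 global + 1 local
loss = multi_view_loss(student, teacher, center)
center = update_center(center, teacher)
print(f"loss={loss:.4f}, center={center}")
```

Minimizing this loss pulls the student's distribution for each view toward the teacher's sharpened distribution for the other views. In the DINO paper, centering and sharpening are described as counterbalancing each other so that the outputs neither collapse to one dimension nor become uniform.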

What leads to the emergence of this behaviour, and how does the loss encourage it?

qasfb commented 1 month ago

No idea; one can see this question as a topic of future research...