facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

[request] LVD-142M pretraining dataset and / or data curation code #56

Open patricklabatut opened 1 year ago

patricklabatut commented 1 year ago

Related issues:

charlescurt commented 1 year ago

Rather than the actual curation code or data, could you share the rough ideas and theory behind it? We are trying to apply this codebase to an entirely different dataset, and I would like to know what the main goals of the data preparation were.

What happened when you skipped these steps entirely, or applied only some of them?

For example, how would you expect this self-supervised method to perform on a dataset where the images are all very similar and differ only subtly?