facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0
8.47k stars · 717 forks

How to use Dinov2 for self-supervised training? #248

Open zhanglaoban-kk opened 9 months ago

zhanglaoban-kk commented 9 months ago

Hello, I would like to ask: how can Dinov2 be used for self-supervised learning on unlabeled images to obtain pre-trained weights, and then how can those pre-trained weights be loaded for supervised learning on labeled data (for image classification)?

steve-zeyu-zhang commented 9 months ago

Hi team,

Thanks to @zhanglaoban-kk for bringing this up; I have the same problem.

Is it like what @TheoMoutakanni mentioned in #142: the labels in training/eval/testing are only used for evaluation, so all we need to do is set the label to something like '0' or 'np.nan' and mock the ImageNet dataset?
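(Editorial sketch of the "dummy label" idea above, under the assumption that any indexable image collection can stand in; the class name is illustrative, not part of the repo.)

```python
class UnlabeledWrapper:
    """Wrap an unlabeled image collection so every sample carries a dummy
    label, mimicking the (image, target) pairs an ImageNet-style loader
    yields. The target is never used by the self-supervised loss; it only
    satisfies the dataset plumbing."""

    def __init__(self, images, dummy_label=0):
        self.images = images
        self.dummy_label = dummy_label

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Return the usual (sample, target) tuple with a constant target.
        return self.images[idx], self.dummy_label
```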

MODEL:
  WEIGHTS: 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth'

However, it reports an error:

...
File "/hpcfs/users/a1781032/default_studio/dinov2/dinov2/train/train.py", line 153, in do_train
    start_iter = checkpointer.resume_or_load(cfg.MODEL.WEIGHTS, resume=resume).get("iteration", -1) + 1
...
    checkpoint_state_dict = checkpoint.pop("model")
KeyError: 'model'

That is, dinov2_vitb14_pretrain.pth has no 'model' key; its top-level keys are the architecture weights directly: 'cls_token', 'pos_embed', 'mask_token', 'patch_embed.proj.weight', 'patch_embed.proj.bias', 'blocks.0.norm1.weight', 'blocks.0.norm1.bias', 'blocks.0.attn.qkv.weight', 'blocks.0.attn.qkv.bias', 'blocks.0.attn.proj.weight', 'blocks.0.attn.proj.bias' ...

How can we load the provided pre-trained checkpoint into the model before training?
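(Editorial note: one hedged workaround for the KeyError above, assuming the checkpointer expects a dict of the form {"model": state_dict}: wrap the flat state dict before saving it back out. The function name is illustrative.)

```python
def wrap_for_checkpointer(flat_state_dict):
    """The training checkpointer pops a 'model' key, but the released
    *_pretrain.pth files are flat state dicts. Wrap them if needed so
    checkpoint.pop("model") succeeds."""
    if "model" in flat_state_dict:
        return flat_state_dict  # already in checkpointer format
    return {"model": flat_state_dict}
```

In practice you would presumably `torch.load` the released file, wrap it with something like this, and `torch.save` it to a new path referenced by MODEL.WEIGHTS.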

Best, Steve

steve-zeyu-zhang commented 9 months ago

Hi team,

I had some new findings. Looking through #154: @surajyakoa wanted to fine-tune DINOv2 on some unlabeled data, and @qasfb explained that this requires the full training checkpoint, not just the backbone; since the full checkpoints haven't been released, it isn't practical.

Is that correct?

Steve

NikitaRA commented 8 months ago

> Hi team,
>
> I had some new findings. After I looked through #154 @surajyakoa explained that he/she wanted to fine-tune DINOv2 with some unlabeled data and @qasfb has explained that it requires the full checkpoint not just the backbone, and since they hadn't release them so it's not practical.
>
> Is that correct?
>
> Steve

Sad moment, so it is not possible to fine-tune the current model from the hub?

kusstox commented 8 months ago

Any news here?

MarioAvolio commented 8 months ago

Any news here?

Nope for now. Does anyone know how to run self-distillation using the train file in the repo?

TimDarcet commented 8 months ago

Hi,

My advice would be to not load the weights using the checkpointer's resume function, as it also expects the optimizer buffers etc. Instead, you can launch a normal training from scratch and load these pretrained weights before the first iteration with torch.load and model.load_state_dict. Make sure you load the weights into both the student and the teacher.

You may need to slightly rename the checkpoint keys: blocks.0.attn.qkv.weight may be named something like module.backbone.blocks.0.0.attn.qkv.weight. I don't remember the exact mapping, but it should not be hard to find if you compare the keys on both sides.
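(Editorial sketch of such a rename, assuming, purely for illustration, that the training-time model prefixes keys with "backbone." and chunks blocks as "blocks.0.N."; the real mapping should be found by printing the keys on both sides, as suggested above.)

```python
import re

def remap_keys(state_dict, prefix="backbone."):
    """Hypothetical key remap from the released flat ViT keys
    ('blocks.3.attn.qkv.weight') to a prefixed, block-chunked layout
    ('backbone.blocks.0.3.attn.qkv.weight'). Adjust to the actual keys."""
    remapped = {}
    for key, value in state_dict.items():
        # 'blocks.N.' -> 'blocks.0.N.' (assumes a single block chunk)
        new_key = re.sub(r"^blocks\.(\d+)\.", r"blocks.0.\1.", key)
        remapped[prefix + new_key] = value
    return remapped
```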

You might also need to use strict=False, because the checkpoint does not include the head weights. However, make sure you print out the load message, so that you know whether the loading succeeded. That is a common source of bugs.
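(Editorial sketch of the check above: a small helper that reports what a strict=False load would surface, given the two key sets. It mimics the missing/unexpected split that PyTorch's load_state_dict returns; the function name is illustrative.)

```python
def report_load(model_keys, checkpoint_keys):
    """Report keys the model expects but the checkpoint lacks (e.g. the
    randomly initialized head weights) and keys the checkpoint carries
    that the model does not recognize (e.g. keys needing a rename)."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    print(f"missing keys: {missing}")
    print(f"unexpected keys: {unexpected}")
    return missing, unexpected
```

If "unexpected keys" is non-empty, the rename mapping is probably wrong; if "missing keys" lists more than the heads, the load likely failed silently.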

The only real question is how well the training will work with a pretrained backbone and a randomly initialized head. I think it may be beneficial to freeze the backbone and train only the heads for a few thousand iterations, so the heads have a bit of time to fit. Then unfreeze the whole model and start the real training.
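(Editorial sketch of the freeze/unfreeze step above, by parameter-name prefix. It works with any iterable of (name, param) pairs where param has a .requires_grad attribute, such as PyTorch's model.named_parameters(); the "backbone." prefix is an assumption.)

```python
def set_trainable(named_params, prefix, trainable):
    """Freeze (trainable=False) or unfreeze (trainable=True) every
    parameter whose name starts with the given prefix, e.g. freeze
    'backbone.' while the heads warm up, then unfreeze it later."""
    for name, param in named_params:
        if name.startswith(prefix):
            param.requires_grad = trainable
```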

Best of luck,

Tim

risingClouds commented 5 months ago

> Hi team,
>
> Thanks @zhanglaoban-kk brought this up, I have the same problem.
>
> • For using unlabeled images
>
> Is that like what #142 @TheoMoutakanni mentioned that the label in training/eval/testing are all just for evaluation, and all we need just set the label as something like '0' or 'np.nan', then mocking the ImageNet dataset?
>
> MODEL: WEIGHTS: 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth'
>
> However, it reports error ... File "/hpcfs/users/a1781032/default_studio/dinov2/dinov2/train/train.py", line 153, in do_train start_iter = checkpointer.resume_or_load(cfg.MODEL.WEIGHTS, resume=resume).get("iteration", -1) + 1 ... checkpoint_state_dict = checkpoint.pop("model") KeyError: 'model'
>
> which the dinov2_vitb14_pretrain.pth does not have key model but the architecture keys: 'cls_token', 'pos_embed', 'mask_token', 'patch_embed.proj.weight', 'patch_embed.proj.bias', 'blocks.0.norm1.weight', 'blocks.0.norm1.bias', 'blocks.0.attn.qkv.weight', 'blocks.0.attn.qkv.bias', 'blocks.0.attn.proj.weight', 'blocks.0.attn.proj.bias' ...
>
> How can we do to load the provided pre-train checkpoint into the model before training?
>
> Best, Steve

Hi Steve, I have the same requirement. I also want to train DINOv2 on unlabeled images. Does it work with the mocked ImageNet dataset? Could you share some details?

Best, Luo