psteinb opened this issue 3 years ago
For the rest of us, these are the differences: https://github.com/facebookresearch/dino/commit/afc323572b372e72bcb574c549013689f7b6a6b3 (and similarly https://github.com/facebookresearch/dino/commit/0a3b1b823a4a2f397e8bcc87ffdd6687df634f61)
Hi @psteinb, do feel free to send a PR!
Hi! Thanks for sharing your code. In order to run fine-tuning with a pretrained model (as mentioned in https://github.com/facebookresearch/dino/issues/80), do you think it makes sense to change these values as well?
@amandalucasp not sure what you mean. Do you want to use a model pretrained on task A, but run DINO on task B?
I am trying to use one of the provided pretrained models and fine-tune it on a custom dataset. My question is whether the modifications you pointed out make sense for fine-tuning (considering the pretrained model I'm using was initially trained on ImageNet), or if I should think about changing these values only when training from scratch.
Ok, makes sense. In that case, my PR #63 would definitely help. If task A was classification on ImageNet, it used the standard z-score normalisation per color channel. But if you'd like to run DINO for task B, i.e. on a different dataset, then a normalisation fitted to that task B dataset is essential. Otherwise, the incoming images might be treated in an ill-posed fashion simply because they are not normalised correctly.
I really feel like normalization may be the bottleneck for the performance I'm seeing. Thank you for the feedback!
Hi @psteinb. Thanks for sharing these nice tips, they work well. But one problem I am facing after the whole training is evaluation. If I try to use the checkpoints to run visualize_attention.py and video_generation.py, I get the following errors.
RuntimeError: Error(s) in loading state_dict for VisionTransformer:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 197, 384]) from checkpoint, the shape in current model is torch.Size([1, 785, 384]).
size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([384, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([384, 3, 8, 8]).
Any suggestion is appreciated, thank you!
Hi @Harry-KIT. Reading your error message, it seems that you are trying to load a vit-small/16 checkpoint into a vit-small/8 model. Can you try adding the flag --patch_size 16 when running visualize_attention.py and video_generation.py?
Hi @mathildecaron31, thank you! Done.
Has it solved your issue, @Harry-KIT?
Hi @mathildecaron31. Yes, it did. You were right. I changed patch_size from 8 to 16, and it works!
Hi, thanks to the authors of this paper and this code for making the effort to share their work with the community.
I am trying to use DINO on a non-ImageNet dataset and started to alter the code in this fashion. For details, see main_dino.py and visualize_attention.py in my fork. I am basically trying to get rid of any hard-coded magic numbers related to ImageNet (if possible). Drop me a :+1: if you like or need this work. If the feedback is in line with #1, I can send a PR if time permits. Other feedback on this is always welcome - feel free to send PRs to my fork.