Vision transformer backbone

bioptimus / releases


Vision transformer backbone #3

Open rbareja25 opened 1 month ago

rbareja25 commented 1 month ago

Hello,

Could you provide the vision transformer backbone used for the model? I am using DINO's vision_transformer.py code (https://github.com/facebookresearch/dino/blob/main/vision_transformer.py) for a ViT-giant:

    def vit_giant(patch_size=16, **kwargs):
        model = VisionTransformer(
            patch_size=patch_size, embed_dim=1536, depth=40, num_heads=24,
            mlp_ratio=5.33334, qkv_bias=True,
            norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
        return model

But I am getting a size mismatch error.

Thanks, Rohan

checkpt0 commented 1 month ago

Hello @rbareja25,

The checkpoint provided is not directly compatible with the function dinov2.models.vision_transformer.vit_giant2 of the open source dinov2 repository.

You can follow the steps detailed on the Hugging Face model page to load the model.

If you want to load the checkpoint with the aforementioned function, you will probably have to tweak the state dict keys.

I hope this helps.

Best, Charlie
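For readers who do want to tweak the state dict keys as suggested above, the following is a minimal sketch of one way to go about it. It is not the maintainers' method. The prefix `module.` and the rename rule `ls1.gamma` to `ls1.scale` are hypothetical placeholders, since the thread does not say which keys actually differ; the helper names `remap_state_dict_keys` and `diff_keys` are also made up for illustration. The toy dictionaries stand in for a real checkpoint and a real model's `state_dict()`.

```python
from collections import OrderedDict


def remap_state_dict_keys(state_dict, rename_rules, strip_prefix=""):
    """Return a copy of state_dict with its keys renamed.

    rename_rules is a list of (old_substring, new_substring) pairs applied
    in order. strip_prefix, if given, is removed from the start of each key.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        if strip_prefix and new_key.startswith(strip_prefix):
            new_key = new_key[len(strip_prefix):]
        for old, new in rename_rules:
            new_key = new_key.replace(old, new)
        if new_key in remapped:
            # Two source keys collapsing into one would silently drop weights.
            raise KeyError(f"rename collision on {new_key!r}")
        remapped[new_key] = value
    return remapped


def diff_keys(checkpoint_keys, model_keys):
    """Return (missing, unexpected) key lists, relative to the model."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model lacks
    return missing, unexpected


# Toy stand-ins: real values would be tensors, real keys come from the files.
checkpoint = {
    "module.blocks.0.attn.qkv.weight": 1,
    "module.blocks.0.ls1.gamma": 2,
}
rules = [("ls1.gamma", "ls1.scale")]  # hypothetical rename rule
remapped = remap_state_dict_keys(checkpoint, rules, strip_prefix="module.")

model_keys = ["blocks.0.attn.qkv.weight", "blocks.0.ls1.scale", "cls_token"]
missing, unexpected = diff_keys(remapped, model_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With PyTorch, the remapped dictionary can then be passed to `model.load_state_dict(remapped, strict=False)`, which reports the missing and unexpected keys. Renaming keys only fixes name mismatches: a size mismatch error is raised even with `strict=False`, and it usually means the model was built with different hyperparameters (for example patch size, embedding dimension, or extra tokens) than the checkpoint, so the architecture itself has to match first.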