facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Input image size mismatch problem encountered using DINO as a feature extractor #401

Closed u1nderdog closed 1 month ago

u1nderdog commented 3 months ago
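Here is my code using DINOv2 as a feature extractor:
`import torch
from dinov2.models.vision_transformer import vit_large  # assumes the dinov2 repo is importable

# Download the pretrained ViT-L/14 checkpoint
dinov2_weights = torch.hub.load_state_dict_from_url("https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth", map_location="cpu")

# Model configuration; patch_size must match the checkpoint (14)
vit_kwargs = dict(
    img_size=518,
    patch_size=14,
    init_values=1.0,
    ffn_layer="mlp",
    block_chunks=0,
)

# Build the model in eval mode and load the pretrained weights
dinov2_vitl14 = vit_large(**vit_kwargs).eval()
dinov2_vitl14.load_state_dict(dinov2_weights)`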

When I pass my image through the model, the following error is raised:
`assert H % patch_H == 0, f"Input image height {H} is not a multiple of patch height {patch_H}"`
`AssertionError: Input image height 320 is not a multiple of patch height 14`

My input image has size [320, 736]. When I set patch_size to 16 instead, the loaded checkpoint no longer matches the model. My aim is to extract features from images of size [320, 736]. How should I load the pretrained weights? Is there any other way to solve my problem?

vcadillog commented 3 months ago

DINOv2 was trained with a patch size of 14, so it is not possible to load the pretrained weights into a model with a different patch size unless you train it from scratch. For that reason, both the height and the width of your input image must be multiples of 14.
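
To make the multiple-of-14 constraint concrete for the [320, 736] image from the question, here is a minimal sketch that computes the nearest valid sizes. The helper names `round_down_to_patch` and `round_up_to_patch` are hypothetical and not part of the dinov2 repository; only the patch size of 14 and the image size come from this thread.

```python
PATCH_SIZE = 14  # DINOv2 patch size, as stated above


def round_down_to_patch(size: int, patch: int = PATCH_SIZE) -> int:
    """Largest multiple of `patch` that is <= size (suitable for cropping)."""
    return (size // patch) * patch


def round_up_to_patch(size: int, patch: int = PATCH_SIZE) -> int:
    """Smallest multiple of `patch` that is >= size (suitable for padding or upsizing)."""
    return -(-size // patch) * patch  # ceiling division using integers only


height, width = 320, 736  # image size from the question

# Option 1: crop down to the nearest valid size
crop_hw = (round_down_to_patch(height), round_down_to_patch(width))

# Option 2: pad or resize up to the nearest valid size
pad_hw = (round_up_to_patch(height), round_up_to_patch(width))

print(crop_hw)  # (308, 728), i.e. 22 x 52 patches
print(pad_hw)   # (322, 742), i.e. 23 x 53 patches
```

The image tensor could then be brought to one of these shapes before the forward pass, for example with `torch.nn.functional.interpolate(x, size=pad_hw, mode="bilinear")` or with padding. Whether cropping, padding, or resizing is preferable depends on how much the task tolerates lost borders or slight rescaling.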

u1nderdog commented 3 months ago

Thanks a lot