Parskatt / RoMa

[CVPR 2024] RoMa: Robust Dense Feature Matching. RoMa is a robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
https://parskatt.github.io/RoMa/
MIT License

small model backbone #28


gc625-kodifly commented 5 months ago

Hi, I've modified the code to load a DINOv2 small model, but I realized that the embed_dim of the vit_small model

# DINOv2's vit_small builder -- note embed_dim=384, while the RoMa checkpoint expects 1024:
def vit_small(patch_size=16, **kwargs):
    model = DinoVisionTransformer(
        patch_size=patch_size,
        embed_dim=384,
        depth=12,
        num_heads=6,
        mlp_ratio=4,
        block_fn=partial(Block, attn_class=MemEffAttention),
        **kwargs,
    )
    return model

is 384, which causes a dimension mismatch with the provided RoMa checkpoint, which assumes an embed_dim of 1024, e.g.

 proj16 = nn.Sequential(nn.Conv2d(1024, 512, 1, 1), nn.BatchNorm2d(512))

Could you provide the weights for a RoMa-S? The current model takes ~6 GB of VRAM even after applying the kde → approx_kde change from #23, so being able to use a smaller backbone would help a lot.
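
For reference, here is a minimal sketch of the adaptation I mean, assuming the backbone is loaded through DINOv2's torch.hub entry point. The make_proj16 helper is hypothetical (not part of the RoMa codebase), and the adapted projection would still need retrained weights, since the released checkpoint only covers the 1024-dim backbone.

import torch
import torch.nn as nn

# dinov2_vits14 has embed_dim=384, vs. 1024 for the dinov2_vitl14 backbone
# that the released RoMa checkpoint was trained with.
vit_s = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
print(vit_s.embed_dim)  # 384

# Hypothetical helper: build the coarse projection with a configurable input
# width so it matches the backbone's embed_dim. The resulting layers would
# still need to be trained from scratch.
def make_proj16(embed_dim: int, out_dim: int = 512) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(embed_dim, out_dim, 1, 1), nn.BatchNorm2d(out_dim))

proj16 = make_proj16(vit_s.embed_dim)  # Conv2d(384, 512, ...) instead of Conv2d(1024, 512, ...)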

Parskatt commented 5 months ago

Sorry, I don't have one. I'll try training one (also with a ViT-Base backbone).

gc625-kodifly commented 5 months ago

Please do :) That would be really helpful. Looking forward to the results!

Dawars commented 5 months ago

@Parskatt Could you also train one using FeatUp? Would that make it possible to get higher-resolution outputs?

https://github.com/mhamilton723/FeatUp
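
For context, a rough sketch of how a pretrained FeatUp upsampler is loaded via torch.hub. The entry point name ("dinov2") and the use_norm flag are taken from FeatUp's README and may change; this is an assumption about FeatUp's API, not anything in RoMa.

import torch

# Load a pretrained FeatUp upsampler for DINOv2 features (names per FeatUp's README).
upsampler = torch.hub.load("mhamilton723/FeatUp", "dinov2", use_norm=True).cuda()

image = torch.randn(1, 3, 224, 224).cuda()  # normalized RGB input
hr_feats = upsampler(image)                 # upsampled, higher-resolution features
lr_feats = upsampler.model(image)           # the backbone's original low-resolution features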

Parskatt commented 5 months ago

@Dawars I'm not convinced that FeatUp is useful. Would be glad to be proven wrong, but not something I'll spend time on currently.