Parskatt / RoMa

[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
https://parskatt.github.io/RoMa/
MIT License
630 stars, 51 forks

torch AMP for DINOv2 precision issue #71

Closed: lnexenl closed this issue 3 months ago

lnexenl commented 3 months ago

torch.amp is used throughout the model, and it clearly reduces VRAM usage and increases speed. The DINOv2 weights are frozen in RoMa, and the output of DINOv2 run in float16 may differ slightly from the float32 version. Have you ever tried training DINOv2 together with the rest of the model?
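One way to see how large the float16/float32 gap actually is: run the same frozen module once in full precision and once under `torch.autocast`, then compare the outputs. The sketch below is only an illustration. The small MLP is a hypothetical stand-in for the DINOv2 backbone (it is not RoMa's code), and on a machine without a GPU it falls back to bfloat16, since that is the reduced-precision dtype CPU autocast reliably supports.

```python
import torch
import torch.nn as nn


def max_abs_diff_under_autocast(model: nn.Module, x: torch.Tensor):
    """Compare a float32 forward pass with a reduced-precision autocast pass.

    Returns the largest absolute element-wise difference and the dtype the
    autocast pass produced.
    """
    device_type = x.device.type
    # Assumption: float16 on CUDA (as in the question), bfloat16 on CPU.
    amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    model.eval()
    with torch.no_grad():
        reference = model(x)  # plain float32 forward pass
        with torch.autocast(device_type=device_type, dtype=amp_dtype):
            low_precision = model(x)  # same weights, reduced-precision matmuls
    diff = (reference - low_precision.float()).abs().max().item()
    return diff, low_precision.dtype


torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for a frozen feature backbone.
backbone = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).to(device)
for p in backbone.parameters():
    p.requires_grad_(False)  # frozen, as DINOv2 is in RoMa

x = torch.randn(8, 64, device=device)
diff, low_dtype = max_abs_diff_under_autocast(backbone, x)
print(f"autocast dtype: {low_dtype}, max abs difference: {diff:.6f}")
```

Running the same comparison on the real backbone with real images would show whether the difference is large enough to matter for matching.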

Parskatt commented 3 months ago

I have not. LoRA would probably be the easiest way to go. RoMa is already quite expensive to train, so I didn't look into this. You could also try just unfreezing some layers near the end.
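Both suggestions can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not RoMa's training code: `LoRALinear`, `unfreeze_last_blocks`, and `count_trainable` are hypothetical helpers, and a list of small `nn.Linear` layers stands in for the transformer blocks of the backbone.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical LoRA wrapper: frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the original weights stay frozen
        # Low-rank factors; lora_b starts at zero so the wrapper initially
        # reproduces the base layer exactly.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.t()) @ self.lora_b.t()
        return self.base(x) + self.scale * update


def unfreeze_last_blocks(blocks: nn.ModuleList, n: int) -> None:
    """Freeze every block, then make only the last n blocks trainable."""
    for p in blocks.parameters():
        p.requires_grad_(False)
    n = max(0, min(n, len(blocks)))
    for block in list(blocks)[len(blocks) - n:]:
        for p in block.parameters():
            p.requires_grad_(True)


def count_trainable(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


torch.manual_seed(0)

# Option 1: unfreeze only the last block of a (stand-in) stack of blocks.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
unfreeze_last_blocks(blocks, n=1)
print("trainable after unfreezing last block:", count_trainable(blocks))

# Option 2: keep the layer frozen and train only the low-rank factors.
lora = LoRALinear(nn.Linear(16, 16), rank=4)
x = torch.randn(2, 16)
print("trainable LoRA parameters:", count_trainable(lora))
print("matches base at init:", torch.allclose(lora(x), lora.base(x)))
```

In either case, only parameters with `requires_grad=True` need to be passed to the optimizer, which is where the memory savings over full fine-tuning come from.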

lnexenl commented 3 months ago

I also think it would take a lot of GPUs to train, and I don't have any cards with more than 32GB of VRAM, so I decided to keep it as in the original. 😆