Why use 5 different resolutions in the same convolutional refiner

Parskatt / RoMa

[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

https://parskatt.github.io/RoMa/

MIT License


Why use 5 different resolutions in the same convolutional refiner #53

Open skill-diver opened 4 months ago

skill-diver commented 4 months ago

Hi, Author,

Thank you for sharing this. I am confused about why you use 5 different resolutions with the same convolutional network. And why did you choose this convolutional network architecture?

Parskatt commented 4 months ago

Hi, could you be a bit more precise? The refiners use a coarse-to-fine approach, which is common in matching tasks.
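
To make the coarse-to-fine idea concrete, here is a minimal toy sketch in Python. It is not RoMa's code: `refine`, `coarse_to_fine`, and the 1-D scalar "displacement" are hypothetical stand-ins, and the toy refiner is simply handed the true target, which a real learned refiner would have to infer from image features. It only shows the pattern: each stage corrects a bounded residual at its own stride, so the estimate tightens as the stride shrinks.

```python
STRIDES = (16, 8, 4, 2, 1)  # the scales mentioned in this thread


def refine(estimate, target, stride):
    """Toy refiner: corrects a residual of at most +/- stride pixels,
    quantized to half the stride (an assumption for illustration)."""
    residual = max(-stride, min(stride, target - estimate))
    step = stride / 2
    return estimate + round(residual / step) * step


def coarse_to_fine(target):
    """Return the estimate after the coarse match and after each stage."""
    # Hypothetical coarse match: accurate only to the coarsest grid.
    estimate = round(target / STRIDES[0]) * STRIDES[0]
    history = [estimate]
    for stride in STRIDES:
        estimate = refine(estimate, target, stride)
        history.append(estimate)
    return history


if __name__ == "__main__":
    true_displacement = 37.3  # made-up value, in full-resolution pixels
    for est in coarse_to_fine(true_displacement):
        print(f"estimate={est:6.2f}  error={abs(true_displacement - est):.2f}")
```

In this toy, every stage only has to handle a residual comparable to its own stride, which is the usual argument for refining in stages rather than in one shot.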

skill-diver commented 4 months ago

Yes, what I mean is: why do you use the scales 16, 8, 4, 2, 1 and repeat the same conv_refiner design at each one? Why not just use one network that accepts inputs at all resolutions?

Parskatt commented 4 months ago

Typically you get worse performance that way. With separate refiners you can use more channels at lower resolutions; with a single network that's difficult.
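
One plausible reading of the channel argument is a compute budget: a convolution's cost grows with the number of pixels times the square of the channel width, so a refiner at stride 16 sees 256 times fewer pixels and can afford much wider layers. The sketch below is back-of-the-envelope arithmetic only. The image size and channel widths are made-up assumptions, not RoMa's actual configuration, and `conv_macs` and `per_scale_cost` are hypothetical helpers.

```python
STRIDES = (16, 8, 4, 2, 1)
IMAGE_H, IMAGE_W = 560, 560  # hypothetical input size, divisible by 16
BASE_CHANNELS = 16           # hypothetical width at full resolution


def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded conv layer."""
    return height * width * c_in * c_out * kernel * kernel


def per_scale_cost(stride, channels):
    """Cost of one conv layer applied to the feature map at this stride."""
    return conv_macs(IMAGE_H // stride, IMAGE_W // stride, channels, channels)


# Separate refiners: width grows with stride, so cost per layer stays flat.
separate = {s: per_scale_cost(s, BASE_CHANNELS * s) for s in STRIDES}
# One shared network: the same (widest) channel count at every resolution.
shared = {s: per_scale_cost(s, BASE_CHANNELS * 16) for s in STRIDES}

if __name__ == "__main__":
    for s in STRIDES:
        print(f"stride {s:2d}: separate={separate[s]:>15,d}  shared={shared[s]:>15,d}")
```

Under these assumptions the per-scale refiners cost the same at every stride, while a single network with one fixed width is either as wide as the coarse refiner and 256 times more expensive at full resolution, or as narrow as the fine refiner and under-powered at the coarse scales.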

skill-diver commented 4 months ago

Thank you. So you need a different input-channel setting for each resolution. Do you think there is a network powerful enough to do as well with a single parameter setting as the current multiple convolutional refiners with their per-scale settings?

Parskatt commented 4 months ago

Not impossible, but I'm not sure what the benefit would be.