georg-bn / rotation-steerers

A steerer for D-dimensional keypoint descriptions is a DxD matrix that transforms the descriptions as if they were computed from a rotated image.
MIT License
60 stars, 6 forks

How to determine which models exhibit the piecewise rotation equivariance property, and how to choose the eigenvalues for learning SO(2) steerers? #2

Closed RahulSajnani closed 3 months ago

RahulSajnani commented 3 months ago

Hi @georg-bn

Thank you for your work! I am curious about how you selected the number of eigenvalues in the SO(2) experiment for DeDoDe. Also, I was trying to learn steerers for the diffusion VAE encoder (https://huggingface.co/stabilityai/sd-vae-ft-mse-original), where the feature dimension is 4. How do I decide the number of steerer eigenvalues (j_d in Eq. 7 of your paper) for this?

Furthermore, how can I determine which kinds of model features exhibit the piecewise rotation equivariance that a learned steerer needs in order to make the output equivariant? In your paper you have shown this for SuperPoint and DeDoDe only, and I am curious how I can try your method on models other than these.

Thank you!

georg-bn commented 3 months ago

Hi! Cool idea to experiment with this encoder!

If the latent dimension is 4, the steerer can consist of two 2x2 rotation blocks (potentially with different frequencies). So the steerer would be something like inv(Q) blockdiag(R(j a), R(k a)) Q, where a is the angle, R(x) is the 2x2 rotation matrix for angle x, Q is an arbitrary invertible 4x4 matrix, and j and k are integers. However, the easiest way to find a steerer for a pre-trained network would be to take the steerer to be matrix_exp(a M), where a is the angle and M is an unknown 4x4 matrix that you have to optimize (this is what we did with DeDoDe, but with M being 256x256). You can then decompose M into frequencies/eigenvalues later if needed.
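To make the relation between the two parameterizations concrete, here is a minimal NumPy sketch (not code from the repository). It builds inv(Q) blockdiag(R(j a), R(k a)) Q for a random Q, constructs the matching M, and checks that matrix_exp(a M) gives the same steerer. The helper names, the random Q, and the frequencies j = 1, k = 2 are illustrative assumptions; the matrix exponential is computed through an eigendecomposition, which assumes M is diagonalizable.

```python
import numpy as np

def rot2(x):
    """2x2 rotation matrix for angle x (radians)."""
    c, s = np.cos(x), np.sin(x)
    return np.array([[c, -s], [s, c]])

def block_steerer(a, Q, j, k):
    """inv(Q) blockdiag(R(j a), R(k a)) Q for a 4-dimensional latent."""
    B = np.zeros((4, 4))
    B[:2, :2] = rot2(j * a)
    B[2:, 2:] = rot2(k * a)
    return np.linalg.solve(Q, B @ Q)  # solve(Q, X) computes inv(Q) @ X

def generator(Q, j, k):
    """Matrix M such that matrix_exp(a M) equals block_steerer(a, Q, j, k)."""
    J = np.array([[0.0, -1.0], [1.0, 0.0]])  # derivative of R(x) at x = 0
    G = np.zeros((4, 4))
    G[:2, :2] = j * J
    G[2:, 2:] = k * J
    return np.linalg.solve(Q, G @ Q)

def expm_diagonalizable(A):
    """Matrix exponential via eigendecomposition (assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 4))  # arbitrary change of basis (assumed invertible)
j, k = 1, 2                  # illustrative frequencies, not values from the paper
a = 0.7                      # rotation angle in radians

M = generator(Q, j, k)
S_block = block_steerer(a, Q, j, k)
S_exp = expm_diagonalizable(a * M)

# The frequencies can be read off M: its eigenvalues are +-i*j and +-i*k.
freqs = np.sort(np.abs(np.linalg.eigvals(M).imag))
```

In the other direction, if M is learned freely, the imaginary parts of its eigenvalues show which frequencies the optimization found; they are only exact integers if the learned steerer is the identity at a = 2*pi.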

I think it would make sense to start with 90 deg rotations only (or perhaps horizontal flipping) and try to find a 4x4 matrix S such that multiplying the latents by S (and rotating their positions, since the latent has spatial dimensions) corresponds to rotating the input by 90 degrees. This can be done by taking a set of images I and minimizing || f(rot(I)) - S rot(f(I)) ||, where f is the frozen encoder and "rot" rotates spatially by 90 degrees. The minimizing S can be found by SGD, or even explicitly with a linear least squares solver if you use only a few images. (For arbitrary rotations, the idea would be to minimize || f(rot(I, a)) - matrix_exp(a M) rot(f(I), a) ||, where the angle a is sampled during SGD and rot(I, a) rotates spatially by angle a.)
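As a rough illustration of the least squares variant for 90 degree rotations, here is a NumPy sketch. The `toy_encoder` below is a hypothetical stand-in for the frozen encoder f, not the VAE: it returns periodic central-difference gradients, the image itself, and a periodic Laplacian, chosen only because the exact steerer is known for it. `fit_steerer` is likewise an illustrative helper and assumes square images and features with the same spatial size as the input.

```python
import numpy as np

def toy_encoder(img):
    """Hypothetical stand-in for a frozen encoder f: (H, W) image -> (4, H, W) features."""
    gx = np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)  # periodic central difference in x
    gy = np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)  # periodic central difference in y
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return np.stack([gx, gy, img, lap])

def fit_steerer(images, encoder, k=1):
    """Least squares S minimizing || f(rot(I)) - S rot(f(I)) || over square images."""
    sources, targets = [], []
    for img in images:
        target = encoder(np.rot90(img, k))               # f(rot(I))
        source = np.rot90(encoder(img), k, axes=(1, 2))  # rot(f(I)), rotating positions only
        sources.append(source.reshape(source.shape[0], -1))
        targets.append(target.reshape(target.shape[0], -1))
    G = np.concatenate(sources, axis=1)  # (C, N) rotated features
    F = np.concatenate(targets, axis=1)  # (C, N) features of rotated images
    # F = S G  is equivalent to  G.T S.T = F.T, which lstsq solves column by column.
    S_T, *_ = np.linalg.lstsq(G.T, F.T, rcond=None)
    return S_T.T

rng = np.random.default_rng(0)
images = [rng.normal(size=(16, 16)) for _ in range(4)]  # random stand-in images
S = fit_steerer(images, toy_encoder)
```

For this toy encoder the fitted S acts as a 90 degree rotation on the two gradient channels and as the identity on the other two. With a real encoder the fit will not be exact, and the latent resolution usually differs from the image resolution, which is fine as long as rot is applied to each grid separately.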

I will add that since the latent dimension is so small, it could be that the best S is basically the identity matrix. To get some intuition about how the latent changes under rotations it would be interesting to plot the latents for a couple of images and their rotations. Feel free to post them here as well if you make such plots!
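Besides plotting, one quick numerical check of whether S is basically the identity is to compare the relative residual of the identity against that of a fitted S. The sketch below uses made-up feature matrices rather than real latents; `relative_residual` is an illustrative helper, and `source` and `target` stand in for the flattened rot(f(I)) and f(rot(I)).

```python
import numpy as np

def relative_residual(target, source, S):
    """|| target - S @ source || / || target || for (C, N) feature matrices."""
    return np.linalg.norm(target - S @ source) / np.linalg.norm(target)

rng = np.random.default_rng(0)
source = rng.normal(size=(4, 1000))                   # stands in for rot(f(I)), flattened
S_true = np.eye(4) + 0.05 * rng.normal(size=(4, 4))   # made-up near-identity steerer
target = S_true @ source                              # stands in for f(rot(I))

# Fit S by least squares, then compare it against the identity baseline.
S_fit = np.linalg.lstsq(source.T, target.T, rcond=None)[0].T
err_identity = relative_residual(target, source, np.eye(4))
err_fit = relative_residual(target, source, S_fit)
```

If `err_identity` is already about as small as `err_fit` on real latents, the encoder is close to equivariant without any steering and a learned S adds little.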

RahulSajnani commented 3 months ago

Hi @georg-bn

Thank you for helping me out. I will close this issue for the time being and will reopen it later after I am done testing.