Closed RahulSajnani closed 5 months ago
Hi! Cool idea to experiment with this encoder!
If the latent dimension is 4, you can have two blocks of 2x2 rotation matrices (potentially with different frequencies) as the steerer. So the steerer would be something like inv(Q) blockdiag(R(j a), R(k a)) Q, where a is the angle, R(x) is a 2x2 rotation matrix for angle x, Q is an arbitrary invertible 4x4 matrix, and j and k are integers. However, the easiest way to find a steerer for a pre-trained network is to take the steerer to be matrix_exp(a M), where a is the angle and M is an unknown 4x4 matrix that you optimize (this is what we did for DeDoDe, but with M being 256x256). You can then later decompose M into frequencies/eigenvalues if needed.
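To make the two parametrizations concrete, here is a small numpy/scipy sketch (the frequencies j, k, the angle a, and the basis matrix Q are arbitrary choices for illustration): the block-diagonal form inv(Q) blockdiag(R(j a), R(k a)) Q coincides with matrix_exp(a M) when M is built from the corresponding Lie-algebra generators.

```python
import numpy as np
from scipy.linalg import expm, block_diag

def rot2(x):
    """2x2 rotation matrix for angle x (radians)."""
    c, s = np.cos(x), np.sin(x)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))  # arbitrary basis change (invertible a.s.)
j, k = 1, 2                      # integer frequencies of the two blocks
a = 0.7                          # rotation angle

# Steerer in block-diagonal form: inv(Q) @ blockdiag(R(j a), R(k a)) @ Q
S = np.linalg.inv(Q) @ block_diag(rot2(j * a), rot2(k * a)) @ Q

# Equivalent exponential form: S = matrix_exp(a M), with M built from
# the generator J of 2D rotations scaled by the frequencies.
J = np.array([[0.0, -1.0], [1.0, 0.0]])
M = np.linalg.inv(Q) @ block_diag(j * J, k * J) @ Q
assert np.allclose(S, expm(a * M))
```

Decomposing a learned M into frequencies then amounts to reading off its eigenvalues, which come in imaginary pairs ±i j, ±i k in this parametrization.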
I think it would make sense to start with 90 deg rotations only (or perhaps horizontal flipping) and try to find a 4x4 matrix S such that when you multiply the latents by S (and rotate their positions, since the latent has spatial dimensions), this corresponds to rotating the input by 90 degrees. This can be done by taking a set of images I and minimizing || f(rot(I)) - S rot(f(I)) ||, where f is the frozen encoder and "rot" rotates spatially by 90 degrees. The minimizing S can be found by SGD, or even explicitly with a linear least squares solver if you use few images. (For arbitrary rotations, the idea would be to minimize || f(rot(I, a)) - matrix_exp(a M) rot(f(I), a) ||, where the angle a is sampled during SGD and rot(I, a) rotates spatially by angle a.)
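A minimal PyTorch sketch of the 90-degree case, minimizing || f(rot(I)) - S rot(f(I)) || by gradient descent. Note that the encoder `f`, the images, and all hyperparameters below are placeholders: a 1x1 conv stands in for the frozen SD-VAE encoder (which also maps to 4 channels) just to keep the sketch self-contained and runnable.

```python
import torch

torch.manual_seed(0)

# Placeholder for the frozen encoder f; in practice this would be the
# SD-VAE encoder producing latents of shape (B, 4, H, W).
f = torch.nn.Conv2d(3, 4, kernel_size=1, bias=False)
for p in f.parameters():
    p.requires_grad_(False)

def rot90(x):
    """Rotate spatially by 90 degrees over the last two (H, W) dims."""
    return torch.rot90(x, 1, dims=(-2, -1))

S = torch.nn.Parameter(torch.randn(4, 4) * 0.1)  # 4x4 steerer to learn
opt = torch.optim.Adam([S], lr=1e-2)

images = torch.randn(16, 3, 32, 32)  # stand-in for a set of real images I

for step in range(500):
    opt.zero_grad()
    target = f(rot90(images))                                     # f(rot(I))
    steered = torch.einsum("ij,bjhw->bihw", S, rot90(f(images)))  # S rot(f(I))
    loss = (target - steered).pow(2).mean()
    loss.backward()
    opt.step()
```

Since the toy encoder here is pointwise, it commutes exactly with the spatial rotation and the optimal S is the identity; with a real encoder the loss would only tell you how close to 90-degree-steerable the latents actually are.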
I will add that since the latent dimension is so small, it could be that the best S is basically the identity matrix. To get some intuition about how the latents change under rotation, it would be interesting to plot the latents for a couple of images and their rotations. Feel free to post them here as well if you make such plots!
Hi @georg-bn
Thank you for helping me out. I will close this issue for the time being and will reopen it later after I am done testing.
Hi @georg-bn
Thank you for your work! I am curious about how you selected the number of eigenvalues in the SO(2) experiment for DeDoDe. Also, I was trying to learn steerers for the diffusion VAE encoder (https://huggingface.co/stabilityai/sd-vae-ft-mse-original), where the feature dimension is 4. How do I decide the number of steerer eigenvalues (j_d in eqn. 7 of your paper) for this?
Furthermore, how does one determine which types of model features exhibit the piecewise rotation-equivariance characteristic, such that learning steerers will make the output equivariant? In your paper you have shown this only for SuperPoint and DeDoDe, and I was curious how I can experiment with your work on models other than these.
Thank you!