SpectacularAI / 3dgs-deblur

[ECCV2024] Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
https://spectacularai.github.io/3dgs-deblur/
Apache License 2.0

Question to the paper and implementations #5

Closed MrNeRF closed 5 months ago

MrNeRF commented 6 months ago

Hey, I am reading your great work and do have some questions with respect to code and paper:

(image: equations from the paper)

So I guess $\mu'$ is in pixel space, i.e. the projected Gaussian center in pixel coordinates. But what, then, is $\hat\mu$ in (6)? Should this be camera coordinates? I don't think this is specified anywhere. In the code it is implemented here, and it is in camera space if I read it correctly?

Later, in A.2 you are highlighting: "Note that unlike the original gsplat [46] and the Inria implementations, we do not use the OpenGL NDC coordinate system as an intermediate step between projecting Gaussians to pixel coordinates." But what is the advantage here besides saving a few operations?

Thanks for the clarification in advance. Janusch

oseiskar commented 6 months ago

Hello. Thanks for pointing this out. $\hat \mu$ is indeed in camera coordinates and only defined in the Appendix (A.2). The paper uses the notation: hat = camera coordinates (3D), prime = pixel coordinates (2D), nothing = world coordinates. Tilde does not have a consistent meaning.
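Under that notation, the projection pipeline could be sketched like this (a toy sketch only; all numeric values for the pose and intrinsics are hypothetical, not from the paper):

```python
import numpy as np

# Sketch of the paper's notation: mu = world coordinates (no accent),
# mu_hat = camera coordinates (hat), mu_prime = pixel coordinates (prime).
# R, t, fx, fy, cx, cy below are hypothetical placeholder values.
R = np.eye(3)                   # world-to-camera rotation
t = np.array([0.0, 0.0, 2.0])   # world-to-camera translation
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

mu = np.array([0.1, -0.2, 1.0])   # Gaussian center, world coordinates
mu_hat = R @ mu + t               # camera coordinates
mu_prime = np.array([             # pinhole projection to pixel coordinates
    fx * mu_hat[0] / mu_hat[2] + cx,
    fy * mu_hat[1] / mu_hat[2] + cy,
])
```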

If you read A.2, it's also good to know that the formula for $J_i$ is incorrect; this will be fixed in the next version of the manuscript. Here's a corrected version (EDIT: changed! The "version 2" was not correct either):

(image: corrected formula)

The error does not appear in the code. However, the pixel coordinate formula in the code is also actually an approximation that skips some small terms in the gradient. This will also be clarified/changed in the next revision.

> Later, in A.2 you are highlighting: "Note that unlike the original gsplat [46] and the Inria implementations, we do not use the OpenGL NDC coordinate system as an intermediate step between projecting Gaussians to pixel coordinates." But what is the advantage here besides saving a few operations?

The NDC coordinate system has always been a superfluous/legacy concept in 3DGS, probably a leftover from early experiments at Inria, but it is sometimes/often mistaken for an integral part of the method. I have elaborated on this in this PR https://github.com/nerfstudio-project/gsplat/pull/97#issuecomment-1951381539, which removes the NDC coordinates (and otherwise streamlines the implementation) in gsplat. It has already been merged into gsplat & Nerfstudio, so the "unlike the original gsplat" comment no longer applies to the latest gsplat version.
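To illustrate why the NDC step is redundant (a toy sketch, not the gsplat code itself): routing the projection through an NDC-style intermediate in $[-1, 1]$ is an exact round trip, so projecting straight to pixel coordinates gives identical results. All values below are hypothetical.

```python
import numpy as np

# Hypothetical image size, intrinsics, and camera-space point
W, H = 640, 480
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
p_cam = np.array([0.3, -0.1, 2.0])

# Direct pinhole projection to pixel coordinates
u = fx * p_cam[0] / p_cam[2] + cx
v = fy * p_cam[1] / p_cam[2] + cy

# Same projection routed through an NDC-style intermediate in [-1, 1]:
# the extra map and its inverse cancel out exactly
ndc = np.array([2.0 * u / W - 1.0, 2.0 * v / H - 1.0])
u2 = (ndc[0] + 1.0) * 0.5 * W
v2 = (ndc[1] + 1.0) * 0.5 * H
```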

MrNeRF commented 6 months ago

Thank you for your explanation.

Take Eq. (6): $v' = -J_i(\dots)$. Let's call the result of the expression inside `total_vel`, as in the code. If $J_i$ is given by $J' \tilde J$, it should equal $[[f_x, 0, -\mu_x/d], [0, f_y, -\mu_y/d]]$, and `total_vel` $\in \mathbb{R}^3$. So you set the z coordinate to 0 and multiply with $J_i$? In that case I get `float2 out = { -total_vel_npc.x * focal_lengths.x, -total_vel_npc.y * focal_lengths.y };` as in the code, where it contains the $1/d$ from $J'$ in (15).

However, it is still unclear. I think the product $J' \tilde J$ is not the same as the Jacobian in the EWA splatting code https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L89, in particular the last column. Is this what you meant by:

> However, the pixel coordinate formula in the code is also actually an approximation that skips some small terms in the gradient. This will also be clarified/changed in the next revision.

Sorry for bothering you. This is one of the few contributions that gives a great boost in rendering quality, but the paper is a bit hard to digest. I would really like to understand it.

Furthermore, there is no ablation with respect to the gamma correction. Does it have any effect?

Also, I was wondering why, for the iPhone, only the deblurring has a significant effect, whereas the pose optimization improves the quality of the Android recordings. What could be the cause? In theory your proprietary VIO software should give similar results. Is this a sensor (IMU) issue, or is there a regression in the VIO software on iPhone?

oseiskar commented 6 months ago

Hi. Thanks for checking this in detail. The "version 2" with "J tilde" was actually not correct either. Edited the previous message to show the re-corrected version: $J = J' K_i$.

So it's (supposed to be) essentially the same matrix as in the Inria code you linked, that is, the Jacobian $J$ of the pinhole camera projection: $(x, y, z) \mapsto (f_x x/z + c_x, f_y y / z + c_y)$, which is

```
J = [ fx/z   0      -fx * x / z^2 ]
    [ 0      fy/z   -fy * y / z^2 ]
```

The last row in the Inria code is all zeros and disappears here. I think they just wanted to use `glm::mat3` for everything, instead of a mix of 2x3, 3x3, and 3x2 matrices, to simplify the code.
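This Jacobian can be checked numerically against finite differences of the projection; here is a small sketch with hypothetical intrinsics (not code from either repository):

```python
import numpy as np

# Hypothetical intrinsics
fx, fy, cx, cy = 500.0, 450.0, 320.0, 240.0

def project(p):
    """Pinhole projection (x, y, z) -> (fx*x/z + cx, fy*y/z + cy)."""
    x, y, z = p
    return np.array([fx * x / z + cx, fy * y / z + cy])

def jacobian(p):
    """Analytic 2x3 Jacobian of the pinhole projection."""
    x, y, z = p
    return np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])

p = np.array([0.2, -0.3, 1.5])
J = jacobian(p)

# Central finite-difference approximation of each column of J
eps = 1e-6
J_num = np.column_stack([
    (project(p + eps * e) - project(p - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```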

This is not exactly the same as the pixel velocity formula in our code, which is missing the last column of the $J$ matrix (corresponding to the radial "zoom" factor of motion, which is often small). This will be tuned in the next revision.
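A toy sketch of this approximation (hypothetical values, not the actual rasterizer code): the full linearized pixel velocity is $-J v$, and dropping the last column of $J$ removes the depth-motion ("zoom") contribution, which is often small.

```python
import numpy as np

# Hypothetical focal lengths, Gaussian center (camera coords),
# and total velocity at the center
fx, fy = 500.0, 500.0
mu_hat = np.array([0.2, -0.1, 2.0])
v_cam = np.array([0.05, 0.02, 0.01])

x, y, z = mu_hat
J = np.array([
    [fx / z, 0.0,    -fx * x / z**2],
    [0.0,    fy / z, -fy * y / z**2],
])

v_pix_full = -J @ v_cam        # exact linearized pixel velocity
v_pix_approx = np.array([      # approximation: drop the last (depth) column
    -fx * v_cam[0] / z,
    -fy * v_cam[1] / z,
])
```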

The requirement for additional pose optimization on Android is linked to the rolling shutter effect, which cannot be correctly optimized in COLMAP, at least in the mode that is currently used in Nerfstudio.

MrNeRF commented 5 months ago

Hi, I am now trying to replicate this method with the Inria code. I think the Inria code runs directly on COLMAP coordinates (camera system: y down, z forward). My question is: in what coordinates is the output you are feeding into Nerfstudio? You are also doing some transformations here: https://github.com/SpectacularAI/nerfstudio/blob/ba89a8db1e0afc2d203d2d37b98e9faac7247aab/nerfstudio/models/splatfacto.py#L704-L711

The `camera_angular_velocity` and `camera_linear_velocity` are all in camera coordinates already and don't need to be transformed? By the way, I did not see any transformation to Nerfstudio coordinates prior to splatfacto.py.

Update: Is it enough to directly parse the COLMAP output in the sai-cli folder? This is used by combine.py. Why do I need the alignment to the COLMAP poses? Do I need to run combine.py at all? Otherwise, if I do need to run it, and if I'm not missing anything, the velocity scaling should be enough, right?

MrNeRF commented 5 months ago

I got it implemented here: https://github.com/MrNeRF/gs-on-the-move Currently, I am a bit puzzled about where the error is: whether the data is the issue (e.g. the coordinate system) or the rasterizer implementation is wrong. I checked the rasterizer a couple of times and think it should be good...

oseiskar commented 5 months ago

Hi. Unfortunately, I don't have time to debug this in the near future. One possible source of error is that the Inria code uses the hacky OpenGL coordinates and the redundant "projmat", which were removed from gsplat in https://github.com/nerfstudio-project/gsplat/pull/9, and this would cause a hidden/unintuitive coordinate system change compared to this code.

The data produced by sai-cli is in Nerfstudio's coordinate convention (conversion from SAI/OpenCV coordinate convention here https://github.com/SpectacularAI/sdk/blob/ab85990b924db72b3d6aef21826618d53c21ab26/python/cli/process/process.py#L152-L209)
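For reference, the usual OpenCV-to-Nerfstudio/OpenGL axis flip could be sketched as follows (a common convention, not necessarily identical to the linked sai-cli script): OpenCV cameras look along +z with y down, Nerfstudio cameras look along -z with y up, so the camera y and z axes are negated.

```python
import numpy as np

def opencv_to_nerfstudio(c2w_opencv):
    """Convert a 4x4 camera-to-world pose from the OpenCV convention
    (x right, y down, z forward) to the Nerfstudio/OpenGL convention
    (x right, y up, z backward) by flipping the y and z camera axes."""
    c2w = c2w_opencv.copy()
    c2w[:3, 1] *= -1.0  # flip camera y axis (down -> up)
    c2w[:3, 2] *= -1.0  # flip camera z axis (forward -> backward)
    return c2w

# Hypothetical example: identity pose
c2w = np.eye(4)
c2w_ns = opencv_to_nerfstudio(c2w)
```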

The idea is indeed that `camera_angular_velocity` and `camera_linear_velocity` are already in the camera coordinate system and do not depend on the poses. You can indeed directly parse the sai-cli folder output, and it is not required to use COLMAP. However, the results, especially the intrinsic calibration, may then not be as accurate in current versions of sai-cli. The COLMAP alignment in combine.py, if you also use COLMAP, is needed only for linear velocity scaling.
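As a hypothetical illustration of such velocity scaling (not the combine.py code): one simple way to estimate a scale factor between two trajectories of corresponding positions, assuming they are already rotationally aligned, is a least-squares fit over centered positions.

```python
import numpy as np

def estimate_scale(traj_a, traj_b):
    """Least-squares scale s minimizing ||s * a_i - b_i|| over
    mean-centered corresponding positions (Nx3 arrays)."""
    a = traj_a - traj_a.mean(axis=0)
    b = traj_b - traj_b.mean(axis=0)
    return float(np.sum(a * b) / np.sum(a * a))

# Synthetic example: a "COLMAP" trajectory that is a scaled and
# shifted copy of a "VIO" trajectory
rng = np.random.default_rng(0)
vio = rng.normal(size=(50, 3))
colmap = 2.5 * vio + np.array([1.0, -2.0, 0.5])
s = estimate_scale(vio, colmap)
```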

We will probably publish an updated version of the code with fixed formulas for the pixel velocities in a couple of weeks.

MrNeRF commented 5 months ago

Thx for the pointers. I don't expect you to debug it, and I know you have your own stuff to do :). Since you originally implemented it, you might have come across similar issues; that's why I am asking whether this sounds familiar to you. I will try what you suggested. Maybe it already resolves the bug(s) I have.

MrNeRF commented 5 months ago

Btw, I got it working. There were basically two problems, both due to data processing.

Thx for the great repo!

oseiskar commented 4 months ago

FYI @MrNeRF. The code and arXiv paper have now been updated. There is also another major improvement: IMU data is now optional. See the changelog: https://github.com/SpectacularAI/3dgs-deblur?tab=readme-ov-file#version-2-2024-05

MrNeRF commented 4 months ago

@oseiskar Exciting. Will check it out. I also saw that the paper has been updated! Great work!