Asparagus15 / GaussianShader

code for GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces

`get_minimum_axis` seems wrong? #19

Open initialneil opened 4 months ago

initialneil commented 4 months ago


I've got an example here:

  1. The sorting selects 2 (the last column)
  2. gather moves the last column to the first column
  3. BUT R_sorted[:,0,:] selects the first row

So shouldn't R_sorted[:,0,:] be changed to R_sorted[:,:,0]?

Why sort the columns of R instead of the rows?


After playing with the math, I believe that gather should work on rows instead of columns.

R_sorted = torch.gather(R, dim=2, index=sorted_idx[:,None,:].repeat(1, 3, 1)).squeeze()
# changes to 
R_sorted = torch.gather(R, dim=1, index=sorted_idx[:,:,None].repeat(1,1,3)).squeeze()

After the discussion below, we (@nyy618 and I) think that the normal is obtained by selecting the column:

R_sorted = torch.gather(R, dim=2, index=sorted_idx[:,None,:].repeat(1, 3, 1)).squeeze()
return R_sorted[:,:,0]

Pull request updated accordingly.
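A small sketch (hypothetical values, not the repo's tensors) of what torch.gather with dim=2 actually does: it reorders the columns of each matrix, so the axis moved first by the sort must be read back as a column via R_sorted[:, :, 0]:

```python
import torch

# Hypothetical 3x3 matrix standing in for one Gaussian's R.
R = torch.arange(9.0).reshape(1, 3, 3)       # [[0,1,2],[3,4,5],[6,7,8]]
sorted_idx = torch.tensor([[2, 0, 1]])       # pretend scale 2 is the smallest

# gather with dim=2 permutes columns: out[b, i, j] = R[b, i, sorted_idx[b, j]]
R_sorted = torch.gather(R, dim=2, index=sorted_idx[:, None, :].repeat(1, 3, 1))

print(R_sorted[:, :, 0])  # tensor([[2., 5., 8.]])  -- the moved column 2 of R
print(R_sorted[:, 0, :])  # tensor([[2., 0., 1.]])  -- just a row, not an axis
```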

nyy618 commented 4 months ago

I find this part weird too. It sorts the columns yet takes a row as the normal. But I still can't understand your modification: I think you should sort the columns, not the rows. According to linear algebra, the columns are the eigenvectors, which represent the directions after the rotation transformation. So R_sorted[:,0,:] should be changed to R_sorted[:,:,0], as you first mentioned. GaussianPro also sorts the columns, which matches my view:

        rotations_mat = build_rotation(rotations)
        scales = pc.get_scaling
        min_scales = torch.argmin(scales, dim=1)
        indices = torch.arange(min_scales.shape[0])
        normal = rotations_mat[indices, :, min_scales]

Is there something wrong with my understanding?
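For comparison, a minimal sketch (made-up values) of the GaussianPro-style indexing above: rotations_mat[indices, :, min_scales] keeps the full dim=1 slice, i.e. it reads the column at min_scales[i] of each matrix:

```python
import torch

# Two hypothetical 3x3 matrices; advanced indexing picks one column per matrix.
rotations_mat = torch.arange(18.0).reshape(2, 3, 3)
min_scales = torch.tensor([2, 0])   # pretend argmin of the scaling per Gaussian
indices = torch.arange(2)

# normal[i] = rotations_mat[i, :, min_scales[i]] -- column min_scales[i] of R_i
normal = rotations_mat[indices, :, min_scales]

# Column 2 of R_0 -> (2, 5, 8); column 0 of R_1 -> (9, 12, 15)
print(normal)
```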

initialneil commented 4 months ago

@nyy618 I had a case where I had to warp these R matrices by motion, and I came to the conclusion that these R matrices are w2c rotations for each Gaussian, transforming from world coordinates to the Gaussian's local coordinates. And for w2c rotations, the rows are the axis vectors viewed in world coordinates.

My case is like this:

  1. I have R for each Gaussian.
  2. I select the sorted row as the normal.
  3. Applying an additional rotation R' to the model gives R <- R * inv(R').
  4. After applying the additional rotation, the normal still looks fine.

I tried selecting columns instead, and couldn't make it work.

nyy618 commented 4 months ago

@initialneil Thank you for your inspiring clarification. I think the key point is the difference between the rotation of the world coordinates and the rotation of the Gaussian in the world coordinates. The inverse of R equals the transpose of R since R is orthogonal. R is the rotation in world coordinates: the columns of R express the axes of the Gaussian ellipsoid in world coordinates. However, if you want to transform from world coordinates to the Gaussian's local coordinates, you have to apply the inverse of R, namely its transpose. In theory, the transition matrix from the world basis to the Gaussian basis is R, i.e. it expresses the basis of the Gaussian coordinates in terms of the basis of the world coordinates. Let the basis of the world be e and the basis of the Gaussian be e'; then e' = e R.

If you want to represent the basis of the world in the basis of the Gaussian, you have to apply the inverse of R. By the way, the getWorld2View2 function also takes the transpose of Camera.R as the rotation of the w2c matrix:

def getWorld2View2(R, t, translate=np.array([.0, .0, .0]), scale=1.0):
    Rt = np.zeros((4, 4))
    Rt[:3, :3] = R.transpose()  # Camera.R is stored transposed; undo it here
    Rt[:3, 3] = t
    Rt[3, 3] = 1.0

    C2W = np.linalg.inv(Rt)
    cam_center = C2W[:3, 3]
    cam_center = (cam_center + translate) * scale  # recenter/rescale the camera
    C2W[:3, 3] = cam_center
    Rt = np.linalg.inv(C2W)
    return np.float32(Rt)

I am still not sure about my conclusion; I will ask others for help. I hope you can point out my misunderstanding.

initialneil commented 4 months ago

@nyy618 In the definition of camera projection, R is w2c: P_cam = R * P_world + t. So the original Camera.R should be w2c. But for the use of glm in the CUDA code, the author of GS specifically stores the camera's R transposed: https://github.com/graphdeco-inria/gaussian-splatting/blob/472689c0dc70417448fb451bf529ae532d32c095/scene/dataset_readers.py#L196-L197

# get the world-to-camera transform and set R, T
w2c = np.linalg.inv(c2w)
R = np.transpose(w2c[:3,:3])  # R is stored transposed due to 'glm' in CUDA code

As for the R of a Gaussian, it seems to be stored directly as w2c, so the axes should be the rows instead of the columns.
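A quick sketch (simplified: the translate/scale arguments are dropped, and the function is renamed to mark it as a stand-in) tying the two snippets together — the loader stores R = w2c_rot.T, and the w2c builder transposes it back, so the w2c rotation round-trips:

```python
import numpy as np

def world_to_view(R, t):
    # Simplified stand-in for getWorld2View2: build the 4x4 w2c matrix.
    Rt = np.zeros((4, 4))
    Rt[:3, :3] = R.transpose()   # undo the storage transpose
    Rt[:3, 3] = t
    Rt[3, 3] = 1.0
    return Rt

theta = 0.3
w2c_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
t = np.array([1.0, 2.0, 3.0])

R_stored = np.transpose(w2c_rot)         # stored transposed, as in the loader
Rt = world_to_view(R_stored, t)
print(np.allclose(Rt[:3, :3], w2c_rot))  # True: the w2c rotation is recovered
```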

nyy618 commented 4 months ago

@initialneil Thank you for your correction. I made a wrong example. Let the problem reduce to a 2D Gaussian. According to the paper, the covariance matrix equals R S S^T R^T. Working through a concrete 2D case, the direction of the long axis equals the first column of R, and the direction of the short axis equals the second. I think you should apply the transpose of R to rotate the coordinates. Is there something I missed?
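The 2D argument above can be checked numerically; a sketch (arbitrary theta and scales, assuming s1 > s2) showing that the long axis of Sigma = R S S^T R^T is the first column of R:

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([3.0, 0.5])                   # s1 > s2, so axis 0 is the long one

Sigma = R @ S @ S.T @ R.T                 # covariance as in the paper
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
long_axis = eigvecs[:, -1]                # direction of largest variance

# Same direction (up to sign) as the first column of R:
print(np.allclose(np.abs(long_axis @ R[:, 0]), 1.0))  # True
```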

initialneil commented 3 months ago

@nyy618 I finally got some time to settle this question. I did some experiments, and I think your math is correct: the normal is a column, not a row.

  1. One Gaussian in the view of a camera with an identity rotation matrix. [image]

  2. Setting one of the scalings to a very small value makes the Gaussian shrink. [image]

  3. The shrunken edge is the last column of R if we set the last scaling to be small. [image]
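The experiment above can be reproduced in a few lines; a sketch (random rotation, hypothetical scalings) verifying that the shrunken direction of the covariance R S S^T R^T is the column of R whose scaling was made small:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal "rotation"
S = np.diag([1.0, 1.0, 1e-3])                 # last scaling set very small

Sigma = Q @ S @ S.T @ Q.T                     # Gaussian covariance
eigvals, eigvecs = np.linalg.eigh(Sigma)      # ascending eigenvalues
normal = eigvecs[:, 0]                        # direction of minimal extent

# Matches (up to sign) the last *column* of Q, the axis with the small scale:
print(np.allclose(np.abs(normal @ Q[:, 2]), 1.0))  # True
```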

yinyunie commented 3 months ago

Hi,

I also agree with @initialneil .

When I read this line, my understanding is that R's column space is a transformation from the Gaussian system to the world system, and the shortest axis should be the first column, like below:

x_axis = R_sorted[:,0,:] # normalized by default
should be --->
x_axis = R_sorted[:,:,0] # normalized by default

I also ran an experiment on horse_blender. The PSNR does not change much, so I assume normal and normal_2 take the major effect in regressing the correct normal.

[ITER 30000] Evaluating test: L1 0.016133079305291176 PSNR 26.573974609375 [21/05 19:52:34]

[ITER 30000] Evaluating train: L1 0.010304585099220276 PSNR 29.289199829101562 [21/05 19:52:39]

initialneil commented 3 months ago

@yinyunie Agree. For the static scene here, R is like a black box of parameters anyway. But it becomes important when extending to dynamic scenarios, so it's better to fix it.