Hi! We could use a rectangular bounding box instead of a square one :) The modification should be minor. I'll try to implement this when I have time.
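For illustration, here is a minimal sketch of what a rectangular (per-axis) bound could look like in place of a single scalar radius. The names aabb_min, aabb_max, and normalize_to_unit_cube are hypothetical and not identifiers from this repository:

import torch

# Hypothetical per-axis half-extents replacing a single scalar `radius`.
# These names are illustrative only, not taken from instant-nsr-pl.
aabb_min = torch.tensor([-1.0, -0.5, -0.8])
aabb_max = torch.tensor([ 1.0,  0.5,  0.8])

def normalize_to_unit_cube(x: torch.Tensor) -> torch.Tensor:
    # Map points inside the rectangular box to [-1, 1]^3,
    # analogous to dividing by `radius` for a square (cubic) box.
    center = (aabb_min + aabb_max) / 2
    half_extent = (aabb_max - aabb_min) / 2
    return (x - center) / half_extent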
sphere_init_radius corresponds to the sphere initialization widely used in SDF-based methods. We initialize the SDF to be a sphere for better convergence, and sphere_init_radius is the radius of this sphere relative to radius (i.e., setting sphere_init_radius=1.0 gives an initialized sphere of the same size as radius).
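For intuition, here is a minimal sketch of the target an SDF network is typically initialized to approximate under this scheme. It illustrates the idea described above (world-space coordinates assumed, function name hypothetical), not the repository's exact initialization code:

import torch

def initial_sphere_sdf(x: torch.Tensor, sphere_init_radius: float, radius: float) -> torch.Tensor:
    # Signed distance to a sphere of radius sphere_init_radius * radius
    # centered at the origin: negative inside, positive outside.
    # With sphere_init_radius=1.0 the initial zero level set has the same
    # extent as the bounding region defined by `radius`.
    return x.norm(dim=-1) - sphere_init_radius * radius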
@bennyguo
Thank you for your quick reply! I have another question related to "box artifacts". Like this issue, there are box artifacts when we train a scene without any mask. Do you have any idea how to remove them?
Thanks.
Are you training on data with monochromatic background (like NeRF-Synthetic) or on real-captured data (like MipNeRF-360)? You may refer to the "Floater problem" section here and see if it helps.
@bennyguo
I'm training on the CO3D dataset, which consists of real video frames, so the background is not a solid color. But thank you for your kind reply!
@TLB-MISS Could you try the latest code and tune model.radius to tightly bound the object?
@bennyguo
The latest version doesn't seem to detect the camera information properly.
ADD: previous version's result
Are you capturing this from forward-facing viewpoints? There are two methods that can be used to determine the scene center in the latest version: https://github.com/bennyguo/instant-nsr-pl/blob/48145751245641b5004068060dab1dbb864857ca/datasets/colmap.py#L30-L43
- dataset.center_est_method=camera: regards the center of all cameras as the scene center
- dataset.center_est_method=lookat: assumes the cameras are looking at an object and calculates an approximate look-at point as the scene center (see the sketch after this reply)
I believe neither method is very suitable for forward-facing captures. Maybe I should add another option that takes the center of the sparse point cloud as the scene center? Is it possible to provide the data you used for testing? Thanks!
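For reference, the approximate look-at point mentioned above can be computed as the least-squares point closest to all camera viewing rays. The standalone sketch below shows that standard construction; it is not a copy of the linked code, and the function name is hypothetical:

import torch

def estimate_lookat_center(origins: torch.Tensor, directions: torch.Tensor) -> torch.Tensor:
    # origins: (N, 3) camera centers; directions: (N, 3) viewing directions.
    # Solve for the point minimizing the summed squared distance to all rays.
    d = torch.nn.functional.normalize(directions, dim=-1)
    eye = torch.eye(3).expand(d.shape[0], 3, 3)
    A = eye - d[:, :, None] @ d[:, None, :]          # projection orthogonal to each ray
    lhs = A.sum(dim=0)                               # (3, 3)
    rhs = (A @ origins[:, :, None]).sum(dim=0)       # (3, 1)
    return torch.linalg.solve(lhs, rhs).squeeze(-1)  # (3,) approximate look-at point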
@bennyguo

The problem above has nothing to do with the center estimation methods (camera, lookat, the original get_center). This is due to a new pose normalization process.
I changed the code below to the old version and the problem was solved.

Latest code:
# rotation and translation
Rc = torch.stack([x, y, z], dim=1)
tc = center.reshape(3, 1)
R, t = Rc.T, -Rc.T @ tc
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([R, t], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses_norm = (inv_trans @ poses_homo)[:,:3] # (N_images, 3, 4)
# scaling
scale = poses_norm[...,3].norm(p=2, dim=-1).min()
poses_norm[...,3] /= scale
# apply the transformation to the point cloud
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
pts = pts / scale
Previous code:
# rotation
Rc = torch.stack([x, y, z], dim=1)
R = Rc.T
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([R, torch.as_tensor([[0.,0.,0.]]).T], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses = (inv_trans @ poses_homo)[:,:3]
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
# translation and scaling
poses_min, poses_max = poses[...,3].min(0)[0], poses[...,3].max(0)[0]
pts_fg = pts[(poses_min[0] < pts[:,0]) & (pts[:,0] < poses_max[0]) & (poses_min[1] < pts[:,1]) & (pts[:,1] < poses_max[1])]
center = get_center(pts_fg)
tc = center.reshape(3, 1)
t = -tc
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([torch.eye(3), t], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses = (inv_trans @ poses_homo)[:,:3]
scale = poses[...,3].norm(p=2, dim=-1).min()
poses[...,3] /= scale
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
pts = pts / scale
Why did you change the normalization code to the current version?
In particular, I wonder why you changed t = -tc to t = -Rc.T @ tc.
In my humble opinion, the translation vector obtained through get_center is a translation vector in world space, not camera space.
Using the old version for normalization, I can get the following result with a tight bound!!:
Thank you so much for solving the box artifacts issue!
Both transformations are correct:

- [R|t] performs rotation first, then translation, but here we need to first translate by -tc and then rotate by Rc.T. Therefore we have Rx + t = Rc.T(x - tc), leading to t = -Rc.T @ tc (a small numerical check of this identity is sketched below).
- In the previous code, t is re-calculated as the center of the points that are bounded by all cameras after rotation, so it doesn't need to be transformed as above.

I thought center_est_method=camera in the current version (treating the center of all camera positions as the scene center, not using point locations) approximately achieves the same effect as the old version, but it seems better to add another option such as center_est_method=points that does the same as the previous version :)
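As a sanity check on the identity above, here is a tiny standalone snippet (not repository code) verifying that applying [Rc.T | -Rc.T @ tc] is the same as translating by -tc and then rotating by Rc.T, and that it maps the scene center tc to the origin:

import torch

torch.manual_seed(0)
Rc = torch.linalg.qr(torch.randn(3, 3)).Q   # random orthonormal matrix standing in for the rotation
tc = torch.randn(3, 1)                      # scene center in world space
x = torch.randn(3, 1)                       # an arbitrary world point

R, t = Rc.T, -Rc.T @ tc
assert torch.allclose(R @ x + t, Rc.T @ (x - tc), atol=1e-6)    # Rx + t = Rc.T (x - tc)
assert torch.allclose(R @ tc + t, torch.zeros(3, 1), atol=1e-6)  # the center maps to the origin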
@bennyguo
Oh, you are right.
The only problem with the current version of the camera pose normalization is that the return value here should actually be poses_norm, not poses.
@TLB-MISS Thanks for pointing this out! Such a stupid mistake! I already fixed this in the latest commit and brought back the old normalization method (dataset.center_est_method=point).
@bennyguo
Thank you for your hard work and dedication!! I think my issue could be closed now.
First of all, thank you so much for providing such wonderful work.
I have two questions. First, what is sphere_init_radius in NeuS? I wonder what exactly its role is, and how it relates to radius. Second, the NeuS implementation here only provides a square bounding box (using radius). Why not provide a rectangular bounding box? Are there any issues with that kind of implementation? Thanks!