Hi! We could use a rectangular bounding box instead of a square one :) The modification should be minor. I'll try to implement this when I have time.
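For illustration, here is a minimal sketch of what a rectangular (per-axis) bound could look like in place of a single scalar radius. The names aabb_min, aabb_max, and normalize_to_unit_cube are hypothetical and not identifiers from this repository:

import torch

# Hypothetical per-axis half-extents replacing a single scalar `radius`.
# These names are illustrative only, not taken from instant-nsr-pl.
aabb_min = torch.tensor([-1.0, -0.5, -0.8])
aabb_max = torch.tensor([ 1.0,  0.5,  0.8])

def normalize_to_unit_cube(x: torch.Tensor) -> torch.Tensor:
    # Map points inside the rectangular box to [-1, 1]^3,
    # analogous to dividing by `radius` for a square (cubic) box.
    center = (aabb_min + aabb_max) / 2
    half_extent = (aabb_max - aabb_min) / 2
    return (x - center) / half_extent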
sphere_init_radius corresponds to the sphere initialization widely used in SDF-based methods. We initialize the SDF to be a sphere for better convergence, and sphere_init_radius is the radius of this sphere relative to radius (i.e., setting sphere_init_radius=1.0 gives an initialized sphere of the same size as radius).
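For intuition, here is a minimal sketch of the target an SDF network is typically initialized to approximate under this scheme. It illustrates the idea described above (world-space coordinates assumed, function name hypothetical), not the repository's exact initialization code:

import torch

def initial_sphere_sdf(x: torch.Tensor, sphere_init_radius: float, radius: float) -> torch.Tensor:
    # Signed distance to a sphere of radius sphere_init_radius * radius
    # centered at the origin: negative inside, positive outside.
    # With sphere_init_radius=1.0 the initial zero level set has the same
    # extent as the bounding region defined by `radius`.
    return x.norm(dim=-1) - sphere_init_radius * radius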
@bennyguo
Thank you for your quick reply! I have another question related to "box artifacts". Like this issue, there are box artifacts when we train a scene without any mask. Do you have any idea how to remove them?
Thanks.
Are you training on data with monochromatic background (like NeRF-Synthetic) or on real-captured data (like MipNeRF-360)? You may refer to the "Floater problem" section here and see if it helps.
@bennyguo
I'm training on the CO3D dataset, which consists of real video frames, so the background is not a solid color. But thank you for your kind reply!
@TLB-MISS Could you try the latest code and tune model.radius to tightly bound the object?
@bennyguo
The latest version doesn't seem to detect the camera information properly.
ADD: previous version's result
Are you capturing this from forward-facing viewpoints? There are two methods that can be used to determine the scene center in the latest version: https://github.com/bennyguo/instant-nsr-pl/blob/48145751245641b5004068060dab1dbb864857ca/datasets/colmap.py#L30-L43
- dataset.center_est_method=camera: regards the center of all cameras as the scene center
- dataset.center_est_method=lookat: assumes the cameras are looking at an object and calculates an approximate look-at point as the scene center (see the sketch after this reply)
I believe neither method is very suitable for forward-facing captures. Maybe I should add another option that takes the center of the sparse point cloud as the scene center? Is it possible to provide the data you used for testing? Thanks!
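For reference, the approximate look-at point mentioned above can be computed as the least-squares point closest to all camera viewing rays. The standalone sketch below shows that standard construction; it is not a copy of the linked code, and the function name is hypothetical:

import torch

def estimate_lookat_center(origins: torch.Tensor, directions: torch.Tensor) -> torch.Tensor:
    # origins: (N, 3) camera centers; directions: (N, 3) viewing directions.
    # Solve for the point minimizing the summed squared distance to all rays.
    d = torch.nn.functional.normalize(directions, dim=-1)
    eye = torch.eye(3).expand(d.shape[0], 3, 3)
    A = eye - d[:, :, None] @ d[:, None, :]          # projection orthogonal to each ray
    lhs = A.sum(dim=0)                               # (3, 3)
    rhs = (A @ origins[:, :, None]).sum(dim=0)       # (3, 1)
    return torch.linalg.solve(lhs, rhs).squeeze(-1)  # (3,) approximate look-at point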
@bennyguo

The problem above has nothing to do with the center estimation methods (camera, lookat, the original get_center). This is due to a new pose normalization process.
I changed the code below to the old version and the problem was solved.

Latest code:
# rotation and translation
Rc = torch.stack([x, y, z], dim=1)
tc = center.reshape(3, 1)
R, t = Rc.T, -Rc.T @ tc
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([R, t], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses_norm = (inv_trans @ poses_homo)[:,:3] # (N_images, 3, 4)
# scaling
scale = poses_norm[...,3].norm(p=2, dim=-1).min()
poses_norm[...,3] /= scale
# apply the transformation to the point cloud
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
pts = pts / scale
Previous code:
# rotation
Rc = torch.stack([x, y, z], dim=1)
R = Rc.T
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([R, torch.as_tensor([[0.,0.,0.]]).T], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses = (inv_trans @ poses_homo)[:,:3]
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
# translation and scaling
poses_min, poses_max = poses[...,3].min(0)[0], poses[...,3].max(0)[0]
pts_fg = pts[(poses_min[0] < pts[:,0]) & (pts[:,0] < poses_max[0]) & (poses_min[1] < pts[:,1]) & (pts[:,1] < poses_max[1])]
center = get_center(pts_fg)
tc = center.reshape(3, 1)
t = -tc
poses_homo = torch.cat([poses, torch.as_tensor([[[0.,0.,0.,1.]]]).expand(poses.shape[0], -1, -1)], dim=1)
inv_trans = torch.cat([torch.cat([torch.eye(3), t], dim=1), torch.as_tensor([[0.,0.,0.,1.]])], dim=0)
poses = (inv_trans @ poses_homo)[:,:3]
scale = poses[...,3].norm(p=2, dim=-1).min()
poses[...,3] /= scale
pts = (inv_trans @ torch.cat([pts, torch.ones_like(pts[:,0:1])], dim=-1)[...,None])[:,:3,0]
pts = pts / scale
Why did you change the normalization code to the current version?
In particular, I wonder why you changed t = -tc to t = -Rc.T @ tc.
In my humble opinion, the translation vector obtained through get_center is a translation vector in world space, not camera space.
Using the old version for normalization, I can get the following result with a tight bound!!:
Thank you so much for solving the box artifacts issue!
Both transformations are correct:

- [R|t] performs rotation first, then translation, but here we need to first translate by -tc and then rotate by Rc.T. Therefore we have Rx + t = Rc.T(x - tc), leading to t = -Rc.T @ tc (a small numerical check of this identity is sketched below).
- In the previous code, t is re-calculated as the center of the points that are bounded by all cameras after rotation, so it doesn't need to be transformed as above.

I thought center_est_method=camera in the current version (treating the center of all camera positions as the scene center, not using point locations) approximately achieves the same effect as the old version, but it seems better to add another option such as center_est_method=points that does the same as the previous version :)
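As a sanity check on the identity above, here is a tiny standalone snippet (not repository code) verifying that applying [Rc.T | -Rc.T @ tc] is the same as translating by -tc and then rotating by Rc.T, and that it maps the scene center tc to the origin:

import torch

torch.manual_seed(0)
Rc = torch.linalg.qr(torch.randn(3, 3)).Q   # random orthonormal matrix standing in for the rotation
tc = torch.randn(3, 1)                      # scene center in world space
x = torch.randn(3, 1)                       # an arbitrary world point

R, t = Rc.T, -Rc.T @ tc
assert torch.allclose(R @ x + t, Rc.T @ (x - tc), atol=1e-6)    # Rx + t = Rc.T (x - tc)
assert torch.allclose(R @ tc + t, torch.zeros(3, 1), atol=1e-6)  # the center maps to the origin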
@bennyguo
Oh, you are right.
The only problem with the current version of the camera pose normalization is that the return value here should actually be poses_norm, not poses.
@TLB-MISS Thanks for pointing this out! Such a stupid mistake! I already fixed this in the latest commit and brought back the old normalization method (dataset.center_est_method=point).
@bennyguo
Thank you for your hard work and dedication!! I think my issue could be closed now.
First of all, thank you so much for providing such wonderful work.
I have two questions. First, what is sphere_init_radius in NeuS? I wonder what exactly its role is, and how it relates to radius. Second, the NeuS implementation here only provides a square bounding box (using radius). Why not provide a rectangular bounding box? Are there any issues with that kind of implementation? Thanks!