Kunhao-Liu / 3D-OVS

[NeurIPS 2023] Weakly Supervised 3D Open-vocabulary Segmentation

When will unbounded 360 scenes be supported? #2

Open Chuan-10 opened 1 year ago

Chuan-10 commented 1 year ago

Thank you for the excellent work! I noticed that support for unbounded 360 scenes is listed in the TODOs. When will it be finished? If that support won't come soon, could you give some tips on how to implement it? A few code snippets would be even better.

Kunhao-Liu commented 1 year ago

Hi, to support unbounded 360 scenes, we need to warp the world coordinates to a bounded range. One example is the contraction operation from Mip-NeRF 360:

import torch

# contraction from the Mip-NeRF 360 implementation
def contract(x):
    """Contracts points towards the origin (Eq. 10 of arxiv.org/abs/2111.12077).

    Args:
        x: A tensor of shape [N, 3].
    """
    eps = torch.tensor(1e-8)
    # Clamping to eps prevents non-finite gradients when x == 0.
    x_mag_sq = torch.maximum(eps, torch.sum(x**2, dim=-1, keepdim=True))  # [N, 1]
    # Points inside the unit ball are unchanged; points outside are mapped
    # into the shell between radius 1 and radius 2.
    z = torch.where(x_mag_sq <= 1, x, ((2 * torch.sqrt(x_mag_sq) - 1) / x_mag_sq) * x)  # [N, 3]
    return z

which maps the world coordinates to [-2, 2].
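
For example, a quick sanity check (with made-up input values) shows that points inside the unit ball pass through unchanged, while distant points are pulled toward radius 2:

pts = torch.tensor([[0.5, 0.0, 0.0],    # inside the unit ball: unchanged
                    [10.0, 0.0, 0.0]])  # far away: contracted toward radius 2
print(contract(pts))
# tensor([[0.5000, 0.0000, 0.0000],
#         [1.9000, 0.0000, 0.0000]])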

Sampling the world-space points along each ray can then be implemented as:

def sample_ray_contracted(self, rays_o, rays_d, is_train=True, N_samples=-1):
    '''
    Sample points along rays for contracted (unbounded) scenes.
    Note: contraction itself is NOT applied here; the returned points are in
    world coordinates and should be passed through contract() afterwards.
    '''
    N_samples = N_samples if N_samples > 0 else self.nSamples
    near, far = self.near_far
    inner_N_samples = N_samples - N_samples // 2
    outer_N_samples = N_samples // 2

    # Inner region: sample uniformly in distance between near and 2.0.
    interpx_inner = (
        torch.linspace(near, 2.0, inner_N_samples + 1).unsqueeze(0).to(rays_o)
    )
    if is_train:
        # Stratified sampling: jitter every bin boundary except the last.
        interpx_inner[:, :-1] += (
            torch.rand_like(interpx_inner).to(rays_o)
            * ((2.0 - near) / inner_N_samples)
        )[:, :-1]
    interpx_inner = (interpx_inner[:, 1:] + interpx_inner[:, :-1]) * 0.5  # bin midpoints

    # Outer region: sample uniformly in inverse distance (disparity) between 2.0 and far.
    rng = torch.arange(outer_N_samples + 1)[None].float()
    if is_train:
        rng[:, :-1] += (torch.rand_like(rng).to(rng))[:, :-1]
    rng = torch.flip(rng, [1])  # reverse so the distances come out in increasing order
    rng = (rng[:, 1:] + rng[:, :-1]) * 0.5  # bin midpoints
    interpx_outer = 1.0 / (
        1 / far + (1 / 2.0 - 1 / far) * rng / outer_N_samples
    ).to(rays_o.device)
    interpx = torch.cat((interpx_inner, interpx_outer), -1)

    rays_pts = rays_o[..., None, :] + rays_d[..., None, :] * interpx[..., None]

    mask_outbbox = torch.zeros_like(rays_pts[..., 0]) > 0  # every sample is valid
    return rays_pts, interpx, ~mask_outbbox

I haven't tested the performance on unbounded scenes, though. One problem I anticipate is the complex background, which may not have a suitable text prompt to describe it.

Chuan-10 commented 1 year ago

Thank you for the instructions! I will try it later and report back. Besides, I have a small question: how do you manually annotate the segmentation masks of the test images? Are there any tools or methods for this? And since it is an open-vocabulary problem, how do you choose the right prompts for segmentation? Thank you for your time!

Chuan-10 commented 1 year ago

Hi, sorry to bother you again. I want to ask how to render the test images in LERF; I noticed that you compared against LERF results in your paper. Can you give some advice?

Chuan-10 commented 1 year ago

> Hi, to support unbounded 360 scenes, we need to warp the world coordinates to a bounded range. One example is the contraction operation from Mip-NeRF 360: [...]

Hi, I have tried the code, but I didn't get the right results: train_psnr and mse were always nan. Can you give some more detailed instructions? For now I have modified the code as shown below, and the dataset is 360v2.

# In class TensorBase
    def forward(self, rays_chunk, white_bg=True, is_train=False, ndc_ray=False, N_samples=-1):

        # sample points
        viewdirs = rays_chunk[:, 3:6]
        if ndc_ray:
            if self.is_360:
                # sample along the rays in world coordinates, then contract to [-2, 2]
                xyz_sampled, z_vals, ray_valid = self.sample_ray_contracted(rays_chunk[:, :3], viewdirs, is_train=is_train, N_samples=N_samples)
                xyz_sampled = self.contract(xyz_sampled)
            else:
                xyz_sampled, z_vals, ray_valid = self.sample_ray_ndc(rays_chunk[:, :3], viewdirs, is_train=is_train, N_samples=N_samples)
            dists = torch.cat((z_vals[:, 1:] - z_vals[:, :-1], torch.zeros_like(z_vals[:, :1])), dim=-1)
            rays_norm = torch.norm(viewdirs, dim=-1, keepdim=True)
            dists = dists * rays_norm
            viewdirs = viewdirs / rays_norm
        else:
            xyz_sampled, z_vals, ray_valid = self.sample_ray(rays_chunk[:, :3], viewdirs, is_train=is_train, N_samples=N_samples)
            dists = torch.cat((z_vals[:, 1:] - z_vals[:, :-1], torch.zeros_like(z_vals[:, :1])), dim=-1)
        viewdirs = viewdirs.view(-1, 1, 3).expand(xyz_sampled.shape)

....

        if ray_valid.any():
            if self.is_360:
                # map the contracted coordinates from [-2, 2] to [0, 1]
                xyz_sampled = (xyz_sampled + 2) / 4
            else:
                xyz_sampled = self.normalize_coord(xyz_sampled)
            sigma_feature = self.compute_densityfeature(xyz_sampled[ray_valid])

            validsigma = self.feature2density(sigma_feature)
            sigma[ray_valid] = validsigma

Kunhao-Liu commented 1 year ago

Hi, I think you should also use self.normalize_coord(xyz_sampled) when using the 360 dataset. The nan values may be due to the sampled coordinates falling outside [-1, 1].
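
For context, here is a minimal sketch of what a TensoRF-style normalize_coord typically does (an assumption based on the TensoRF codebase this repo builds on; check the repo's actual implementation), where self.invaabbSize is assumed to be 2.0 / (self.aabb[1] - self.aabb[0]):

def normalize_coord(self, xyz_sampled):
    # Map points from the scene bounding box self.aabb = [aabb_min, aabb_max]
    # to [-1, 1] per axis, which is the range the grid-sampled feature
    # lookups expect; note that (xyz_sampled + 2) / 4 lands in [0, 1] instead.
    return (xyz_sampled - self.aabb[0]) * self.invaabbSize - 1

With contracted coordinates, the aabb would need to be [-2, 2] on each axis for this normalization to land in [-1, 1].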

For the segmentation annotation tools, you can refer to this link. For the prompt engineering, you can take a look at this section.

Chuan-10 commented 1 year ago

Thank you very much!

I have followed your advice and used self.normalize_coord(xyz_sampled), but I got nan again. My dataset is 360 v2, and I directly use the poses_bounds.npy in it, reading the data in the llff format. I found that with the 360 dataset, rays_norm and dists are very large, around 1k. The nan values first appear at alpha, weight, bg_weight = raw2alpha(sigma, dists * self.distance_scale), specifically at the step T = torch.cumprod(torch.cat([torch.ones(alpha.shape[0], 1).to(alpha.device), 1. - alpha + 1e-10], -1), -1) inside the function raw2alpha. I found that there were negative values in dists, so there were negative values in alpha too, which might make T produce nan values.
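
As a sanity check (a hypothetical snippet, not from the repo), one could verify before raw2alpha that the sample distances are monotonically increasing; if near > 2.0 in the scene's metric scale, torch.linspace(near, 2.0, ...) in sample_ray_contracted becomes a decreasing sequence and yields negative dists:

# hypothetical debugging check: distances along each ray must be non-decreasing,
# otherwise dists = z_vals[:, 1:] - z_vals[:, :-1] contains negative entries
assert (z_vals[:, 1:] >= z_vals[:, :-1]).all(), "z_vals are not sorted along the rays"
# if this fires, check whether near > 2.0 for this scene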

Could you give me some instructions?