GANWANSHUI / SimpleOccupancy

(IEEE TIV) A Comprehensive Framework for 3D Occupancy Estimation in Autonomous Driving

why cumsum the prob to render the depth #16

Open Lazyangel opened 3 months ago

Lazyangel commented 3 months ago

Hello author, I'm not very familiar with NeRF volume rendering. Could you explain why the probabilities are cumulatively summed here to render depth? What is the corresponding mathematical formula for this process, and what is its physical meaning? In my opinion, cumulative multiplication would make more sense.

def get_density(self, rays_o, rays_d, Voxel_feat, is_train, inputs):

    eps_time = time.time()
    with torch.no_grad():
        rays_o_i = rays_o[0, ...].flatten(0, 2)  # H x W x 3
        rays_d_i = rays_d[0, ...].flatten(0, 2)  # H x W x 3
        rays_pts, mask_outbbox, interval, rays_pts_depth = self.sample_ray(rays_o_i, rays_d_i, is_train=is_train)

    mask_rays_pts = rays_pts[~mask_outbbox]
    density = self.grid_sampler(mask_rays_pts, Voxel_feat)  # sample the 256 x 256 x 16 voxel grid

    if self.opt.render_type == 'prob':
        # per-point occupancy probability; the last sample on each ray is forced to 1
        probs = torch.zeros_like(rays_pts[..., 0])
        probs[:, -1] = 1
        density = torch.sigmoid(density)
        probs[~mask_outbbox] = density

        # accumulate along the ray, cap at 1, then take the per-point increments
        probs = probs.cumsum(dim=1).clamp(max=1)
        probs = probs.diff(dim=1, prepend=torch.zeros(rays_pts.shape[:1]).unsqueeze(1).to('cuda'))
        # expected depth: probability-weighted sum of the sample depths
        depth = (probs * interval).sum(-1)
        rgb_marched = 0

GANWANSHUI commented 3 months ago

Hi, the probability representation here is a little different from the density in NeRF. We need to make sure the probabilities along each ray sum to 1. So we first apply a sigmoid to each sampled point, then take the accumulated sum along the ray, clamp it to a maximum of 1, and finally take the difference between consecutive samples along the ray. This gives the probability distribution over the points along the ray.
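
For concreteness, here is a minimal toy sketch of that accumulation on a single ray (the values and the interval tensor below are made up for illustration, not taken from the repo):

import torch

p = torch.sigmoid(torch.tensor([-0.4, 0.4, -0.8, 0.8]))  # per-point probabilities, roughly [0.40, 0.60, 0.31, 0.69]
interval = torch.tensor([1.0, 2.0, 3.0, 4.0])             # sample depths along the ray (assumed)

cdf = p.cumsum(dim=0).clamp(max=1)                        # accumulated sum, capped at 1
weights = cdf.diff(dim=0, prepend=torch.zeros(1))         # per-point weights, roughly [0.40, 0.60, 0.00, 0.00]
depth = (weights * interval).sum(-1)                      # probability-weighted depth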

Lazyangel commented 3 months ago

Thank you for your quick reply! But I want to explain my view further. Suppose I sample four points along a ray, with occupancy probabilities [0.4, 0.6, 0.3, 0.7]. If we do cumsum and diff here, we get weights [0.4, 0.6, 0, 0], which means the ray effectively stops at the second point. But in my opinion, the probability of reaching the 3rd point should be (1-0.4)*(1-0.6) = 0.24 instead of 0, just like in NeRF. So I think it would be more reasonable to use cumprod or softmax here. What is wrong with this reasoning? It would be helpful if you could explain it. Looking forward to your reply!
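
To make the comparison concrete, here is a small sketch of the two weightings on that example (purely illustrative, not code from the repo):

import torch

p = torch.tensor([0.4, 0.6, 0.3, 0.7])

# cumsum + clamp + diff, as in the snippet above:
w_cumsum = p.cumsum(0).clamp(max=1).diff(prepend=torch.zeros(1))
# -> [0.4, 0.6, 0.0, 0.0]: everything after the cap gets zero weight

# NeRF-style weights: w_i = p_i * prod_{j<i} (1 - p_j)
trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - p[:-1]]), dim=0)
w_nerf = p * trans
# -> [0.4, 0.36, 0.072, 0.1176]: the transmittance to the 3rd point is (1-0.4)*(1-0.6) = 0.24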

GANWANSHUI commented 3 months ago

Thanks for the discussion. The cumprod or softmax operation you describe should also work, just like NeRF. You can regard our implementation as a simplified one: it simply ignores the points after the accumulated probability reaches 1, and it guarantees the weights stay between 0 and 1. When obtaining the depth value, the points before the zero-weight region are already enough to supervise the training, regardless of whether the predicted depth is larger or smaller than the ground-truth depth.
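
If you want to try the NeRF-style alternative, a hypothetical drop-in for the 'prob' branch above could look like the following (untested sketch; `probs` here is the full (N_rays, N_samples) tensor of per-point sigmoid probabilities, without the forced probs[:, -1] = 1 and before the cumsum):

ones = torch.ones_like(probs[:, :1])
trans = torch.cumprod(torch.cat([ones, 1.0 - probs[:, :-1] + 1e-10], dim=1), dim=1)  # transmittance T_i
weights = probs * trans                                   # w_i = p_i * T_i, as in NeRF
depth = (weights * interval).sum(-1)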

Lazyangel commented 3 months ago

Thanks for your reply! I will do some experiments later.