akshaykburusa / gradientnbv

Gradient-based Next-best-view Planning

Question about expected semantic information gain. #2

Closed Hymwgk closed 3 months ago

Hymwgk commented 3 months ago

I have some questions regarding the description of "expected semantic information gain" in the paper:

  1. It seems that $I_s(ξ)$ is the amount of information that a new viewpoint can obtain about the existing grid. Why is it called a "gain"? Am I misunderstanding something?
  2. $I_s(ξ)$ does not seem to emphasize particular classes, such as focusing specifically on fruit nodes. Instead, it aims to make the semantic information within the FoV more certain overall. Is my understanding correct?
  3. I noticed that the ROI coverage is calculated, but it seems to be only for debugging purposes. The compute_gain function appears to calculate the global expected information gain rather than the local gain for the ROI. Could you point out which part of the code specifically emphasizes the ROI area?

Thank you in advance for your help.

akshaykburusa commented 3 months ago
  1. From information theory, $I_s(ξ)$ is the Shannon entropy. In our case, we intuitively understand it as the amount of information that is still unknown about the voxels that would be visible from a viewpoint; hence it is the information that could be gained from that viewpoint if the camera were moved there.
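
     To make this concrete, here is a minimal sketch of the entropy term for a Bernoulli semantic confidence (the repository's entropy function may use the natural log rather than log2; that detail is an assumption):

        import torch

        def entropy(p: torch.Tensor) -> torch.Tensor:
            # Shannon entropy of a Bernoulli confidence p (in bits):
            # maximal (1.0) at p = 0.5, approaching zero as p nears 0 or 1.
            return -(p * torch.log2(p) + (1.0 - p) * torch.log2(1.0 - p))

        confs = torch.tensor([0.5, 0.9, 0.99])  # per-voxel semantic confidences
        print(entropy(confs))  # uncertain voxels (p near 0.5) contribute the most

     Uncertain voxels therefore dominate the expected gain, which is why viewing them "gains" the most information.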
  2. Yes, that is correct. But you can easily emphasize one particular class, if you like. For this, in the get_semantics function, you can assign different log-odds values to the classes that your segmentation algorithm has detected or simply ignore the classes that are not of interest.
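
     As an illustration (the class IDs and log-odds values here are hypothetical, not the repository's defaults), a per-class weighting inside a get_semantics-style update could look like:

        import torch

        # Hypothetical per-class log-odds increments: emphasize fruit, ignore background.
        CLASS_LOG_ODDS = {0: 0.0, 1: 0.4, 2: 0.1}  # background, fruit, leaves

        def semantic_log_odds(class_ids: torch.Tensor) -> torch.Tensor:
            # Map each detected class ID to its log-odds increment via a lookup table.
            lut = torch.zeros(max(CLASS_LOG_ODDS) + 1)
            for cid, val in CLASS_LOG_ODDS.items():
                lut[cid] = val
            return lut[class_ids]

        ids = torch.tensor([1, 2, 0, 1])  # detected classes per pixel
        print(semantic_log_odds(ids))     # tensor([0.4000, 0.1000, 0.0000, 0.4000])

     Classes of no interest get a zero increment, so they never accumulate semantic evidence in the grid.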
  3. That's true. We used a more subtle way to emphasize the ROI. During initialization of the voxel_grid, we assigned a semantic confidence value close to zero to all voxels. Only for the voxels within the ROI did we set the semantic confidence value to 0.5 (set_target_roi). This ensures that the compute_gain function gives preference to voxels within the ROI, because the voxels outside the ROI carry very little information. Alternatively, you may use the ROI values directly when computing the gain, for example, by modifying the following lines in the compute_gain function:
        ray_points_nor = self.normalize_3d_coordinate(ray_points)
        ray_points_nor = ray_points_nor.view(1, -1, 1, 1, 3).repeat(3, 1, 1, 1, 1)
        # Sample the ROI values, occupancy probabilities and semantic confidences along each ray
        grid = self.voxel_grid[None, ..., 0:3].permute(4, 0, 1, 2, 3)
        occ_sem_confs = F.grid_sample(grid, ray_points_nor, align_corners=True)
        occ_sem_confs = occ_sem_confs.view(3, -1, self.num_pts_per_ray)
        occ_sem_confs = occ_sem_confs.clamp(self.eps, 1.0 - self.eps)
        # Compute the entropy of the semantic confidences along each ray
        opacities = torch.sigmoid(1e7 * (occ_sem_confs[1, ...] - 0.51))
        transmittance = self.shifted_cumprod(1.0 - opacities)
        ray_gains = (
            transmittance * self.entropy(occ_sem_confs[2, ...]) * occ_sem_confs[0, ...]
        )

    The above lines interpolate the ROI values along the sampled points and then multiply them with the expected gain values, which should effectively zero out the gain for all points outside the ROI. However, I have not tested how effective these changes are. Please check.
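
    The ROI-initialization idea described above can be sketched as follows (the grid shape and ROI bounds are illustrative assumptions, not the repository's values):

        import torch

        def entropy(p: torch.Tensor) -> torch.Tensor:
            # Shannon entropy of a Bernoulli confidence p, in bits.
            return -(p * torch.log2(p) + (1.0 - p) * torch.log2(1.0 - p))

        eps = 1e-2
        # All voxels start with a semantic confidence near zero -> near-zero entropy.
        sem_conf = torch.full((32, 32, 32), eps)
        # Voxels inside the ROI are set to 0.5 -> maximum entropy.
        sem_conf[10:20, 10:20, 10:20] = 0.5

        gain = entropy(sem_conf)
        # ROI voxels dominate the gain, so viewpoints covering the ROI are preferred.
        print(gain[15, 15, 15].item(), gain[0, 0, 0].item())

    With this initialization, no change to compute_gain itself is needed; the entropy ranking alone steers the planner toward the ROI.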

Hymwgk commented 3 months ago

Thanks again : )