akshaykburusa / gradientnbv

Gradient-based Next-best-view Planning

Question about updating occupancy probability #1

Closed Hymwgk closed 3 months ago

Hymwgk commented 3 months ago

Hi, thank you for sharing your code with us. It has been very helpful, but I found some parts a bit confusing. Specifically, what does the operation ray_occ[:, -2:, :] mean? Does it indicate that you want to update the occupancy probability of the last two points of each ray if they are farther than the z_far value?

Additionally, it seems that the input transforms already provides the transformation between the 'camera_frame' and the 'world_frame', and the points are also expressed in the 'camera_frame'. Given this, why is T_oc still needed?

def transform_points(
        self,
        points: torch.tensor,
        transforms: torch.tensor,
    ) -> torch.tensor:
        """
        Transform a point cloud from 'camera_frame' to 'world_frame'
        :param points: point cloud
        :param transforms: transformation matrices
        """
        points[..., 1] += 0.024  # TODO: remove this hack, only for gazebo
        T_oc = self.T_oc.clone().requires_grad_()
        T_cws = transforms.clone().to(torch.float32).requires_grad_()
        T_ows = T_cws @ T_oc
        points_h = nn.functional.pad(points, (0, 1), "constant", 1.0)
        points_w = points_h @ T_ows.permute(0, 2, 1)
        return points_w[:, :, :3]

akshaykburusa commented 3 months ago

Hi! Thank you for your interest in our work. Regarding the first part of your question: the code you are referring to updates the occupancy probabilities along the entire ray. self.ray_occ is first initialized under the assumption that all points along the ray are free. However, when a ray hits an object, all points along the ray are free except the last one. Hence, the log-odds of the last points along each ray are modified based on whether the ray hit an object or not. This is done by the operation ray_occ[:, -2:, :]. We update the last two points instead of only the last one to dilate the object a bit. Without the dilation, we sometimes notice that the compute_gain operation can pass through obstacles. This is a limitation of the uniform ray sampling operation and needs to be improved.
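For readers following along, here is a minimal sketch of the idea described above. It is not code from the repository. It assumes ray_occ has shape (num_rays, num_samples, 1) and stores log-odds, that a boolean hit mask says which rays ended on a surface, and that L_FREE and L_OCC are illustrative log-odds constants (all of these names and values are hypothetical). It uses NumPy rather than PyTorch to stay lightweight; the slicing behaves the same way.

```python
import numpy as np

# Hypothetical log-odds constants; the repository's actual values may differ.
L_FREE = -0.4
L_OCC = 0.85


def ray_log_odds(hit: np.ndarray, num_samples: int, dilate: int = 2) -> np.ndarray:
    """Return per-sample log-odds of shape (num_rays, num_samples, 1).

    hit: boolean array of shape (num_rays,), True if the ray ended on a surface.
    dilate: number of trailing samples to mark as occupied (must be >= 1).
    """
    num_rays = hit.shape[0]
    # Start by assuming every sample along every ray is free space.
    ray_occ = np.full((num_rays, num_samples, 1), L_FREE, dtype=np.float32)
    # Rays that hit a surface get "occupied" at their end; others stay free.
    end_vals = np.where(hit, L_OCC, L_FREE).astype(np.float32)  # (num_rays,)
    # Overwrite the last `dilate` samples of each ray (the [:, -2:, :] slice).
    ray_occ[:, -dilate:, :] = end_vals[:, None, None]
    return ray_occ


hit = np.array([True, False, True])
ray_occ = ray_log_odds(hit, num_samples=5)
print(ray_occ[..., 0])  # one row per ray, one column per sample
```

With dilate=1 only the final sample of a hitting ray is marked occupied. With dilate=2 the surface is thickened by one sample, which is the dilation described above.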

akshaykburusa commented 3 months ago

Regarding the second part of your question about transforms, you are correct. According to general convention, there should be no need for an additional transformation with T_oc. It was necessary in our case because we defined transforms a bit differently. For us, transforms refers to the transformation between the 'camera_color_frame' and the 'world_frame', and T_oc is a fixed transformation between the 'camera_optical_frame' and the 'camera_color_frame'. Hence, to transform the points from the optical frame to the world frame, the additional transformation T_oc was needed. I can try to modify the code to fit the general convention, so that transforms directly refers to the transformation from the 'camera_optical_frame' to the 'world_frame'. However, I might not make these changes anytime soon. For now, please note that all mentions of a camera viewpoint or pose refer to the transformation between the 'camera_color_frame' and the 'world_frame'. In our setup, the 'camera_color_frame' has X pointing along the lens of the camera and Z pointing up, while the 'camera_optical_frame' has Z pointing along the lens and Y pointing down. You can check the coordinate frames using the TF topic in Rviz. Hope this helps!
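To make the frame chain concrete, here is a small sketch of the composition, again in NumPy and not taken from the repository. The rotation in T_oc is derived only from the axis description above (optical Z forward and Y down, color X forward and Z up). The zero translation, the function name optical_points_to_world, and the example pose T_cw are assumptions made for illustration. The real T_oc may include an offset.

```python
import numpy as np

# Assumed optical -> color transform: optical +Z (forward) maps to color +X,
# optical +X (right) to color -Y, optical +Y (down) to color -Z.
# Zero translation is an assumption; the real T_oc may include an offset.
T_oc = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])


def optical_points_to_world(points: np.ndarray, T_cw: np.ndarray) -> np.ndarray:
    """Map (N, 3) optical-frame points to the world frame.

    T_cw is the 4x4 pose of the color frame in the world frame.
    """
    # Chain the transforms: optical -> color -> world.
    T_ow = T_cw @ T_oc
    # Append a homogeneous coordinate of 1 to every point.
    ones = np.ones((points.shape[0], 1))
    points_h = np.concatenate([points, ones], axis=1)
    # Row-vector convention: p_world = p_optical @ T_ow^T.
    return (points_h @ T_ow.T)[:, :3]


# Hypothetical pose: color frame aligned with the world axes, 1 m above origin.
T_cw = np.eye(4)
T_cw[2, 3] = 1.0

# A point 2 m straight ahead of the lens (optical +Z).
p_world = optical_points_to_world(np.array([[0.0, 0.0, 2.0]]), T_cw)
print(p_world)
```

In this example the point lands 2 m along world X at a height of 1 m, as expected for a camera looking along X. Without T_oc, the same point would be placed 2 m above the camera instead.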

Hymwgk commented 3 months ago

Thank you very much for your patient explanation! I really appreciate it :)