facebookresearch / home-robot

Mobile manipulation research tools for roboticists
MIT License

SparseVoxelMap _show_pytorch3d showing duplicate items from scene #501

Open yrraadi-io opened 4 months ago

yrraadi-io commented 4 months ago

I was experimenting with a custom SparseVoxelMapAgent built on the underlying SparseVoxelMap class. At the end of a short custom episode (HSSD scene 790), in which I just take a few steps toward the object, I call the _show_pytorch3d method to produce a 3D visualization of my voxel map, and this is what it produces:

image

As you can see, items are being duplicated, and I'm unsure what the source of the issue is. Am I doing something wrong, or is this a bug in the repository's implementation? From my understanding, when the depth image is lifted to world_xyz points, it uses the agent's camera_pose and camera_k, with camera_pose changing as the agent moves around. So if the agent looks at the same object from different viewpoints, the points should map to the same location in 3D space without causing such duplication, right?
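For reference, the lifting step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `depth_to_world_xyz` is hypothetical, `camera_pose` is assumed to be a 4x4 camera-to-world matrix, and the camera frame is assumed to follow the OpenCV convention (x right, y down, z forward).

```python
import numpy as np


def depth_to_world_xyz(depth, camera_k, camera_pose):
    """Lift an (H, W) depth image to an (H*W, 3) array of world-frame points.

    Assumptions (not taken from the repository):
      - camera_k is a 3x3 pinhole intrinsics matrix,
      - camera_pose is a 4x4 camera-to-world transform,
      - the camera frame is OpenCV-style: x right, y down, z forward.
    """
    h, w = depth.shape
    fx, fy = camera_k[0, 0], camera_k[1, 1]
    cx, cy = camera_k[0, 2], camera_k[1, 2]

    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project each pixel through the pinhole model.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    cam_xyz = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # Homogeneous coordinates, then camera frame -> world frame.
    ones = np.ones((cam_xyz.shape[0], 1))
    cam_h = np.concatenate([cam_xyz, ones], axis=1)
    return (cam_h @ camera_pose.T)[:, :3]
```

If the pose passed in follows a different axis convention than the one the back-projection assumes, the same object seen from two viewpoints lands at two different world locations, which would look like the duplication in the screenshot.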

yvsriram commented 4 months ago

Hi @yrraadi-io, awesome progress! I think we had to use this transform to get the point clouds aligned.

yrraadi-io commented 4 months ago

Hi, thanks for sharing. Just to clarify, I believe that transform is already being applied: when we reset the environment with self._env.reset or call self._env.apply_action, self._preprocess_obs(habitat_obs) is called internally, which applies the transformation to the camera_pose.

Are you suggesting we need to re-apply the same transformation before calling the step method in my custom SparseVoxelMapAgent?

yvsriram commented 4 months ago

That path uses convert_pose_to_real_world_axis. I was suggesting you try use_opencv_camera_pose instead.

yrraadi-io commented 4 months ago

That worked, thank you! Just out of curiosity, why does this work and not the other transformation?

image

yvsriram commented 4 months ago

Amazing! It's just that the sparse voxel code expects OpenCV's camera convention, which differs from what Habitat uses and from what the home-robot agent code expects.
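To illustrate the convention mismatch, here is a minimal sketch. It assumes the simulator's camera frame is OpenGL-style (x right, y up, looking down -z) and that poses are 4x4 camera-to-world matrices. The helper name `opengl_pose_to_opencv` is hypothetical, and this is not necessarily what use_opencv_camera_pose does internally.

```python
import numpy as np

# Flipping the camera's y and z axes converts between an OpenGL-style frame
# (x right, y up, -z forward) and an OpenCV-style frame
# (x right, y down, +z forward). The flip is its own inverse.
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def opengl_pose_to_opencv(camera_pose_gl):
    """Re-express a 4x4 camera-to-world pose in OpenCV camera axes.

    Hypothetical helper for illustration only.
    """
    return camera_pose_gl @ FLIP_YZ


# A camera at (1, 2, 3) with no rotation, expressed in OpenGL axes.
pose_gl = np.eye(4)
pose_gl[:3, 3] = [1.0, 2.0, 3.0]
pose_cv = opengl_pose_to_opencv(pose_gl)

# The same physical point, 2 m in front of the camera, in each convention.
point_gl = np.array([0.0, 0.0, -2.0, 1.0])  # OpenGL: forward is -z
point_cv = np.array([0.0, 0.0, 2.0, 1.0])   # OpenCV: forward is +z

world_from_gl = pose_gl @ point_gl      # consistent: GL pose with GL point
world_from_cv = pose_cv @ point_cv      # consistent: CV pose with CV point
world_mismatch = pose_gl @ point_cv     # inconsistent: GL pose with CV point
```

The two consistent pairings agree on the world location, while the mismatched pairing places the point behind the camera. Accumulated over several viewpoints, that kind of mismatch would produce shifted or mirrored copies of the same object.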