anthonysimeonov / ndf_robot

Implementation of the method proposed in the paper "Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation"
MIT License

Question about the paper: the translation equivariance coupled with rotation equivariance #1

Closed: ray8828 closed this issue 2 years ago

ray8828 commented 2 years ago

Hi, thanks for sharing this great and interesting work!

I'm a bit curious about how partial the observed input point cloud is. As I understand from the paper, translation equivariance is achieved by subtracting the centre of mass, but this really depends on how complete the point cloud is: if it is too partial, the computed centre of mass will shift substantially away from the actual object centre. Since the translation and rotation equivariance are coupled, from the Vector Neurons perspective the network is effectively learning the representation under centre-shift augmentation, which may lead to rotation equivariance error.
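
To make the concern concrete, here is a minimal numpy sketch (illustrative only, not code from this repo): mean-centring cancels a global translation exactly on the full cloud, but on a partial cloud the computed centroid drifts away from the true object centre.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "object": points sampled on the surface of a unit sphere.
pts = rng.normal(size=(2048, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

def center(cloud):
    # Mean-centring: the step that gives translation equivariance,
    # since (X + t) - mean(X + t) == X - mean(X) for any translation t.
    return cloud - cloud.mean(axis=0)

t = np.array([0.5, -1.0, 2.0])
assert np.allclose(center(pts + t), center(pts))  # full cloud: exact

# Partial observation: keep only points facing one side (crude occlusion).
partial = pts[pts[:, 2] > 0.3]
shift = np.linalg.norm(partial.mean(axis=0) - pts.mean(axis=0))
print(f"centroid shift caused by partiality: {shift:.3f}")  # clearly nonzero
```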

Thanks!

Guptajakala commented 2 years ago

@ray8828 From what I understand, they mention in the experiments section that they use four RGB-D cameras, one at each corner of the table, so they essentially have a full observation of the novel object.

anthonysimeonov commented 2 years ago

Thanks for the great question. Yes, as @Guptajakala points out, in all our experiments we use four cameras and obtain a relatively complete point cloud. We have observed qualitatively that the method can handle partial occlusions quite well, as long as those types of partial point clouds are also included in the training set, though we didn't include a full analysis of this aspect in the initial study. We do know, however, that if all training is done on point clouds obtained with four cameras and testing is then done on point clouds from, e.g., one camera, performance drops, as this creates a large distribution shift for the point cloud encoder.
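
One simple way to include those kinds of partial point clouds in the training set is a camera-dropout augmentation. The sketch below is hypothetical (the function name and the per-camera data layout are assumptions, not code from this repo):

```python
import numpy as np

def camera_dropout(per_camera_clouds, keep_min=1, rng=None):
    """Hypothetical augmentation: randomly drop some cameras' points so the
    training distribution also covers partial (e.g., single-view) clouds.

    per_camera_clouds: list of (N_i, 3) arrays, one per depth camera.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_cams = len(per_camera_clouds)
    n_keep = int(rng.integers(keep_min, n_cams + 1))  # keep 1..n_cams cameras
    keep = rng.choice(n_cams, size=n_keep, replace=False)
    return np.concatenate([per_camera_clouds[i] for i in keep], axis=0)

# Toy usage: four cameras, each contributing 500 points.
rng = np.random.default_rng(0)
clouds = [rng.normal(size=(500, 3)) for _ in range(4)]
print(camera_dropout(clouds, rng=rng).shape)  # e.g., (1500, 3)
```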