charlesCXK / RGBD_Semantic_Segmentation_PyTorch

[ECCV 2020] PyTorch Implementation of some RGBD Semantic Segmentation models.
MIT License

Some questions. #11

Open jinseok-karl opened 3 years ago

jinseok-karl commented 3 years ago

Hello. First of all, thank you for sharing this repository. I am interested in semantic segmentation. The data I am trying to segment comes with additional depth information (from stereo or LiDAR), and I expect this could help sharpen the segmentation boundaries, so I am looking for ways to make use of it. I have two questions.

1) Is there any difference between depth-assisted (2.5D) segmentation and 3D segmentation? If there is a difference, which of the two is closer to what I want, i.e. improving segmentation boundaries by adding depth information? They seem similar in spirit, but 3D segmentation results always seem to be visualized as a 3D reconstruction with per-point labels...

2) Models such as RDFNet, Malleable 2.5D Convolution, and SA-Gate require HHA-encoded input. Converting the depth images to HHA format took me a long time, so I am not sure it is practical. Is the HHA format still worth using?

I would appreciate it if you could answer me!

charlesCXK commented 3 years ago

Sorry for the late reply.

(1) For the first question, I think the main difference between 2.5D segmentation and 3D segmentation is that 2.5D segmentation takes 2D images as input, which can be processed by CNNs, while 3D data is very sparse and unordered. We could project the RGB image into 3D space with the help of the depth image and the camera parameters, but I think there is no need to do this, because we would drop the regular 2D grid structure that CNNs handle well. By the way, researchers in the 3D area usually try to design operators that imitate the behavior of 2D convolutions and receptive fields.

(2) For the second question, converting depth to HHA indeed takes some time. However, the HHA input is not necessary; HHA is just an encoding of the depth image, and we can directly use the depth image as input.
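If it helps, here is a minimal sketch of the back-projection I mentioned in (1), assuming a standard pinhole camera with known intrinsics. The function name and the intrinsics `fx, fy, cx, cy` are just for illustration and are not part of this repo:

```python
import numpy as np

def backproject_to_points(depth, rgb, fx, fy, cx, cy):
    """Back-project an RGB-D image to an (N, 6) point cloud (XYZ + RGB)
    with the pinhole model. Intrinsics are assumed known; depth is in
    metres and 0 marks invalid pixels. Illustrative only."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx                            # camera-frame X
    y = (v - cy) * z / fy                            # camera-frame Y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3).astype(np.float32)
    valid = points[:, 2] > 0                         # drop invalid depth
    return np.concatenate([points[valid], colors[valid]], axis=1)
```

The resulting point cloud is what 3D segmentation methods consume, but as said above, once you project you lose the regular grid that 2D CNNs exploit.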
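And for (2), a rough sketch of how you might feed the raw depth map in place of HHA. The normalization range and the helper name are hypothetical, just to show the idea of keeping a 3-channel second-modality input:

```python
import torch

def depth_to_input(depth, d_min=0.5, d_max=10.0):
    """Hypothetical helper: turn a raw depth map (H, W) in metres into a
    normalized 3-channel tensor so it can replace an HHA image as the
    depth-branch input without changing the first conv layer."""
    depth = depth.clamp(d_min, d_max)
    depth = (depth - d_min) / (d_max - d_min)        # scale to [0, 1]
    depth = (depth - 0.5) / 0.5                      # roughly zero-centred
    return depth.unsqueeze(0).repeat(3, 1, 1)        # (3, H, W)
```

Repeating the single depth channel is only a convenience when the depth branch was built for 3-channel HHA; you could also re-initialize that first conv for 1-channel input and skip the repeat.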