floatlazer / semantic_slam

Real-time semantic SLAM in ROS with a handheld RGB-D camera
GNU General Public License v3.0

Monocular Semantic SLAM with ORB-SLAM2? #11

Open NicksonYap opened 5 years ago

NicksonYap commented 5 years ago

Hi,

ORB-SLAM2 supports monocular SLAM.

I wonder if this code can be modified to use ORB-SLAM2's monocular SLAM instead of directly using RGB-D?

Thanks!

floatlazer commented 5 years ago

Hi,

Yes, it is possible. In fact, that is what we originally intended to do. The problem is that the depth estimated by monocular SLAM is relative (i.e., not in meters, but relative to an arbitrary initial scale). So when the camera moves around in the real world, the reconstruction breaks, because the point clouds are generated assuming metric depth.
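For intuition, here is a minimal sketch (not code from this repo) of the standard pinhole back-projection used to build the point cloud; `fx, fy, cx, cy` are assumed camera intrinsics. If the depth is only known up to an unknown scale factor, every point in the cloud is off by that same factor:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3D point cloud.

    With metric depth (meters) the points are metric. With monocular
    depth, which is correct only up to an unknown scale s, every point
    is scaled by the same s, so the reconstruction is metrically wrong.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)
```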

I think there are two possible ways to make this work with monocular SLAM.

  1. Use depth prediction to estimate the real scale from RGB images, as in CNN-SLAM. Then integrate the estimated depth into ORB-SLAM2 to adjust the scale (see the sketch after this list). Our initial attempts can be found in the depth_prediction branch or in the project report.

  2. Use the depth estimated by monocular SLAM itself and generate a point cloud from it. The drawback is that you won't have the real scale unless you can calibrate it. Also, ORB-SLAM2 is a feature-based SLAM system, so the map is sparse; depth completion may be necessary to build a complete surface.
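For option 1, a common way to recover the scale (a hypothetical sketch, not the repo's implementation) is to compare the CNN-predicted metric depth with the up-to-scale depths of the SLAM map points at the same pixels and take a robust ratio:

```python
import numpy as np

def estimate_scale(slam_depth, predicted_depth, valid_mask):
    """Estimate the metric scale of a monocular SLAM map.

    slam_depth:      per-pixel depth of projected SLAM map points (arbitrary scale)
    predicted_depth: per-pixel metric depth predicted by a CNN (meters)
    valid_mask:      bool mask of pixels where both depths are valid and > 0

    Returns s such that s * slam_depth is approximately metric; the
    median ratio is robust to outliers in either source.
    """
    ratios = predicted_depth[valid_mask] / slam_depth[valid_mask]
    return float(np.median(ratios))

# Usage: rescale the map (or each keyframe's points) once s is known:
# points_metric = s * points_slam
```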

That said, either approach needs a lot of work.

Xuan

NicksonYap commented 5 years ago

I'm not a pro, but is actual scale really required for semantic SLAM, or for semantic segmentation in general?

Say we give up on a single camera: would a multi-camera setup (2 and above) help?

Or, at least, integration/support for lower-cost, more widely used RGB-D sensors such as the Orbbec Astra Pro or Intel RealSense D435 (going for 180 USD now)?

floatlazer commented 5 years ago

A stereo camera could work, since ORB-SLAM2 supports stereo cameras.

Semantic segmentation is done on the RGB image alone; depth information is only required for SLAM and reconstruction.
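As a rough illustration (my sketch, not this repo's code): each pixel's class label from the RGB segmentation can simply be attached to the corresponding back-projected 3D point, so the depth sensor only supplies geometry:

```python
import numpy as np

def label_point_cloud(points, labels, valid_mask):
    """Attach per-pixel semantic labels to a back-projected point cloud.

    points:     (h, w, 3) 3D points from the depth image
    labels:     (h, w) integer class IDs from the RGB segmentation network
    valid_mask: (h, w) bool mask of pixels with a valid depth reading

    Returns an (n, 4) array of [x, y, z, class_id] rows.
    """
    pts = points[valid_mask]                     # (n, 3)
    cls = labels[valid_mask].astype(pts.dtype)   # (n,)
    return np.hstack([pts, cls[:, None]])
```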

We used an Asus Xtion camera in our experiments; other low-cost cameras should also work.

NicksonYap commented 5 years ago

@floatlazer I'm trying to buy the same sensor you're using. What's the exact model name of the one you used?

There is the regular Asus Xtion, the Asus Xtion PRO, the Asus Xtion Live, and the Asus Xtion PRO Live.

All so confusing...

http://wiki.ipisoft.com/Depth_Sensors_Comparison#Xtion_Live_vs_Xtion_vs_Carmine (see the very bottom of the page)

"PRO" seems to be meant for developers (same hardware but different software?), while "Live" seems to mean it has an RGB sensor.

Since RGB was used, I suppose you're using either the Asus Xtion Live or the Asus Xtion PRO Live?

Can you check whether yours is a "PRO", and whether that is needed? (The PRO is more costly.)

NicksonYap commented 5 years ago

Created a new issue #12 regarding the sensor model

Please reply there, thanks!

N-G17 commented 4 years ago

@NicksonYap were you able to implement monocular semantic SLAM?

NicksonYap commented 4 years ago

@Neetika-Gupta

Nope, did not give it a shot