MIT-SPARK / Kimera-Semantics

Real-Time 3D Semantic Reconstruction from 2D data
BSD 2-Clause "Simplified" License

semantic segmentation of realsense D435i live data #30

Closed gkiri closed 3 years ago

gkiri commented 4 years ago

Hi Toni,

I'm evaluating Kimera with RealSense D435i hardware. Currently, I have managed to run 3D geometric reconstruction with live data from the D435i.

Now I am trying to figure out how to semantically segment the live feed of the RealSense D435i in real time, and get an output like the one you have shown on the Kimera-Semantics tesse dataset.

Please let me know the procedure to follow to achieve that on D435i live data.

AndreV84 commented 3 years ago

@gkiri could you explain how to set up Kimera with the D435i to run 3D geometric reconstruction with live data, please?

gaotao19890725 commented 3 years ago

Dear gkiri, I am trying to connect Kimera-Semantics to a RealSense D435i in order to process real online data, but I am new to this area. Can you give me some suggestions? Thanks a million.

zhaozhongch commented 3 years ago

Kimera cannot do semantic segmentation itself. You need to find a semantic segmentation system, feed your images to it, and send its output (the segmented images) to Kimera-Semantics.

If you just want 3D reconstruction from RealSense live data, I can offer my own experience. First, feed the RealSense stereo images to a SLAM system such as Kimera-VIO, VINS-Fusion, or ORB-SLAM2/3. These systems output 6-DoF pose estimates; remap that pose estimation topic to /tesse/odom. At the same time, use the RealSense left image + depth, or the RGB image (aligned to depth) + depth, as input to Kimera. Write a small image subscribe-and-publish program that rewrites every image's frame id to left_cam.

Say you feed RealSense's grayscale stereo images to VINS-Fusion to get the 6-DoF pose estimates, and use the RGB image for Kimera-Semantics, with the RGB topic named /rgb and the depth topic named /depth0. Then you simply need to modify the original launch file from
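The "image subscribe and publish program" that rewrites the frame id could look like the minimal rospy relay below. This is only a sketch: the input topic names (/rgb, /depth0, /instance_image0) come from the comment above, while the "_relabelled" output suffix and the node name are my own placeholders.

```python
#!/usr/bin/env python
"""Relay images while forcing header.frame_id to "left_cam" for Kimera-Semantics."""


def relabel_frame(msg, frame_id="left_cam"):
    """Overwrite the frame_id in a ROS message header, in place."""
    msg.header.frame_id = frame_id
    return msg


def main():
    # ROS-specific imports kept here so the pure logic above is testable without ROS.
    import rospy
    from sensor_msgs.msg import Image

    rospy.init_node("frame_id_relay")

    def make_callback(out_topic):
        pub = rospy.Publisher(out_topic, Image, queue_size=10)

        def callback(msg):
            # Re-stamp and forward every incoming image.
            pub.publish(relabel_frame(msg))

        return callback

    # Relay each camera topic to a "_relabelled" twin with a consistent frame id.
    for topic in ("/rgb", "/depth0", "/instance_image0"):
        rospy.Subscriber(topic, Image, make_callback(topic + "_relabelled"))

    rospy.spin()


if __name__ == "__main__":
    main()
```

You would then point the Kimera-Semantics launch file's remaps at the "_relabelled" topics instead of the raw camera topics.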

...
  <!-- If you want to play directly from a rosbag -->
  <arg name="play_bag" default="false"/>
  <!-- Let's go at twice real-time to show the speed! It can go faster, but then rviz struggles. -->
  <arg name="rosbag_rate" default="2.0"/>
  <arg name="bag_file" default="$(find kimera_semantics_ros)/rosbags/kimera_semantics_demo.bag"/>
  <node name="player" pkg="rosbag" type="play" output="screen"
    args="--clock --rate $(arg rosbag_rate) $(arg bag_file)"  if="$(arg play_bag)">
    <!-- The rosbag we first generated did not follow ROS naming standards for image topics,
         so we remap the topics accordingly here.-->
    <remap from="/tesse/left_cam" to="/tesse/left_cam/image_raw"/>
    <remap from="/tesse/segmentation" to="/tesse/segmentation/image_raw"/>
    <remap from="/tesse/depth" to="/tesse/depth/image_raw"/>
  </node>

  <!-- If you just want to run 3D reconstruction without semantics, set this flag to false-->
  <arg name="metric_semantic_reconstruction" default="true"/>

  <!-- Input -->
  <arg name="semantic_pointcloud"         default="/semantic_pointcloud"/>
  <arg name="left_cam_info_topic"         default="/tesse/left_cam/camera_info"/>
  <arg name="right_cam_info_topic"        default="/tesse/right_cam/camera_info"/>
  <arg name="left_cam_topic"              default="/tesse/left_cam/image_raw"/>
  <arg name="right_cam_topic"             default="/tesse/right_cam/image_raw"/>
  <arg name="left_cam_segmentation_topic" default="/tesse/segmentation/image_raw"/>
  <arg name="left_cam_depth_topic"        default="/tesse/depth/image_raw"/>
  <arg name="use_freespace_pointcloud"    default="false" />
  <arg name="freespace_pointcloud"        default="/dev/null"/>
...

to

  <arg name="sensor_frame" default="left_cam"/>

  <remap from="/tesse/left_cam" to="/rgb"/>
  <remap from="/tesse/segmentation" to="/instance_image0"/>
  <remap from="/tesse/depth" to="/depth0"/>

  <!-- If you just want to run 3D reconstruction without semantics, set this flag to false-->
  <arg name="metric_semantic_reconstruction" default="true"/>

  <!-- Input -->
  <arg name="semantic_pointcloud"         default="/semantic_pointcloud"/>
  <arg name="left_cam_info_topic"         default="/cam_info0"/>
  <arg name="right_cam_info_topic"        default="/tesse/right_cam/camera_info"/>
  <arg name="left_cam_topic"              default="/rgb"/>
  <arg name="right_cam_topic"             default="/tesse/right_cam/image_raw"/>
  <arg name="left_cam_segmentation_topic" default="/instance_image0"/>
  <arg name="left_cam_depth_topic"        default="/depth0"/>
  <arg name="use_freespace_pointcloud"    default="false" />
  <arg name="freespace_pointcloud"        default="/dev/null"/>