floatlazer / semantic_slam

Real time semantic slam in ROS with a hand held RGB-D camera
GNU General Public License v3.0

Source of world-to-camera tf? #4

Closed mfe7 closed 5 years ago

mfe7 commented 5 years ago

Hi,

Cool package! I was able to get it installed in a docker image to isolate the installation from the rest of my system - I can share the dockerfile if you'd like.

The demo.bag file provided works quite well - the chair in the /octomap_full is roughly the shape of a chair, for example.

But when I test on data I collected, the SLAM part of the code doesn't seem to be working correctly, since one object is getting spread all over my octomap as it moves around in the camera plane. The code requires me to publish a world-to-camera tf in order to visualize the octomap in rviz. This tf was part of the demo.bag - where did it come from? Is there some external odometry source? Is that being used for the calculation of the octomap as well, or just display? I would expect the SLAM to find correspondences to reduce the odometry's accumulating error.

I am also wondering if certain camera motions should be avoided, since in your demo the camera motion seems constrained - any tips?

A video can be found at this dropbox link. The camera/depth is coming from a gazebo simulation, by the way.

Thanks!

floatlazer commented 5 years ago

Hi,

Glad that you tried it out.

The tf information is used to construct the octomap: it gives the camera pose that is used when inserting each point cloud into the octomap.
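As a minimal sketch of what "camera position from tf" means here (the frame names below are placeholders, not necessarily the ones used by this package), a node can read the transform with the standard tf listener API:

```python
#!/usr/bin/env python
# Minimal sketch: read the world-to-camera transform that is used when
# inserting point clouds into the octomap.
# The frame names are assumptions -- substitute the frames your setup publishes.
import rospy
import tf

rospy.init_node('tf_probe')
listener = tf.TransformListener()
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    try:
        # (trans, rot) is the camera pose expressed in the world frame
        trans, rot = listener.lookupTransform('/world',
                                              '/camera_rgb_optical_frame',
                                              rospy.Time(0))
        rospy.loginfo('camera at %s (quaternion %s)', trans, rot)
    except (tf.LookupException, tf.ConnectivityException,
            tf.ExtrapolationException):
        pass
    rate.sleep()
```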

The tf in the demo was recorded with the same SLAM system included in the package. However, to get a better result, it was recorded beforehand in a separate pass, without the other nodes (e.g. octomap) running in parallel. To do this, I actually scanned the scene twice (moving around). During the first scan I moved slowly and tried not to lose SLAM tracking, so that SLAM could build a map (with loop closure). Then I did a second scan to record the tf of the camera. So the SLAM already had a well-constructed map, which is why it looked good.

Using the SLAM directly as visual odometry is possible, but you should initialize it well, with at least a small set of correctly mapped points. Try to move the camera with more horizontal or vertical translation and avoid pure rotations. A relatively distant background with rich features also helps. You can move the camera around and check in the SLAM viewer whether the trajectory looks reasonable. I noticed your video contains a lot of rotations, which are difficult for SLAM.

I didn't try any automated way to initialize the SLAM. I just monitored the SLAM viewer each time I initialized the system, so I don't have many tips on how to guarantee a good initialization.

It would be cool if you could make a PR for your docker file.

Hope that helps. Cheers.

mfe7 commented 5 years ago

Thanks for the clarification. I was able to get good performance using the robot's wheel encoders + IMU as an odometry source, without needing to pre-map the environment or take much care to initialize the solution.
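In case it is useful to others, what I mean by "external odometry source" is roughly re-broadcasting the fused pose as the world-to-camera tf. A minimal sketch, where the /odom topic, the frame names, and the omitted base-to-camera extrinsic are all assumptions:

```python
#!/usr/bin/env python
# Sketch: republish an externally fused odometry estimate (wheel encoders + IMU)
# as the tf that the octomap generator consumes.
import rospy
import tf
from nav_msgs.msg import Odometry

def odom_cb(msg):
    p = msg.pose.pose.position
    q = msg.pose.pose.orientation
    # Broadcast the camera pose in the world frame; a real setup would also
    # apply the static base_link -> camera extrinsic here.
    br.sendTransform((p.x, p.y, p.z),
                     (q.x, q.y, q.z, q.w),
                     msg.header.stamp,
                     '/camera_rgb_optical_frame',  # child frame
                     '/world')                     # parent frame

rospy.init_node('odom_to_tf')
br = tf.TransformBroadcaster()
rospy.Subscriber('/odom', Odometry, odom_cb)
rospy.spin()
```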

I was able to make a really accurate RGB octomap using tree_type=0, which I believe ignores the semantic part of the data. I will probably need to train an image segmentation model that's more suited to my domain, since I believe the inaccuracy in segmentation leads to a poor octomap.

In the meantime, it'd be awesome to create the octomap's geometry using just the RGB values, and then fill in the color of voxels with their semantic value instead of RGB value. That way I can be sure the octomap's geometry will be correct, even if some of the colors aren't fully accurate due to segmentation errors. Is that idea supported by this project? Any suggestions how I might do it?

Regarding the docker image, it's pretty messy: there was a bug in one of the ros-comm binaries, so I built ROS from source in the dockerfile. If I can clean it up I will open a PR.

EDIT:

I'm pretty confused now... I was able to map the environment well with tree_type=0 without even running ORB_SLAM (just running octomap_generator and semantic_cloud.py):

[screenshot of the resulting octomap]

floatlazer commented 5 years ago

In fact, the geometry of the octomap is created from the input point cloud of the octomap generator, and the geometry of that point cloud comes only from the depth information given by the depth camera. The color information, whether RGB or semantic color, is attached to the octomap at the same time, but it has nothing to do with the octomap's geometry.
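A minimal sketch of that idea (the pinhole intrinsics and array names are placeholders, not the exact code of this package): the geometry is just the depth image back-projected through the camera intrinsics, and the color channel is attached afterwards.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into camera-frame 3D points.
    Only the depth values determine the geometry of the point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)          # shape (h, w, 3)

def attach_color(points, color_img):
    """Attach a per-pixel color (RGB or a semantic label rendered as a color).
    Swapping one color source for the other changes only this channel,
    never the 3D coordinates computed above."""
    return np.concatenate([points, color_img.astype(np.float32)], axis=-1)
```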

So whether you use RGB color or semantic color will not change the geometry of the octomap. In fact, the RGB color is stored in the octomap even if you created it with semantic color; you can switch to RGB color by calling rosservice call toggle_use_semantic_color.

Theoretically, using semantic color instead of RGB color could make a difference because of the pruning process in octomap (child nodes are fused together when they all contain the same information), but I think this rarely happens and it would not affect the overall geometry.

For now, ORB_SLAM is used in this project only to provide the camera position (although using its mapping would definitely be better), so it is totally fine to use another source of camera position instead of ORB_SLAM. You may even get a good reconstruction with a static camera, because the point cloud already contains the depth information needed for reconstruction; of course, you then only cover the scene from a single viewpoint.

Xuan

mfe7 commented 5 years ago

Cool! The toggle_use_semantic_color service is a great idea.

When running the semantic_cloud.py script in semantic mode, large sections of some published pointclouds are white. I believe this is caused by the time it takes to query the segmentation network (self.predict(color_img)), because the same behavior occurs if I simply add a time.sleep(0.2) before creating the RGB pointcloud (self.cloud_generator.generate_cloud_color(...)). For some perspective, my camera data comes in at ~10 Hz but each color_depth_callback takes 0.2 s, which I suspect is a problem.
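As a rough illustration of that kind of timing check (the helper and names below are only illustrative; just self.predict comes from the actual script):

```python
import time
import rospy

def timed(label, fn, *args, **kwargs):
    """Run fn and warn if it takes longer than one camera frame (0.1 s at ~10 Hz)."""
    t0 = time.time()
    result = fn(*args, **kwargs)
    elapsed = time.time() - t0
    if elapsed > 0.1:
        rospy.logwarn('%s took %.3f s, slower than the camera period', label, elapsed)
    return result

# e.g. inside the callback:
#   semantic = timed('predict', self.predict, color_img)
```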


EDIT: I'm now pretty sure the issue of white pixels comes from the time offset between my RGB image and depth image, which makes sense considering my background is white.

EDIT2: The white pixels almost completely disappear if I reduce the amount of time allowed between the two images' stamps (slop parameter in message_filters.ApproximateTimeSynchronizer)! I now have images coming in at ~50Hz and set slop=0.02.
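For anyone hitting the same issue, the relevant knob looks roughly like this (topic names and message types are assumptions for a typical RGB-D setup, not necessarily what this package uses):

```python
import rospy
import message_filters
from sensor_msgs.msg import Image

rospy.init_node('sync_example')
color_sub = message_filters.Subscriber('/camera/rgb/image_raw', Image)
depth_sub = message_filters.Subscriber('/camera/depth/image_raw', Image)

def callback(color_msg, depth_msg):
    pass  # build the colored point cloud from a well-aligned RGB/depth pair here

# slop is the maximum allowed stamp difference (seconds) between the two images;
# with images arriving at ~50 Hz, slop=0.02 keeps only closely matched pairs.
sync = message_filters.ApproximateTimeSynchronizer(
    [color_sub, depth_sub], queue_size=10, slop=0.02)
sync.registerCallback(callback)
rospy.spin()
```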

floatlazer commented 5 years ago

Yes, I think it is probably caused by the slop parameter. The tolerance of the ApproximateTimeSynchronizer may depend on the processing speed and sensor frequency. Your machine may be more powerful than the one I tested on, so it is possible that you need to adjust it a bit. Maybe there is a better way to do the synchronization.