flexible-collision-library / fcl

Flexible Collision Library
https://flexible-collision-library.github.io/
Other
1.38k stars 417 forks source link

Segfault with null octomap root node #473

Open JStech opened 4 years ago

JStech commented 4 years ago

After clearing an octomap, the pointer to the root node is set to null. This is causing MoveIt to crash during collision checking: ros-planning/moveit#2104. The crashes were happening in various recursive functions in include/fcl/narrowphase/detail/traversal/octree/octree_solver-inl.h. (I started trying to fix them, but I was unsure how to handle them properly.)

This comment in the octomap implementation suggests that an empty root node used to be created by default, but no longer is.

SeanCurtis-TRI commented 4 years ago

Blech....this is the issue with shared ownership.

It is clear that octomap has intended clear() to represent "nuke the octree from orbit". And given the fact that fcl::OcTree doesn't exclusively own its tree, it would be straightforward for some code external to fcl to clear the shared tree and fcl will be ignorant of that fact.

There are two use constructors on fcl::OcTree:

In the first case, the fcl::OcTree uniquely owns its OcTree and the case you're describing can't happen (as the OcTree isn't leaked in the API, so there is no ability to call clear() on it).

The second case leaves the door open for this case. I presume, that's the constructor you're using.

So, fcl::OcTree needs a guard in the second case that will detect an uninitialized tree and throw on operations.

How does that sound to you? Detect and throw? I can't imagine there's any better option in the shared case -- fcl can't know the semantics of a tree that was instantiated and managed outside of itself.

SeanCurtis-TRI commented 4 years ago

After digging further -- the first constructor seems to largely be useless. While it instantiates the tree, there's no API for populating the tree. So, I presume that all uses of fcl::OcTree use the shared octomap::OcTree interface.

This makes me wonder who is calling clear() and to what purpose? Does the caller realize that they're nuking the tree and making anything dependent on it disfunctional?

JStech commented 4 years ago

The offending call to clear() is here. I'm calling this function directly, rather than through the service, but the service doesn't seem to do anything extra that would prevent this.

SeanCurtis-TRI commented 4 years ago

It's difficult to know what to make of it. The function in the planning_scene_monitor was added six years ago. But the occupancy monitor it exercises as masked history as the files were moved from a previous location to their current location. I've never looked at the moveit code before and really don't want to spend more time delving into it.

The question I was hoping to answer was: does the move it code know that at some point octomap changed the semantics of its clear method? I didn't see anything in the occupancy_map or its monitor that suggests any explicit action was taken protecting itself from a clear.

I'm curious, if you do all of the same moveit operations as you do now, but omit the FCL, does moveit give you a segfault after clearing the occupancy map? I wonder if it would but it doesn't get a chance to because FCL hits its null-pointer dereference first.

Either way, this clearly represents a workflow that FCL should protect itself against.

SeanCurtis-TRI commented 4 years ago

Proposed solution

(This is a summary from suggestions above)

FCL's wrapping of octomap::OcTree is limited. Essentially, the only practical way to use it is to pass in a shared octomap::OcTree. That means code external to FCL has full ability to clear the octomap::OcTree. As shown here octomap::OcTree::clear() leaves the root node set to NULL after a clear operation. FCL doesn't guard against this.

We'll add a safety test to the fcl::OcTree that is executed prior to any access of the tree root. It will essentially confirm it is non-null or throw an exception. This is a net improvement of an inscrutable segfault due to the inevitable null-pointer exception and the message can allude to an unpopulated or cleared OcTree providing some guidance to the caller.

Possible alternative

One might argue that an empty OcTree could be considered an empty geometry such that distance is infinite and collisions are impossible. Is this desirable?

tylerjw commented 4 years ago

You could maybe use a gsl not_null attribute to that pointer type which would throw when someone sets it to null. https://www.bfilipek.com/2017/10/notnull.html

JStech commented 4 years ago

That sounds reasonable. My MoveIt workaround skips the FCL call if the root is null: https://github.com/ros-planning/moveit/pull/2104/files

So far, at least, I've not hit any other segfaults with the octomap with this workaround. A lot of the MoveIt octomap stuff lacks tests, though, so it's hard to say whether it's really safe. I'm working on implementing some more tests.

Both of your proposed solutions sound good to me. Since the semantics of clear are, ironically, unclear, throwing an exception seems reasonable.

JStech commented 4 years ago

A bit more background that I discovered while writing a MoveIt test for this today. Apparently this segfault only happens when doing a collision distance check after clearing the octomap. I had a hard time believing that no one has cleared an octomap in MoveIt in the past 6 years, but I can believe that no one has requested a collision distance after clearing an octomap.

SeanCurtis-TRI commented 4 years ago

My cursory search though moveit (read as very cursory) didn't make any post-clear() guards in Moveit jump out to me. Do you have a sense what Moveit does to protect itself from using a cleared OcTree?

SeanCurtis-TRI commented 4 years ago

@tylerjw A nice suggestion. In this case, it's out of FCL's hands. FCL has a non-null OcTree whose root node gets set to nullptr by some action outside of FCL. So, it doesn't own the pointer in question and can't prevent the operation. The best it can do is make sure it doesn't assume it's non-null.

tahsinkose commented 1 month ago

Reviving the thread after more than 4 years! @SeanCurtis-TRI MoveIt employs collision_detection::OccMapTree, which is exactly a wrapper of octomap::OcTree that implements thread-safe r/w access. The rule of thumb is to use this wrapper whenever an octomap::OcTree related usage is needed. Nevertheless, this convention is violated in a FCL-consumer suite of utilities of MoveIt -> collision_common.cpp

fcl is a different software package, which did not see that as necessary, possibly because there wasn't a need for multi-threaded use of octomap::OcTree data structure. Even if a very similar wrapper to collision_detection::OccMapTree is implemented in fcl, it wouldn't suffice for MoveIt, since it wouldn't be the same lock. Therefore, any usage of fcl::OcTree* classes is not thread-safe in MoveIt landscape.