borglab / gtsam

GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices.
http://gtsam.org
Other
2.65k stars 774 forks source link

ValuesIncorrectType is thrown even when type_info names for value are same #254

Open ruffsl opened 4 years ago

ruffsl commented 4 years ago

Description

As originally tracked in https://github.com/CogRob/omnimapper/issues/26 , it seems that we're encountering segfaults when a ValuesIncorrectType error is thrown even when name valudes for type_infos are the same.

[omnimapper_ros_node-1] BoundedPlanePlugin: Creating new plane 8070450532247928832: 0.769868 -0.097762 -0.630671 0.699390
[omnimapper_ros_node-1] Adding new symbol: p0
[omnimapper_ros_node-1] terminate called after throwing an instance of 'gtsam::ValuesIncorrectType'
[omnimapper_ros_node-1]   what():  Attempting to retrieve value with key "p0", type stored in Values is N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE but requested type was N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE

Steps to reproduce

See run instructions from docker demo described in the project readme: https://github.com/CogRob/omnimapper_ros/blob/plugin/planes/README.md

A minimal ros2bag that can induce the segfault can be found here: https://drive.google.com/file/d/1cVxsfCp2LSuRGkaUUA-bOC2fhhts1Cyo

$ ros2 bag play ./rosbag2_2020_...
$ docker run -it --rm \
  --net=host --ipc=host --pid=host" \
  cogrob/omnimapper_ros \
  ros2 launch omnimapper_ros realsense.launch.xml \
    use_rosbag:=true

Stacktrace:

``` __GI_raise 0x00007fb4c71eae97 __GI_abort 0x00007fb4c71ec801 0x00007fb4c7841957 0x00007fb4c7847ae6 std::terminate() 0x00007fb4c7847b21 std::rethrow_exception(std::__exception_ptr::exception_ptr) 0x00007fb4c7847ada 0x00007fb4c7b5fd0c 0x00007fb4c7b64e9b 0x00007fb4c7b61790 gtsam::NonlinearFactorGraph::linearize(gtsam::Values const&) const 0x00007fb4c84c1f8c gtsam::ISAM2::update(gtsam::NonlinearFactorGraph const&, gtsam::Values const&, gtsam::ISAM2UpdateParams const&) 0x00007fb4c848dbc4 gtsam::ISAM2::update(gtsam::NonlinearFactorGraph const&, gtsam::Values const&, std::vector > const&, boost::optional > const&, boost::optional > const&, boost::optional > const&, bool) 0x00007fb4c848a67c omnimapper::OmniMapperBase::optimize omnimapper_base.cpp:467 omnimapper::OmniMapperBase::spinOnce omnimapper_base.cpp:657 omnimapper::OmniMapperBase::spin omnimapper_base.cpp:640 boost::_mfi::mf0::operator() mem_fn_template.hpp:49 boost::_bi::list1 >::operator(), boost::_bi::list0> bind.hpp:259 boost::_bi::bind_t, boost::_bi::list1 > >::operator() bind.hpp:1294 boost::detail::thread_data, boost::_bi::list1 > > >::run thread.hpp:116 0x00007fb4c7d8bbcd start_thread 0x00007fb4c99806db clone 0x00007fb4c72cd88f ```

Relevant code path:

https://github.com/borglab/gtsam/blob/f538d1dc7bdd7126cb683f3e961c985f76a872b0/gtsam/nonlinear/Values.cpp#L160-L163

https://github.com/borglab/gtsam/blob/16dbf27375fdefb83ce2355beb0c147fa9c07600/gtsam/nonlinear/Values.cpp#L233-L240

Expected behavior

When calling update for new factors with the same type, a ValuesIncorrectType error is not expected.

gtsam::ISAM2Result result = isam2.update(new_factors, new_values);

https://github.com/CogRob/omnimapper/blob/f215e35a134c54b9acfebe247bd9e42e5e3f033b/src/omnimapper_base.cpp#L467

Environment

OS: Ubuntu 18.04 Lang: C++ 14

$ uname -a
Linux ubuntu 5.3.0-42-generic #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27

$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Exact checkout of gtsam: https://github.com/CogRob/omnimapper_ros/blob/a17c4017dfde4cb022384e8d8f521e2ab05197fb/install/underlay/underlay.repos#L2-L5

Exact build setup https://github.com/CogRob/omnimapper_ros/blob/a17c4017dfde4cb022384e8d8f521e2ab05197fb/Dockerfile#L55-L66

Additional information

I'd like to get a better stack trace, but there may be a separate issue in using TBB downstream when compiling from gtsam debug builds. Perhaps gtsam skips finding TBB when debug flags are set? See: https://github.com/CogRob/omnimapper/issues/25

ProfFan commented 4 years ago

Could be the RTTI demon being summoned again... Anyway it is better to NOT use TBB if you want to debug something.

Could you do the following:

Add this to Line 161:

std::cout << "Wanted: " << typeid(old_value).name()
        << ", " << typeid(old_value).hash_code() << std::endl;
std::cout << "Actual: " << typeid(val).name() << ", " << typeid(val).hash_code() << std::endl;

And compile, run, post the result and your compiler (GCC/Clang) version.

It is really appreciated that you also post the GLibC version and your OS version :)

ruffsl commented 4 years ago

Add this to Line 161:

I had some issues flushing to stdout, so I used the message around line 233 like so:

  /* ************************************************************************* */
  const char* ValuesIncorrectType::what() const throw() {
    if(message_.empty())
    {
      message_ =
          "Attempting to retrieve value with key \"" + DefaultKeyFormatter(key_) + "\", type stored in Values is " +
          std::string(storedTypeId_.name()) + " but requested type was " + std::string(requestedTypeId_.name());
      message_ +=
          "\nWanted: " +  std::string(storedTypeId_.name()) + ", " +  string_format("%zx", storedTypeId_.hash_code());
      message_ +=
          "\nActual: " +  std::string(requestedTypeId_.name()) + ", " +  string_format("%zx", requestedTypeId_.hash_code());
    }
    return message_.c_str();
  }

// string_format() from https://stackoverflow.com/a/26221725/2577586

It looks like hash_code returns the same (hex) values:

[omnimapper_ros_node-1] BoundedPlanePlugin: Creating new plane 8070450532247928832: -0.105721 0.939853 -0.324807 1.457075
[omnimapper_ros_node-1] Adding new symbol: p0
[omnimapper_ros_node-1] terminate called after throwing an instance of 'gtsam::ValuesIncorrectType'
[omnimapper_ros_node-1]   what():  Attempting to retrieve value with key "p0", type stored in Values is N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE but requested type was N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE
[omnimapper_ros_node-1] Wanted: N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE, 82781d0d761147f5
[omnimapper_ros_node-1] Actual: N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE, 82781d0d761147f5
[ERROR] [omnimapper_ros_node-1]: process has died [pid 17776, exit code -6, cmd '/

post the result and your compiler (GCC/Clang) version. It is really appreciated that you also post the GLibC version and your OS version :)

See update under environment in top post.

ProfFan commented 4 years ago

Note this is incorrect place to add the printouts. You should add these in Values-inl.h because these type conversions happen there I think.

ruffsl commented 4 years ago

You should add these in Values-inl.h because these type conversions happen there I think.

Do you mean here:

https://github.com/borglab/gtsam/blob/ca4daa0894abb6e979384302075d5fc3b61119f3/gtsam/nonlinear/Values-inl.h#L273-L285

It looks like ValuesIncorrectType is thrown using the same type info as my diff above would print. Could you post an exact git patch you'd like me to apply to Values-inl.h and test on my end?

ruffsl commented 4 years ago

By just commenting out the try and catch around the dynamic_cast, I captured the segfault directly:

``` __GI_raise 0x00007f3b957bde97 __GI_abort 0x00007f3b957bf801 0x00007f3b95e14957 0x00007f3b95e1aae6 std::terminate() 0x00007f3b95e1ab21 __cxa_throw 0x00007f3b95e1ad54 __cxa_bad_cast 0x00007f3b95e19a82 gtsam::internal::handle >::operator() Values-inl.h:280 gtsam::Values::at > Values-inl.h:353 gtsam::NoiseModelFactor2 >::unwhitenedError NonlinearFactor.h:395 gtsam::NoiseModelFactor::linearize NonlinearFactor.cpp:142 gtsam::NonlinearFactorGraph::linearize NonlinearFactorGraph.cpp:336 gtsam::UpdateImpl::linearizeNewFactors ISAM2-impl.h:475 gtsam::ISAM2::update ISAM2.cpp:463 gtsam::ISAM2::update ISAM2.cpp:410 omnimapper::OmniMapperBase::commitNextPoseNode omnimapper_base.cpp:232 omnimapper::OmniMapperBase::spinOnce omnimapper_base.cpp:654 omnimapper::OmniMapperBase::spin omnimapper_base.cpp:640 boost::_mfi::mf0::operator() mem_fn_template.hpp:49 boost::_bi::list1 >::operator(), boost::_bi::list0> bind.hpp:259 boost::_bi::bind_t, boost::_bi::list1 > >::operator() bind.hpp:1294 boost::detail::thread_data, boost::_bi::list1 > > >::run thread.hpp:116 0x00007f3b96122bcd start_thread 0x00007f3b99ac46db clone 0x00007f3b958a088f ```

As well as a dump of the variables in scope:

``` Signal = SIGABRT (Aborted) pointer = {const omnimapper::BoundedPlane3 * const | 0x7f3b35edf880} 0x7f3b35edf880 gtsam::DerivedValue > = {gtsam::DerivedValue} gtsam::Value = {gtsam::Value} n_ = {gtsam::Unit3} p_ = {gtsam::Vector3} B_ = {boost::optional} H_B_ = {boost::optional} d_ = {double} 0.27529796957969666 boundary_ = {omnimapper::BoundedPlane3::CloudPtr} plane_mutex_ = {boost::shared_ptr} this = {gtsam::internal::handle * const | 0x7f3b517a3897} 0x7f3b517a3897 j = {gtsam::Key} 8070450532247928832 ```
ProfFan commented 4 years ago

@ruffsl Right at line 279:

std::cout << "Wanted: " << typeid(const GenericValue<ValueType>&).name()
        << ", " << typeid(const GenericValue<ValueType>&).hash_code() << std::endl;
std::cout << "Actual: " << typeid(*pointer).name() << ", " << typeid(*pointer).hash_code() << std::endl;
ruffsl commented 4 years ago

Looks like gtsam is prepending N5gtsam12GenericValueI to the type name:

...
BoundedPlanePlugin: Creating new plane 8070450532247928832: 0.532448 0.165157 -0.830194 0.466272
Adding new symbol: p0
BoundedPlanePlugin: Added factor!
Wanted: gtsam::GenericValue<gtsam::Pose3>, 16382625216385153739
Actual: gtsam::GenericValue<gtsam::Pose3>, 16382625216385153739
Wanted: gtsam::GenericValue<omnimapper::BoundedPlane3<pcl::PointXYZRGBA> >, 1799629776660340886
Actual: omnimapper::BoundedPlane3<pcl::PointXYZRGBA>, 9401296165788534773
OmniMapperRos: Got cloud from: 2020-Mar-26 08:05:10.858989
cloudCallback: conversion took 0.035933
terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast
Signal: SIGABRT (Aborted)
ProfFan commented 4 years ago

This is Correct I think? It appears that you are inheriting Value directly instead of using traits?

varunagrawal commented 4 years ago

Hmm this isn't a segfault, but it is a valid issue since the types should be identical. Thanks for the data to reproduce this @ruffsl. We'll take a look locally this week and see what the best way to fix this is.

ruffsl commented 4 years ago

Thanks for taking a look at this, though we may have worked around the original issue by migrating to traits, as the commit mentioned here demonstrates: https://github.com/CogRob/omnimapper/issues/26#issuecomment-604806403

I couldn't find a migration example of DerivedValue to traits, so please see if this uses traits correctly: https://github.com/CogRob/omnimapper/commit/6834d48e672d4afb9731c1df0084e497b3b7bda5

dellaert commented 4 years ago

Woohoo, traits! That's the way to go :-)

varunagrawal commented 4 years ago

@ruffsl I have some comments which you may find useful

Hope you find this helpful. If you think this issue has been resolved given all the comments so far, we can go ahead and close this issue.

dellaert commented 4 years ago

@varunagrawal planes are not Lie groups, so Manifold is exactly right in this case :-)

ProfFan commented 4 years ago

For any future visitors wanting to do a migration from GTSAM 3 to 4, here is the commit @ruffsl mentioned: https://github.com/CogRob/omnimapper/commit/6834d48e672d4afb9731c1df0084e497b3b7bda5

BTW, The original issue is still there: we are allowing a non-conforming value to be added to Values.

ProfFan commented 4 years ago

All right it appears this comes from here:

  /* ************************************************************************* */
  void Values::insert(Key j, const Value& val) {
    std::pair<iterator,bool> insertResult = tryInsert(j, val);
    if(!insertResult.second)
      throw ValuesKeyAlreadyExists(j);
  }

If you inherit from Value directly then the templated-based insert that wraps the value with a GenericValue (in Values-inl.h) will not be called. We should have some templated checks on this function I think.

dellaert commented 4 years ago

We should probably "untuck" this migration guide from BB. And any otehr docs that did not make the transition.

varunagrawal commented 4 years ago

The guides should "technically" go on the gtsam.org website for easy reading and discovery.

varunagrawal commented 4 years ago

@ProfFan @dellaert is this the issue you mentioned in #313?

ProfFan commented 4 years ago

Yes (and no)

ProfFan commented 4 years ago

This is not the #313 problem, #313 is when you have exactly the same type and this is a wrong usage of GTSAM by not wrapping with GenericValue but directly inheriting from Value.

TheSeanParker commented 5 months ago

①typedef Eigen::Matrix<double,-1,1> VectorXd ②typedef Matrix< Scalar,3,1> Vector3 this bug reason I find① isnot compatible with②,so will error https://zhuanlan.zhihu.com/p/704661771