Open ruffsl opened 4 years ago
Could be the RTTI demon being summoned again... Anyway it is better to NOT use TBB if you want to debug something.
Could you do the following:
Add this to Line 161:
std::cout << "Wanted: " << typeid(old_value).name()
<< ", " << typeid(old_value).hash_code() << std::endl;
std::cout << "Actual: " << typeid(val).name() << ", " << typeid(val).hash_code() << std::endl;
And compile, run, post the result and your compiler (GCC/Clang) version.
It is really appreciated that you also post the GLibC version and your OS version :)
Add this to Line 161:
I had some issues flushing to stdout, so I used the message around line 233 like so:
/* ************************************************************************* */
const char* ValuesIncorrectType::what() const throw() {
if(message_.empty())
{
message_ =
"Attempting to retrieve value with key \"" + DefaultKeyFormatter(key_) + "\", type stored in Values is " +
std::string(storedTypeId_.name()) + " but requested type was " + std::string(requestedTypeId_.name());
message_ +=
"\nWanted: " + std::string(storedTypeId_.name()) + ", " + string_format("%zx", storedTypeId_.hash_code());
message_ +=
"\nActual: " + std::string(requestedTypeId_.name()) + ", " + string_format("%zx", requestedTypeId_.hash_code());
}
return message_.c_str();
}
// string_format() from https://stackoverflow.com/a/26221725/2577586
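For readers who want to reproduce the diff above without following the link, here is a minimal sketch of a printf-style `string_format` helper. It is an assumption about what such a helper looks like (a two-pass `snprintf`), not a copy of the linked answer, and it is not part of GTSAM.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of GTSAM): printf-style formatting into a std::string.
template <typename... Args>
std::string string_format(const char* format, Args... args) {
  // First pass: ask snprintf how many characters are needed (excluding the '\0').
  const int size = std::snprintf(nullptr, 0, format, args...);
  if (size < 0) throw std::runtime_error("string_format: formatting failed");

  // Second pass: format into a buffer with room for the terminating '\0'.
  std::unique_ptr<char[]> buf(new char[size + 1]);
  std::snprintf(buf.get(), size + 1, format, args...);
  return std::string(buf.get(), buf.get() + size);
}
```

With this in place, `string_format("%zx", storedTypeId_.hash_code())` yields the hash as a hexadecimal string, which is how it is used in the `what()` message above.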
It looks like hash_code returns the same (hex) values:
[omnimapper_ros_node-1] BoundedPlanePlugin: Creating new plane 8070450532247928832: -0.105721 0.939853 -0.324807 1.457075
[omnimapper_ros_node-1] Adding new symbol: p0
[omnimapper_ros_node-1] terminate called after throwing an instance of 'gtsam::ValuesIncorrectType'
[omnimapper_ros_node-1] what(): Attempting to retrieve value with key "p0", type stored in Values is N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE but requested type was N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE
[omnimapper_ros_node-1] Wanted: N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE, 82781d0d761147f5
[omnimapper_ros_node-1] Actual: N10omnimapper13BoundedPlane3IN3pcl12PointXYZRGBAEEE, 82781d0d761147f5
[ERROR] [omnimapper_ros_node-1]: process has died [pid 17776, exit code -6, cmd '/
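The names in the log above are mangled, which makes them hard to compare by eye. A minimal sketch of a demangling helper, assuming GCC or Clang (it relies on `abi::__cxa_demangle` from `<cxxabi.h>`); the `demangle` function and the `demo::Plane` type are hypothetical names used only for illustration:

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle, available with GCC and Clang
#include <cstdlib>
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical helper: turn a mangled type name into a readable one.
// Falls back to the mangled input if demangling fails.
std::string demangle(const char* mangled) {
  int status = 0;
  // __cxa_demangle returns a malloc'd buffer, so it must be released with free().
  std::unique_ptr<char, void (*)(void*)> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}

// Illustrative type only; stands in for something like omnimapper::BoundedPlane3.
namespace demo {
struct Plane {};
}  // namespace demo
```

Passing `typeid(x).name()` through `demangle` before printing gives readable output for both the "Wanted" and "Actual" lines.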
post the result and your compiler (GCC/Clang) version. It is really appreciated that you also post the GLibC version and your OS version :)
See update under environment in top post.
Note this is the incorrect place to add the printouts. You should add these in Values-inl.h, because these type conversions happen there, I think.
You should add these in Values-inl.h because these type conversions happen there I think.
Do you mean here:
It looks like ValuesIncorrectType is thrown using the same type info as my diff above would print.
Could you post the exact git patch you'd like me to apply to Values-inl.h and test on my end?
By just commenting out the try and catch around the dynamic_cast, I captured the segfault directly:
As well as a dump of the variables in scope:
@ruffsl Right at line 279:
std::cout << "Wanted: " << typeid(const GenericValue<ValueType>&).name()
<< ", " << typeid(const GenericValue<ValueType>&).hash_code() << std::endl;
std::cout << "Actual: " << typeid(*pointer).name() << ", " << typeid(*pointer).hash_code() << std::endl;
Looks like gtsam is prepending N5gtsam12GenericValueI to the type name:
...
BoundedPlanePlugin: Creating new plane 8070450532247928832: 0.532448 0.165157 -0.830194 0.466272
Adding new symbol: p0
BoundedPlanePlugin: Added factor!
Wanted: gtsam::GenericValue<gtsam::Pose3>, 16382625216385153739
Actual: gtsam::GenericValue<gtsam::Pose3>, 16382625216385153739
Wanted: gtsam::GenericValue<omnimapper::BoundedPlane3<pcl::PointXYZRGBA> >, 1799629776660340886
Actual: omnimapper::BoundedPlane3<pcl::PointXYZRGBA>, 9401296165788534773
OmniMapperRos: Got cloud from: 2020-Mar-26 08:05:10.858989
cloudCallback: conversion took 0.035933
terminate called after throwing an instance of 'std::bad_cast'
what(): std::bad_cast
Signal: SIGABRT (Aborted)
This is correct, I think. It appears that you are inheriting from Value directly instead of using traits?
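To make the mismatch in the output above concrete, here is a minimal sketch using mock classes. `Value` and `GenericValue` below are simplified stand-ins, not the real GTSAM classes, and `WrappedPlane`, `LegacyPlane`, and `tryCast` are hypothetical names. It shows why a reference `dynamic_cast` to `GenericValue<T>` throws `std::bad_cast` when the stored object derives from `Value` directly.

```cpp
#include <string>
#include <typeinfo>

// Mock stand-ins for illustration only; NOT the real GTSAM classes.
struct Value {
  virtual ~Value() = default;
};

template <typename T>
struct GenericValue : Value {
  T value;
  explicit GenericValue(const T& v) : value(v) {}
};

// A plain type that is stored through the GenericValue wrapper.
struct WrappedPlane {
  double d;
};

// A type that inherits from Value directly and is stored without a wrapper.
struct LegacyPlane : Value {
  double d = 0.0;
};

// Mirrors the shape of the retrieval cast: succeeds only if the stored
// object's dynamic type really is GenericValue<T>.
template <typename T>
std::string tryCast(const Value& stored) {
  try {
    (void)dynamic_cast<const GenericValue<T>&>(stored);
    return "ok";
  } catch (const std::bad_cast&) {
    return "bad_cast";
  }
}
```

A `GenericValue<WrappedPlane>` passes the cast, while a bare `LegacyPlane` does not, because its dynamic type is `LegacyPlane` and not `GenericValue<LegacyPlane>`. That matches the "Wanted" versus "Actual" lines in the log.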
Hmm this isn't a segfault, but it is a valid issue since the types should be identical. Thanks for the data to reproduce this @ruffsl. We'll take a look locally this week and see what the best way to fix this is.
Thanks for taking a look at this, though we may have worked around the original issue by migrating to traits, as the commit mentioned here demonstrates: https://github.com/CogRob/omnimapper/issues/26#issuecomment-604806403
I couldn't find a migration example of DerivedValue to traits, so please see if this uses traits correctly:
https://github.com/CogRob/omnimapper/commit/6834d48e672d4afb9731c1df0084e497b3b7bda5
Woohoo, traits! That's the way to go :-)
@ruffsl I have some comments which you may find useful
Your commit for using traits does the bare minimum (which is great), though I don't believe it allows BoundedPlane to be optimized as a Lie group (unless this is not what you want). Take a look at the LieGroup class to see an example of how traits are used for the Rot2 and Rot3 classes. I am sorry that it is not an example of migrating from DerivedValue to traits, but it should serve you well.
You are right that GTSAM does not pick up TBB in Debug mode, though this is actually not GTSAM's fault. :slightly_smiling_face: The issue is that all packages of TBB (apt, pacman, you name it) use the default build from oneTBB or from Intel's website, and these builds only include release versions of the tbb and tbbmalloc .so files. When GTSAM looks for TBB in debug mode, it looks for tbb_debug.so and tbbmalloc_debug.so, which do not exist in the installed package, and thus TBB is treated as not found. The fix is to install TBB from the oneTBB repo manually. This is fairly simple: download the correct version of TBB from the releases page of the oneTBB repo and run the shell script as source tbbvars.sh intel64 linux auto_tbbroot (assuming you are using Bash, which is required, and an Intel-based 64-bit chip on Linux). This sets the necessary environment paths, and the releases include the debug builds, so GTSAM should pick it up without issues. You can go a step further and copy the debug builds to the same location as the release builds (the locate command is your friend for this one).
Hope you find this helpful. If you think this issue has been resolved given all the comments so far, we can go ahead and close this issue.
@varunagrawal planes are not Lie groups, so Manifold is exactly right in this case :-)
For any future visitors wanting to do a migration from GTSAM 3 to 4, here is the commit @ruffsl mentioned: https://github.com/CogRob/omnimapper/commit/6834d48e672d4afb9731c1df0084e497b3b7bda5
BTW, The original issue is still there: we are allowing a non-conforming value to be added to Values.
All right it appears this comes from here:
/* ************************************************************************* */
void Values::insert(Key j, const Value& val) {
std::pair<iterator,bool> insertResult = tryInsert(j, val);
if(!insertResult.second)
throw ValuesKeyAlreadyExists(j);
}
If you inherit from Value directly, then the template-based insert that wraps the value with a GenericValue (in Values-inl.h) will not be called. We should have some templated checks on this function, I think.
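One way such a templated check could look is sketched below. This is only an illustration with mock types: `Value` here is a stand-in for the GTSAM base class, and `IsWrappable`, `insertChecked`, `Pose`, and `LegacyPlane` are hypothetical names, not existing GTSAM API. The idea is to reject, at compile time, any type that derives from `Value` directly.

```cpp
#include <type_traits>

// Mock stand-in for illustration only; NOT the real GTSAM class.
struct Value {
  virtual ~Value() = default;
};

// A plain type, meant to be wrapped before storage.
struct Pose {
  double x = 0.0;
};

// A type that derives from Value directly (the problematic pattern).
struct LegacyPlane : Value {
  double d = 0.0;
};

// Hypothetical trait: true only for types that do NOT derive from Value.
template <typename T>
struct IsWrappable
    : std::integral_constant<bool, !std::is_base_of<Value, T>::value> {};

// Hypothetical checked entry point: fails to compile for Value-derived types.
template <typename T>
bool insertChecked(const T& /*val*/) {
  static_assert(IsWrappable<T>::value,
                "T must not derive from Value; provide traits<T> instead");
  return true;  // a real implementation would wrap val in GenericValue<T> here
}
```

With this sketch, `insertChecked(Pose{})` compiles, while `insertChecked(LegacyPlane{})` stops at the `static_assert` with a readable message instead of failing later at runtime with `std::bad_cast`.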
We should probably "untuck" this migration guide from BB, along with any other docs that did not make the transition.
The guides should "technically" go on the gtsam.org website for easy reading and discovery.
@ProfFan @dellaert is this the issue you mentioned in #313?
Yes (and no)
This is not the #313 problem. #313 is when you have exactly the same type, whereas this is a wrong usage of GTSAM: not wrapping with GenericValue but directly inheriting from Value.
The cause of this bug, as far as I can tell: ① typedef Eigen::Matrix<double,-1,1> VectorXd and ② typedef Matrix<Scalar,3,1> Vector3. ① is not compatible with ②, so an error results. See https://zhuanlan.zhihu.com/p/704661771
Description
As originally tracked in https://github.com/CogRob/omnimapper/issues/26, it seems that we're encountering segfaults when a ValuesIncorrectType error is thrown even when the name values for the type_infos are the same.
Steps to reproduce
See run instructions from docker demo described in the project readme: https://github.com/CogRob/omnimapper_ros/blob/plugin/planes/README.md
A minimal ros2bag that can induce the segfault can be found here: https://drive.google.com/file/d/1cVxsfCp2LSuRGkaUUA-bOC2fhhts1Cyo
Stacktrace:
Relevant code path:
https://github.com/borglab/gtsam/blob/f538d1dc7bdd7126cb683f3e961c985f76a872b0/gtsam/nonlinear/Values.cpp#L160-L163
https://github.com/borglab/gtsam/blob/16dbf27375fdefb83ce2355beb0c147fa9c07600/gtsam/nonlinear/Values.cpp#L233-L240
Expected behavior
When calling update for new factors with the same type, a ValuesIncorrectType error is not expected.
https://github.com/CogRob/omnimapper/blob/f215e35a134c54b9acfebe247bd9e42e5e3f033b/src/omnimapper_base.cpp#L467
Environment
OS: Ubuntu 18.04
Lang: C++14
Exact checkout of gtsam: https://github.com/CogRob/omnimapper_ros/blob/a17c4017dfde4cb022384e8d8f521e2ab05197fb/install/underlay/underlay.repos#L2-L5
Exact build setup: https://github.com/CogRob/omnimapper_ros/blob/a17c4017dfde4cb022384e8d8f521e2ab05197fb/Dockerfile#L55-L66
Additional information
I'd like to get a better stack trace, but there may be a separate issue in using TBB downstream when compiling from gtsam debug builds. Perhaps gtsam skips finding TBB when debug flags are set? See: https://github.com/CogRob/omnimapper/issues/25