Closed TrophyBuck closed 7 years ago
This seems to me like bogus data from your simulation.
Please provide your full configuration in a fork of cartographer_ros and a bag file with sample data so that we can have a look.
I actually forked cartographer_turtlebot since that's what I've changed, link here. I've updated the bug_demo branch in it to showcase the bug. The bag file was too large for GitHub to accept, so I've put it here on Google Drive. Once it is downloaded and set up, run the following to play the bag file alongside the cartographer_turtlebot launch:
roslaunch cartographer_turtlebot turtlebot_lidar_2d_demo.launch bag_filename:=${HOME}/Downloads/failure_demo.bag
The error should occur at about time 1005.49 with this setup. It's consistent here, though if I were to relaunch all the nodes manually instead of using the bag file, it would probably happen at a different time.
I see this error often when using odometry data. I've verified that my odometry data never contains NaN. I just have the two inputs: odometry and a 2D laser.
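For anyone who wants to repeat that kind of check, here is a minimal sketch of scanning pose samples for NaN or infinite values. It assumes the odometry has already been exported into plain Python dicts; the field names and the `find_non_finite` helper are hypothetical and not part of cartographer or ROS.

```python
import math


def find_non_finite(samples):
    """Return (index, field) pairs for every NaN or infinite value.

    `samples` is a hypothetical list of dicts such as
    {"stamp": 1.0, "x": 0.0, "y": 0.0, "yaw": 0.0}.
    """
    problems = []
    for index, sample in enumerate(samples):
        for field, value in sample.items():
            # math.isfinite is False for NaN, +inf and -inf.
            if not math.isfinite(value):
                problems.append((index, field))
    return problems


# Made-up samples for illustration only.
odometry = [
    {"stamp": 0.0, "x": 0.0, "y": 0.0, "yaw": 0.0},
    {"stamp": 0.1, "x": float("nan"), "y": 0.0, "yaw": 0.0},
    {"stamp": 0.2, "x": 0.1, "y": float("inf"), "yaw": 0.0},
]
print(find_non_finite(odometry))
```

An empty result only shows that the inputs are finite; it does not rule out a NaN being produced later in the pipeline.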
This is a bug in SuiteSparse v4.4.6. Modify the Ceres CMakeLists.txt file to use EigenSparse instead of SuiteSparse and recompile.
@BrannonKing Thanks for finding the root cause!
I should add that if you have a NaN in one of ROS's broadcasted transforms, you may get this error, but you will get a more informative error using EigenSparse. I would like to know if anyone is able to use SuiteSparse v4.5.5 to make this error go away.
Thanks for the replies- I tried setting the CMakeLists.txt to use EigenSparse, but when I did I got the following error during runtime:
E0419 13:57:37.598997 8287 covariance_impl.cc:548] SuiteSparse is required to use the SUITE_SPARSE_QR algorithm.
[ERROR] [1492624657.599212015, 26.529000000]: E0419 13:57:37.000000 8287 covariance_impl.cc:548] SuiteSparse is required to use the SUITE_SPARSE_QR algorithm.
F0419 13:57:37.599295 8287 ceres_scan_matcher.cc:108] Check failed: covariance_computer.Compute(covariance_blocks, &problem)
[FATAL] [1492624657.599386618, 26.529000000]: F0419 13:57:37.000000 8287 ceres_scan_matcher.cc:108] Check failed: covariance_computer.Compute(covariance_blocks, &problem)
The code I added to the CMake file is below- I added it after the options setup.
update_cache_variable(SUITESPARSE OFF)
update_cache_variable(EIGENSPARSE ON)
I'm guessing I'm setting it up incorrectly- it's worth noting that I am getting these confirmations during building that Eigen is enabled and Suite is disabled.
-- Found Eigen version 3.2.92: /usr/include/eigen3
===============================================================
Enabling the use of Eigen as a sparse linear algebra library
for solving the nonlinear least squares problems. Enabling
this option results in an LGPL licensed version of
Ceres Solver as the Simplicial Cholesky factorization in Eigen
is licensed under the LGPL.
===============================================================
-- Building without SuiteSparse.
When I did it, I didn't add any rows to CMakeLists.txt; I modified the two existing lines for those variables. Also, I had to clear out my build folder as the change wasn't detected sufficiently to cause a rerun of CMake.
Do you mean the option lines when you say you modified the existing lines? I tried changing those, but when I double checked the values by adding a message to the CMake it showed that Eigen was still disabled and Suite was still enabled.
message("EIGEN ${EIGENSPARSE} SUITE ${SUITESPARSE}") #added
I just tried deleting the old build folder and rebuilding but got some different errors relating back to Residual Blocks, so I think I'll try changing whatever existing lines you changed and rebuild.
To clarify, I mean these when I say 'option lines':
option(SUITESPARSE "Enable SuiteSparse." OFF)
OPTION(EIGENSPARSE "Enable Eigen as a sparse linear algebra library, WARNING: results in an LGPL licensed Ceres." ON)
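A note on why editing these lines may appear to have no effect: option() only supplies a default, so a value already stored in the build folder's CMakeCache.txt takes precedence until the cache is cleared or the variable is overridden with -D. Below is a minimal sketch for checking what is actually cached. It assumes the usual NAME:TYPE=VALUE cache format; the `read_cache_values` helper and the sample text are hypothetical.

```python
def read_cache_values(cache_text, names):
    """Extract NAME:TYPE=VALUE entries for the given names from CMakeCache.txt text."""
    values = {}
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache file.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        name = key.split(":", 1)[0]
        if name in names:
            values[name] = value
    return values


# Made-up cache contents for illustration only.
sample_cache = """\
// Enable SuiteSparse.
SUITESPARSE:BOOL=ON
// Enable Eigen as a sparse linear algebra library
EIGENSPARSE:BOOL=OFF
"""
print(read_cache_values(sample_cache, {"SUITESPARSE", "EIGENSPARSE"}))
```

To inspect a real build, pass in the contents of the CMakeCache.txt from the Ceres build folder instead of the sample string.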
After changing the options lines, cartographer doesn't crash but it does stop producing a submap, showing the error below that looks similar to the error this issue started with. This error is repeated once it shows up, and the submap seems to stop publishing.
Error: TF_NAN_INPUT: Ignoring transform for child_frame_id "turtlebot_tf/odom" from authority "unknown_publisher" because of a nan value in the transform (nan nan nan) (0.000000 0.000000 0.001341 0.999999)
at line 240 in /tmp/binarydeb/ros-kinetic-tf2-0.5.13/src/buffer_core.cpp
[ WARN] [1492628146.895639256, 259.038000000]: W0419 14:55:46.000000 29344 residual_block.cc:131]
Error in evaluating the ResidualBlock.
There are two possible reasons. Either the CostFunction did not evaluate and fill all
residual and jacobians that were requested or there was a non-finite value (nan/infinite)
generated during the or jacobian computation.
Residual Block size: 1 parameter blocks x 202 residuals
For each parameter block, the value of the parameters are printed in the first column
and the value of the jacobian under the corresponding residual. If a ParameterBlock was
held constant then the corresponding jacobian is printed as 'Not Computed'. If an entry
of the Jacobian/residual array was requested but was not written to by user code, it is
indicated by 'Uninitialized'. This is an error. Residuals or Jacobian values evaluating
to Inf or NaN is also an error.
This error is followed by the same Residual and Parameter Block matrices that are full of "nan"s.
I too have seen the (nan, nan, nan) in the transform today. I was too hasty in declaring it the fault of SuiteSparse as I didn't see it for a day after I ditched that. Let's reopen this bug. At this point I'm not sure if the bug lies with Cartographer or Cartographer_ros. It definitely seems related to the odometry. The higher-quality the odometry the less likely one is to see this.
This is a bug in SuiteSparse v4.4.6. Modify the Ceres CMakeLists.txt file to use EigenSparse instead of SuiteSparse and recompile.
You can do that without changing CMakeLists.txt by just calling CMake/catkin with additional CMake arguments, e.g. -DEIGENSPARSE=True -DSUITESPARSE=False.
But, IIRC, it did not work well for me, and I haven't seen the nice loop-closing realtime submap alignments when using EigenSparse.
The background of this is that earlier this year, I was wondering why I could not get loop closing to work (googlecartographer/cartographer_ros#247). It turned out that I was missing SuiteSparse, and Ceres was built without any sparse linear algebra library. To resolve this, in #189, I added that Cartographer requires Ceres built with any sparse linear algebra library (either EigenSparse or SuiteSparse or CXSparse). IIRC, one of the things I did try was -DEIGENSPARSE, and I think that it didn't work well. Only when I installed SuiteSparse did Cartographer start closing loops properly.
If it seems that Cartographer really requires SuiteSparse to work properly, maybe the requirement in Cartographer's CMakeLists should be changed from Ceres with SparseLinearAlgebraLibrary to Ceres with exactly SuiteSparse? @SirVer
I don't think that it's a requirement to use SuiteSparse- the issue we're having seems to happen on both SuiteSparse and EigenSparse. I tried forcing the use of SuiteSparse in Cartographer's CMake like @ojura mentioned, and still got the original error mentioned in this issue.
@SirVer Is there any other information or testing we can provide to help solve this issue?
I am having the same issue. I've linked my fork and rosbag below. I tried building with the arguments ojura described above. The only difference is that instead of crashing right away and not producing any maps, it would produce a handful of messages, stop publishing, and then repeatedly produce the nan residual block warning plus a TF_NAN_INPUT error, both of which I've copied below. Let me know if you want additional data. I am using Indigo on Ubuntu 14.04.
Error: TF_NAN_INPUT: Ignoring transform for child_frame_id "base_link" from authority "unknown_publisher" because of a nan value in the transform (-nan -nan 0.000000) (0.000000 0.000000 0.002974 0.999996)
[ WARN] [1493150351.421769307, 1492802830.018635481]: W0425 12:59:11.000000 9558 residual_block.cc:131]
Error in evaluating the ResidualBlock.
There are two possible reasons. Either the CostFunction did not evaluate and fill all
residual and jacobians that were requested or there was a non-finite value (nan/infinite)
generated during the or jacobian computation.
Residual Block size: 1 parameter blocks x 61 residuals
For each parameter block, the value of the parameters are printed in the first column
and the value of the jacobian under the corresponding residual. If a ParameterBlock was
held constant then the corresponding jacobian is printed as 'Not Computed'. If an entry
of the Jacobian/residual array was requested but was not written to by user code, it is
indicated by 'Uninitialized'. This is an error. Residuals or Jacobian values evaluating
to Inf or NaN is also an error.
Residuals: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
Parameter Block 0, size: 3
-nan | nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
-nan | nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
0.00594778 | nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
Build with instructions specified in the Cartographer ROS Docs.
Run with: roslaunch cartographer_ros scanse_baxter_bag.launch bag_filename:=${BAG_DIR}
https://github.com/Cranapple/cartographer_ros
https://drive.google.com/file/d/0BwfwZjTRXXFdbFJhbzVROXNfQ00/view?usp=sharing
@TrophyBuck I looked into your turtlebot example and was able to repro the crash.
Things I found:
inf
which led to the crash. I fixed this in a PR.
@Cranapple I did not look into your case. Could you check if my fix helps you too and open a new bug report otherwise?
Thank you very much for tracking this down, and for the fix! In my system, it is not uncommon to send data with the same timestamp as the previous batch. On devices that don't have a system clock, the clock signal must be broadcast from a different device. Previous designs have simply published a clock signal periodically with all receivers using the most-recently-published time value. I'm working with the ROS2 design to remedy this; there is no need to use the most recent value as every platform has good timers, even if they don't have a clock chip. See https://discourse.ros.org/t/of-clocks-and-simulation-betimes-and-otherwise/1587
In my system, it is not uncommon to send data with the same timestamp as the previous batch.
This sounds dangerous to me. In our experience better timing translates directly to better SLAM quality. Yolo timing means that you need to be lenient in your SLAM expectations, i.e. increase the resolution, expect more drift and so on. Of course, timing is also a very hard problem.
@SirVer Thanks, that seems to have fixed it! I'm not sure how I ended up sending two LaserScans with the same time stamp, but in my testing with the new fix Cartographer maps the environment very well.
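Since the trigger here turned out to be two LaserScans with the same time stamp, a quick sanity check on recorded data can help others spot the same condition. This is a minimal sketch, assuming the header stamps have already been extracted as floats in seconds; the `find_non_increasing_stamps` helper is hypothetical and is not the fix from the PR.

```python
def find_non_increasing_stamps(stamps):
    """Return indices whose timestamp is not strictly greater than the previous one."""
    return [i for i in range(1, len(stamps)) if stamps[i] <= stamps[i - 1]]


# Made-up stamps: index 2 repeats the previous stamp, index 4 goes backwards.
scan_stamps = [10.00, 10.05, 10.05, 10.10, 10.08]
print(find_non_increasing_stamps(scan_stamps))
```

Any index reported is a message that arrives with zero or negative elapsed time since its predecessor, which is worth fixing at the publisher.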
I started using the cartographer_turtlebot package in simulation, and I've been running into this error. It occurs after a random amount of time, ranging from a few seconds after starting the cartographer_node to several minutes. The only changes I've made have been remapping a few topic names (mainly odometry and the laser scanner), removing a line launching turtlebot's minimal bringup and a line launching the urg_node (since I'm using my own version of turtlebot), and disabling the imu (changes have been across this file and this file). The file I've been launching from is here. The error I've been getting is shown below:
It's also worth noting that this error occurs regardless of whether the robot stands still or moves, and while moving, it does correctly generate a map. I've tested running the 2D Lidar Demo (listed here) with imu disabled and no other changes, in order to make sure the disabled imu wasn't causing the issue. It ran without error under those circumstances. I'm using Ubuntu 16.04 and ROS Kinetic.