Please properly fill out the issue template.
Also provide your parameters and bag file.
@SteveMacenski Sorry about that. I updated the issue template. Hope that is better. I will get you a traceback file if my copy-pasting from the gdb session is not helpful. Thanks
This issue is complex, I'll need the bag too. Please fully describe your setup and the issue: when does it happen, how often, have you tried other maps, etc. The template should walk you through providing that information - I just recently added it, so if there are tweaks I can make to nudge folks like yourself to fully describe their issue, let me know :-)
But given no one else has reported this, I suspect it has to do with your setup, available memory, etc. So what's this being run on? CPU/memory/static memory left? It could be that localization is trying to access a deleted scan, or something's corrupted in the serialized map itself.
When does this happen? Directly on deserialization or sometime after? How was the map that was serialized used/generated/treated?
It would be great if you could add sections in your template for:
I am using Ubuntu 16.04 in a VM inside a MacBook.
Memory: 9.4 GB
Processor: Intel Core i7-8750H CPU @ 2.20GHz × 4
Disk: 811 GB
I can see the map in RViz. The deserialization doesn't give any errors. The robot moves for some time, about 30 seconds maybe, and then it happens. I am just giving the path to the map in the config file. I cannot post the bag file online because of confidentiality issues. I haven't tried it on any other map yet. Things I will try:
If not, I will email you the bag file.
The robot moves for some time, about 30 seconds maybe, and then it happens
Things aren't random - so something must have triggered this. Does this ~30 second timeframe happen to align with the first time the loop closure system is triggered? Or perhaps the first time Ceres fails to converge (which is odd in your log above; I've never seen a properly set up system fail to converge)?
9.4 GB
Your memory is a bit low - are you sure there's actually memory available? I assume that's OK, but just checking.
Please try with another run generated from the sync or async nodes and try again. I think one of your colleagues posted a ticket earlier about generating serialized files from LifeLong mapping mode and had issues (which, don't do). Let's try to isolate this to:
I cannot post the bag file online because of confidentiality issues.
OK, then I can't help with that. I don't want to be responsible for your company's IP or data if you're unwilling to share it for debugging.
I used another dataset. It still has the same error. With the other dataset, I created the posegraph using online_sync mode, and it seems like the graph loads fine when I open it in RViz.
I can send the bag if you can take a look at it. Thank you. Here is the Google Drive link, since the bag was quite big: https://drive.google.com/drive/folders/1Z-lzib9k2RlorWhBBa1SKL3WK7wAy18P?usp=sharing
Does this ~30 second timeframe happen to align with the first time the loop closure system is triggered? Or perhaps the first time Ceres fails to converge (which is odd in your log above; I've never seen a properly set up system fail to converge)?
Please answer my question above. Also what happens if you disable loop closure in localization mode?
@27Apoorva any update?
Last I knew, it was still an issue, so we have dropped work on this for now. Thank you for all your help.
Hiya - I experience the same problem as described in #233. I think it is more beneficial to continue the discussion here. It would be great if we could find a solution. Please let me know how I can contribute.
To answer some questions from above: I use online_sync.launch and the RViz plugin to create and save the map, and localization.launch (with a map_file_name set) when the crash happens. I actually found that the crash does not happen when I set mode: localization in the mapper_params_localization.yaml. Not sure why it is set to mapping by default in localization mode? I will run more tests tomorrow.
What's the crash? Get a gdb traceback please.
See the README on the mode setting. It just changes how Ceres handles removing values from the graph (speed vs. memory: the default, mapping, optimizes for memory; localization optimizes for speed).
The traceback seems to be the same as that posted by @27Apoorva; for completeness:
#0 0x00007ffff553f438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff554103a in __GI_abort () at abort.c:89
#2 0x00007ffff5b7984d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff5b776b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff5b77701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff5b77919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff7b51508 in void boost::throw_exception<boost::lock_error>(boost::lock_error const&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#7 0x00007ffff7b4d1e9 in boost::mutex::lock() () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#8 0x00007ffff7b51e7b in boost::unique_lock<boost::mutex>::lock() () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#9 0x00007ffff7b51575 in boost::unique_lock<boost::mutex>::unique_lock(boost::mutex&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#10 0x00007ffff775854f in boost::shared_mutex::lock_shared() () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#11 0x00007ffff775b79b in boost::shared_lock<boost::shared_mutex>::lock() () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#12 0x00007ffff775a257 in boost::shared_lock<boost::shared_mutex>::shared_lock(boost::shared_mutex&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#13 0x00007ffff7758b7f in karto::LocalizedRangeScan::GetPointReadings(bool) const () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#14 0x00007ffff6f9a489 in karto::ScanMatcher::FindValidPoints(karto::LocalizedRangeScan*, karto::Vector2<double> const&) const () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libkartoSlamToolbox.so
#15 0x00007ffff6f9a22e in karto::ScanMatcher::AddScan(karto::LocalizedRangeScan*, karto::Vector2<double> const&, bool) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libkartoSlamToolbox.so
#16 0x00007ffff6f9a0ca in karto::ScanMatcher::AddScans(std::vector<karto::LocalizedRangeScan*, std::allocator<karto::LocalizedRangeScan*> > const&, karto::Vector2<double>) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libkartoSlamToolbox.so
#17 0x00007ffff6fb06e5 in double karto::ScanMatcher::MatchScan<std::vector<karto::LocalizedRangeScan*, std::allocator<karto::LocalizedRangeScan*> > >(karto::LocalizedRangeScan*, std::vector<karto::LocalizedRangeScan*, std::allocator<karto::LocalizedRangeScan*> > const&, karto::Pose2&, karto::Matrix3&, bool, bool) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libkartoSlamToolbox.so
#18 0x00007ffff6fa29fa in karto::Mapper::ProcessLocalization(karto::LocalizedRangeScan*) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libkartoSlamToolbox.so
#19 0x00007ffff7b4aebc in slam_toolbox::LocalizationSlamToolbox::addScan(karto::LaserRangeFinder*, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&, karto::Pose2&) ()
from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#20 0x00007ffff7b4ab30 in slam_toolbox::LocalizationSlamToolbox::laserCallback(boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/liblocalization_slam_toolbox.so
#21 0x00007ffff76d862e in boost::_mfi::mf1<void, slam_toolbox::SlamToolbox, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>::operator()(slam_toolbox::SlamToolbox*, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) const
() from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#22 0x00007ffff76d21c6 in void boost::_bi::list2<boost::_bi::value<slam_toolbox::SlamToolbox*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, slam_toolbox::SlamToolbox, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>, boost::_bi::list1<boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&> >(boost::_bi::type<void>, boost::_mfi::mf1<void, slam_toolbox::SlamToolbox, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>&, boost::_bi::list1<boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>&, int) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#23 0x00007ffff76cb1ab in void boost::_bi::bind_t<void, boost::_mfi::mf1<void, slam_toolbox::SlamToolbox, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>, boost::_bi::list2<boost::_bi::value<slam_toolbox::SlamToolbox*>, boost::arg<1> > >::operator()<boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>(boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#24 0x00007ffff76c2f62 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, slam_toolbox::SlamToolbox, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>, boost::_bi::list2<boost::_bi::value<slam_toolbox::SlamToolbox*>, boost::arg<1> > >, void, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) ()
from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#25 0x00007ffff76d22ef in boost::function1<void, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&>::operator()(boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) const ()
from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#26 0x00007ffff76cb323 in boost::detail::function::void_function_obj_invoker1<boost::function<void (boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&)>, void, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> >::invoke(boost::detail::function::function_buffer&, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const>) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#27 0x00007ffff771f6c2 in boost::function1<void, boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> >::operator()(boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const>) const ()
from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#28 0x00007ffff771afb1 in message_filters::CallbackHelper1T<boost::shared_ptr<sensor_msgs::LaserScan_<std::allocator<void> > const> const&, sensor_msgs::LaserScan_<std::allocator<void> > >::call(ros::MessageEvent<sensor_msgs::LaserScan_<std::allocator<void> > const> const&, bool)
() from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#29 0x00007ffff76c9185 in message_filters::Signal1<sensor_msgs::LaserScan_<std::allocator<void> > >::call(ros::MessageEvent<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#30 0x00007ffff76c047d in message_filters::SimpleFilter<sensor_msgs::LaserScan_<std::allocator<void> > >::signalMessage(ros::MessageEvent<sensor_msgs::LaserScan_<std::allocator<void> > const> const&) ()
from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#31 0x00007ffff771a383 in tf2_ros::MessageFilter<sensor_msgs::LaserScan_<std::allocator<void> > >::CBQueueCallback::call() () from /home/administrator/catkin_ws/devel/.private/slam_toolbox/lib/libtoolbox_common.so
#32 0x00007ffff6805838 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/kinetic/lib/libroscpp.so
#33 0x00007ffff680723b in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/kinetic/lib/libroscpp.so
#34 0x00007ffff6863e39 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/kinetic/lib/libroscpp.so
#35 0x00007ffff6848e9b in ros::spin() () from /opt/ros/kinetic/lib/libroscpp.so
#36 0x00000000004c8943 in main ()
Looks like the same issue - it's trying to access some point data for a scan that is either deleted, invalid, or for some reason not able to access the mutex. Google backs this theory up:
https://stackoverflow.com/questions/30090108/what-does-mutex-lock-fail-with-invalid-argument-mean
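For intuition, here is a minimal, deliberately broken sketch of that failure mode (not slam_toolbox code): locking a mutex whose owning object has already been destroyed is undefined behavior, and in practice it often surfaces as exactly this boost::lock_error / "Invalid argument":

#include <boost/thread/mutex.hpp>

struct ScanLike
{
  boost::mutex lock;  // destroyed together with the owning object
};

int main()
{
  ScanLike* scan = new ScanLike();
  delete scan;        // the pthread mutex underneath is now destroyed/freed
  scan->lock.lock();  // undefined behavior: commonly throws boost::lock_error
                      // ("mutex lock failed in pthread_mutex_lock: Invalid
                      // argument"), matching the traceback above
  return 0;
}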
Is there some deterministic (or even pseudo-deterministic) way you can get this to occur? The main reason I couldn't help @27Apoorva is that I couldn't replicate it. I ran over many of my datasets and this never occurred. If you have a dataset where this happens somewhat regularly, we can look and see what's being deleted that it's unhappy about, and track back to whatever algorithmic error there is.
Alright, here we go: https://www.dropbox.com/sh/oncqjr76857kxyv/AADuL3p1cJLE5cXKEEMRUJ-ca?dl=0
To replicate, run rosbag play --clock bags_2020-07-21-20-43-01.bag and roslaunch slam_toolbox localization.launch. You will need to modify the path to the serialized map in the mapper_params_localization.yaml (note that the base_frame also differs from the default). It seems to crash every ~5th time or so. It also seems to crash more often if I launch slam_toolbox a couple of seconds after the rosbag play. I also attach another, longer bag, bags_2020-07-21-20-57-42.bag, in which I also observed the crash.
I hope this helps - let me know if there is something else I can provide.
As before, setting mode: localization leads to fewer/no crashes (I haven't observed any yet in that mode - edit: just had one ...).
OK, this is on my queue to look at (I'm busy with something else, but it's item #3 on my queue). How long is the bag that fails every 5th time? I don't want to sit around for 15 hours waiting for a crash and spamming my console with debug info.
My guess from what we're looking at is one of the following two:
Either way, if you want to do some debugging in the meantime, more power to you. The more we can isolate where this is happening, the better. Maybe try running it without the scan matcher (Ceres) plugin so there's no loop closure. If it goes away, that narrows it down to that subsystem. If it doesn't, then we can at least rule that out (I'm hoping it's something Ceres related; that subsystem is far easier to debug than the graph side of things). LocalizedRangeScan::GetPointReadings is the function where your tracebacks tell me the failure is happening, when it tries to lock the mutex during AddScan.
Hi,
If I remember correctly, the reason this crash was happening had something to do with ceres and running the kinetic branch. When I switched to melodic on Ubuntu 16.04, the crash stopped happening. I am currently not working on slam_toolbox actively, but this is what I remember off the top of my head.
Ah, I don't support the Kinetic branch anymore (it's a wasteland without structure, from prototyping as a research hobby project). It wouldn't surprise me at all if there were serious issues in the Kinetic branch. @Tobias-Fischer are you also using the kinetic branch? This could be an easy fix (and maybe I just delete the kinetic branch so that no one else runs into this) :-)
Hiya,
The bag files are ~30 seconds (the one crashing every 5th time) and ~2 minutes respectively, so not too long to sit around. I will run it without Ceres and let you know. The error happens on both the melodic and noetic branches. The rosbag was recorded on ROS Kinetic though (unfortunately the robot still runs Kinetic), but that should not matter.
I agree with your suggestion to delete the kinetic branch. The noetic branch compiles fine on kinetic/melodic, so IMO there is not even a need for the melodic branch.
Many thanks for looking into this!
So the crash happens even after setting use_scan_matching: false and do_loop_closing: false. It actually crashes quite frequently now, which should be good for debugging.
Seems like I found the culprit: in https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/lib/karto_sdk/src/Mapper.cpp#L2860-L2875 the variable oldLSV.vertex is a nullptr. The assignment in https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/lib/karto_sdk/src/Mapper.cpp#L2879 is executed even if scan_vertex is a nullptr.
Not sure what the best fix for this is, but I hope you know what to do from there @SteveMacenski.
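For concreteness, the kind of guard I mean, as a self-contained sketch (the types and names are illustrative, not the exact Mapper.cpp ones):

#include <cstdio>

struct Vertex { int id; };
struct LocalizationScanVertex { Vertex* vertex = nullptr; };

// Only overwrite the stored vertex when the replacement actually exists;
// without the guard, oldLSV.vertex silently becomes null.
void replaceVertex(LocalizationScanVertex& oldLSV, Vertex* scan_vertex)
{
  if (scan_vertex == nullptr)
  {
    std::fprintf(stderr, "scan_vertex is null - keeping the old vertex\n");
    return;
  }
  oldLSV.vertex = scan_vertex;
}

int main()
{
  LocalizationScanVertex lsv;
  Vertex v{1};
  replaceVertex(lsv, &v);       // normal path
  replaceVertex(lsv, nullptr);  // the failure mode I observed
  return 0;
}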
Many thanks!
oldLSV is from another place than scan_vertex. scan_vertex comes from the current scan's processing; it replaces the old one at the front of the buffer (oldLSV). The only way scan_vertex can be nullptr is if this function, https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/lib/karto_sdk/src/Mapper.cpp#L1388, returns nullptr, which I don't see how that could happen. The assert should make things crash if pScan isn't valid. Then once you get into the if statement, you'll return a real, allocated pointer.
You could easily test whether I'm wrong by adding a breakpoint or print statement on L1402 and seeing if that is for some reason triggered at some point.
It doesn't surprise me at all, though, that that part of the code is probably where the issue lies. I did spend a bunch of time whiteboarding this and testing to make sure it all worked out OK, but given that was all more 'art' than 'engineering', I recognize that's probably where the mistake is.
I finished one of my tasks, this is now next up on my task list.
So if you set useScanMatching to false, then AddVertex is never called, so scan_vertex is always a nullptr. This explains why it crashes more often (always) without scan matching enabled.
Also, I think there is a missing break after https://github.com/SteveMacenski/slam_toolbox/blob/noetic-devel/slam_toolbox/lib/karto_sdk/src/Mapper.cpp#L2917, which could potentially lead to the same edge being removed multiple times and yet another nullptr.
I don't think there should be a break. A vertex can have many edges connecting to it from loop closures; just because you found one edge doesn't mean there isn't another, or many others. The found flag is to make sure you found at minimum one, because you know for a fact there is at least one. You couldn't remove the same edge multiple times because it would look different: the source and target would be switched relative to the two connected nodes. Other nodes don't have that same edge info from that single node, so there's no risk of duplication there. If we really did delete the same edge multiple times, L2915's delete would be what's crashing, and crashing super regularly.
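To illustrate why no break is needed, a generic sketch (illustrative types, not karto's actual API) - every edge touching the vertex must go, and a double delete would crash at the delete itself:

#include <vector>

struct Edge { int source; int target; };

// Remove every edge touching `vertex`. No `break`: a vertex can be connected
// to many edges (odometry chain plus any number of loop closures).
bool removeEdgesTouching(std::vector<Edge*>& edges, int vertex)
{
  bool found = false;
  for (auto it = edges.begin(); it != edges.end(); )
  {
    if ((*it)->source == vertex || (*it)->target == vertex)
    {
      delete *it;      // deleting the same edge twice would crash right here
      it = edges.erase(it);
      found = true;    // record that at least one edge existed, as expected
    }
    else
    {
      ++it;
    }
  }
  return found;
}

int main()
{
  std::vector<Edge*> edges = {new Edge{0, 1}, new Edge{1, 2}, new Edge{1, 3}};
  return removeEdgesTouching(edges, 1) ? 0 : 1;  // removes all three edges
}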
Agree? Or am I missing something potentially?
That is true, what you say about turning off scan matching. Frankly, it would never occur to me that someone would turn that off (you're not really using SLAM or localization anymore at that point).
Agreed re the break - I hadn't properly thought that through. Also agreed re the scan matching; I just did that because you suggested it. Maybe that option should not be exposed ...
Maybe that option should not be exposed ...
I don't disagree
So I played with it a little but didn't get very far. I thought that maybe the shared lock / unique lock stuff might be the culprit (a race condition between the unlock of the shared lock and getting the unique lock) and played with upgrade lock / upgrade-to-unique lock as replacements, but that led either to deadlocks or crashes.
Did you have any luck?
I haven't looked; like I said, I have other things ahead of it in my queue. I hope to be able to start looking myself midway through next week.
Hi, I finished my last task, this is next on deck to start looking at tomorrow.
That's great, thanks!
@Tobias-Fischer While I can't reproduce locally (tons of unrelated TF errors in Noetic spam me endlessly, and I don't actually see any transformation?), I think I know what's happening.
ProcessLocalization calls MatchScan using the current scan (probably OK) and a set of running scans from the scan buffer, GetRunningScans. The running scans are a list of the most recent N scans in the buffer, where N is set by parameter. These running scans are then given to AddScans with the current scan's position. Given that the thing failing is accessing a scan, we can now safely rule out the current scan as the problem, since it's not even in the AddScans input. So AddScans does:
const_forEach(LocalizedRangeScanVector, &rScans)  // karto macro: iterate over the scan vector
{
  if (*iter == NULL)  // skip scans that were deleted and nulled out
  {
    continue;
  }
  AddScan(*iter, viewPoint);
}
The AddScan part is the next step in the traceback. So each scan is iterated over. What's odd here is that, as you can see, I intentionally added the NULL check before processing to catch an invalid scan. So there are 3 options:
We then call the FindValidPoints function, and this is where we call GetPointReadings in Karto.h:5686. After that we have the mutex, and it's crash-central.
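For reference, a self-contained stand-in for the shape of that function, as inferred from the traceback (the member names here are assumptions, not the exact karto source):

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>
#include <vector>

struct ScanLike
{
  mutable boost::shared_mutex m_Lock;
  std::vector<double> m_PointReadings;

  const std::vector<double>& GetPointReadings() const
  {
    boost::shared_lock<boost::shared_mutex> lock(m_Lock);  // frames #10-#13 die here
    // (the real function recomputes cached world-frame readings when dirty)
    return m_PointReadings;
  }
};

int main()
{
  ScanLike scan;
  scan.GetPointReadings();  // fine on a live object; UB on a freed one
  return 0;
}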
Now some good questions that I don't run into often: if you've deleted an object but not set the pointer to null, what happens when you access it (e.g. pScan->GetPointReadings() as in FindValidPoints)? An immediate crash? Or does it do stuff until it accesses a member variable, which dies (but member functions are OK)? If it's the former, then the issue isn't that it was already deleted, because we see in the traceback that it gets into member functions. If it's the latter, then it probably was, because that mutex is the very first member object it tries to access.
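A quick, deliberately undefined-behavior demo of the two cases (illustrative only, not slam_toolbox code):

#include <iostream>

struct ScanDemo
{
  int id = 42;
  void codeOnly() const { std::cout << "no members touched\n"; }
  int touchMember() const { return id; }  // first member-variable access
};

int main()
{
  ScanDemo* s = new ScanDemo();
  delete s;       // s now dangles; delete does NOT null the pointer
  s->codeOnly();  // UB, but frequently appears to work: only code runs,
                  // no member data is read
  std::cout << s->touchMember() << "\n";  // UB that reads freed memory - the
                                          // "dies at first member access" case
  return 0;
}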
So a couple of things I might try:
1. In Karto.h:5688 (in the noetic branch; should be the same for melodic), where we have the boost lock, add a line beforehand that does some erroneous check on the m_IsDirty parameter, and see if it now crashes at that line and not at the mutex. If so, that means just accessing any member fails, and therefore the mutex isn't special. Or add a new parameter so that this is the only place it's used, so the crash couldn't be mistaken for an un-locked member variable being corrupted by multi-threaded access (see the sketch after this list).
2. In Mapper.cpp:1075, where we do the pScan->GetPointReadings(), maybe do another nullptr check - though I don't think that's likely to help.
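A self-contained sketch of suggestion 1 (ScanLike and the probe are stand-ins for the real karto members):

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>
#include <cstdio>

struct ScanLike
{
  bool m_IsDirty = false;
  mutable boost::shared_mutex m_Lock;

  void GetPointReadings() const
  {
    // Probe an ordinary member *before* taking the lock: if the object has
    // been freed, the crash/garbage shows up here instead of inside
    // pthread_mutex_lock, proving the mutex isn't special.
    std::printf("m_IsDirty probe: %d\n", static_cast<int>(m_IsDirty));
    boost::shared_lock<boost::shared_mutex> lock(m_Lock);  // original first access
    // ... compute point readings ...
  }
};

int main()
{
  ScanLike scan;
  scan.GetPointReadings();
  return 0;
}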
What might be a problem?
The localization buffer length is tied to the scan_buffer_size parameter, so maybe make the localization length -1 of that, or directly just use the running scan buffer? I'll link to a branch shortly where I go over all the deletes I think might be problematic and add nulls afterwards where missing, and you can see if that fixes it for you, since I can't seem to trigger it.
tl;dr: the problem is in the running scans - somehow there's one in the running scan buffer that's either been deleted, has had elements of it deleted, or something else odd like deleted but not nulled out. It does not have to do with the current scan at all. Given we only see this in localization mode, I suspect that we delete a scan from the m_LocalizationScanVertices buffer that's still in the rolling scan buffer.
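A minimal sketch of that double-ownership hazard (plain vectors standing in for the real karto buffers):

#include <vector>

struct Scan { int id; };

int main()
{
  std::vector<Scan*> runningScans;          // rolling window used by MatchScan
  std::vector<Scan*> localizationVertices;  // localization's elastic buffer

  Scan* s = new Scan{1};
  runningScans.push_back(s);
  localizationVertices.push_back(s);        // same pointer held in both places

  // Localization trims its buffer and frees the scan...
  delete localizationVertices.front();
  localizationVertices.erase(localizationVertices.begin());

  // ...but runningScans still holds the dangling pointer. It is non-null, so
  // it passes the NULL check in AddScans, and the next GetPointReadings()
  // call on it reads freed memory - the suspected crash.
  Scan* dangling = runningScans.front();
  (void)dangling;
  return 0;
}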
~Or I suppose the opposite, in AddRunningScan, where it erases a scan that m_LocalizationScanVertices is still storing when its distance says to. I'd have to think about it more, but that doesn't seem as likely.~
while (m_RunningScans.size() > m_RunningBufferMaximumSize ||
squaredDistance > math::Square(m_RunningBufferMaximumDistance) - KT_TOLERANCE)
{
// remove front of running scans
m_RunningScans.erase(m_RunningScans.begin());
// recompute stats of running scans
frontScanPose = m_RunningScans.front()->GetSensorPose();
backScanPose = m_RunningScans.back()->GetSensorPose();
squaredDistance = frontScanPose.GetPosition().SquaredDistance(backScanPose.GetPosition());
}
Edit: probably not this. If the running scan stuff deleted a scan, then we wouldn't see a crash trying to access it, because that list would be up to date. We would have seen the crash in the m_LocalizationScanVertices part of the code if this were the case. Disregard.
Another thing we could do to debug is to print the unique IDs of the laser scans every time we do some operation, and see who is trying to access or delete a scan. That would tell us which scan is causing the crash, and then who's the last to play with it and what they did. Since the bag is only 30 seconds long, it shouldn't be too bad.
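A sketch of such a tracing helper (hypothetical; the real hook points would be karto's create/erase/delete/access sites):

#include <cstdio>

// Log every lifecycle event keyed by the scan's unique ID; the last writer
// before the crash identifies the culprit.
enum class ScanOp { Created, AddedToRunning, ErasedFromRunning, Deleted, Accessed };

void traceScan(int uniqueId, ScanOp op)
{
  static const char* kNames[] = {"created", "added-to-running",
                                 "erased-from-running", "deleted", "accessed"};
  std::fprintf(stderr, "[scan-trace] id=%d op=%s\n",
               uniqueId, kNames[static_cast<int>(op)]);
}

int main()
{
  traceScan(7, ScanOp::Created);   // call at each lifecycle site
  traceScan(7, ScanOp::Deleted);
  traceScan(7, ScanOp::Accessed);  // an access after delete is the smoking gun
  return 0;
}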
Branch: https://github.com/SteveMacenski/slam_toolbox/tree/180 - I just did a -l (loop) on the bag file and I'll leave it running for an hour on this branch to see if I can get it to crash after... ~120 runs. Are you saying you see the crash after 5 runs when you re-launch Slam Toolbox each time, or just playing the bag 5 times over the same running localization session?
Please test the branch and get back to me with an update and we can take it from there.
I haven't observed this crash anymore in the 180 branch. Thanks a lot, I highly appreciate your efforts in this project!
Really? That easy.... OK.... @27Apoorva can you confirm? If I were to venture a guess from that analysis, it was https://github.com/SteveMacenski/slam_toolbox/pull/246/files#diff-82245f7e389553498ddb2d01272b8b0aR5330 that fixed it. So the object was deleted, but the readings didn't get set to null.
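Generically, that is the delete-then-null pattern, sketched here with illustrative names (not the actual karto members):

#include <vector>

struct Readings { std::vector<double> points; };

struct Scan
{
  Readings* readings = new Readings();

  void FreeReadings()
  {
    delete readings;
    readings = nullptr;  // the crucial line: without it, HasReadings() would
                         // pass on a dangling pointer and a later dereference
                         // would read freed memory
  }

  bool HasReadings() const { return readings != nullptr; }
};

int main()
{
  Scan s;
  s.FreeReadings();
  return s.HasReadings() ? 1 : 0;  // safely 0 after delete+null
}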
@Tobias-Fischer are you sufficiently happy with that for me to merge it in and close out this issue?
Whoops, the merge closed this - waiting for confirmation that we're all done here. FYI - see the bunch of PRs I just added; if you weren't building with release builds, this will give you about a 50% speed-up.
Closing from merged PRs. @Tobias-Fischer @27Apoorva let me know if this comes back in any form. Crashing is no bueno - thanks for the help debugging and isolating the problem. That made my life much easier in finding the issue.
Unfortunately this still happens - much more rarely though, rare enough for me not to get a stacktrace :(.
Error is still the same:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument
If you can get a new traceback, please file a new ticket and I can take a look. There might be more places I need to add nullptr to, but that surprises me a little. I'd have to see specifically where this one is being called from, since it's likely going to be different than before.
Required Info:
Steps to reproduce issue
Update the mapper_params_localization.yaml with the map file and map pose, and change mode to localization. Play the rosbag with tf, lidar scans, and odometry. Config file:
Expected behavior
Output should be a localized pose
Actual behavior
Crash
Additional information
Traceback file: