Status: Open. 67bug opened this issue 1 year ago.
What is `Waiting for topics ...`? I don't see that in the codebase.
> Today, I tried re-running (as I was debugging something else on my side of the code) and I see no loop closures happening -- same bag file, same settings. And obviously the map is not good at all. What could be causing this?
Was the difference that you were continuing a session or using a bag file vs running on live data? I don't know why you would see an entire run that would not generate loop closures when it did beforehand. I'm trying to understand if anything about your situation is different.
"My map isn't exactly the same" I could understand, but just "no loop closures happen... ever" is really odd. But every time you see those `Solver Summary (v 1.13.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-openmp)` Ceres prints, that is a loop closure process occurring. Ceres is only called when we've found a loop, set the constraint, and ask Ceres to reoptimize. So if you see that, loop closure is happening, but if your map isn't updating to reflect that, that's odd and makes me think something's up with Ceres or the use of Ceres.
So perhaps your custom install of Ceres could be to blame if it has a bug or for some reason isn't compatible?
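For comparing runs, one way to act on this is to save each run's terminal output and count the Ceres summaries. The sketch below assumes the log was captured to a file (for example by redirecting the launch output) and only counts occurrences of the `Solver Summary` text quoted above; both function names are hypothetical helpers, not part of slam_toolbox.

```python
from pathlib import Path

# Text that starts each Ceres full solver report, as quoted above.
SOLVER_SUMMARY_MARKER = "Solver Summary"


def count_solver_summaries(log_text: str) -> int:
    """Count Ceres solver reports in captured terminal output.

    Per the explanation above, each report corresponds to one optimization
    that was triggered after a loop closure was found.
    """
    return log_text.count(SOLVER_SUMMARY_MARKER)


def count_solver_summaries_in_file(path: str) -> int:
    """Same count, reading a saved log file (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_solver_summaries(text)
```

Running `count_solver_summaries_in_file` on the log of a good run and of a bad run gives a number to compare instead of scrolling the terminal; a count of zero means the optimizer was never invoked in that run.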
Thanks for the note, @SteveMacenski
Forgive me for the lack of clarity. `Waiting for topics` is a message from the `laserscan_multi_merger` node, which combines `scan_1` and `scan_2` to generate `scan_merged`.
> Was the difference that you were continuing a session or using a bag file vs running on live data? I don't know why you would see an entire run that would not generate loop closures when it did beforehand. I'm trying to understand if anything about your situation is different.
I run it like this:
- `rosbag record`
- `offline.launch`, which allows me to play with different loop-closure parameters in `mappers_params_offline.yaml` (with the intent of understanding how they impact the map)
It is indeed odd that I had loop closures in one run and have not seen any in any run since.
Might I ask this: when running offline, if the processing queue gets large and keeps increasing because the CPU is unable to keep up, will all the messages in the queue eventually get processed? Or will the processing stop when `rosbag play` stops? Why do I ask this?
When I run `rosbag play` without any rate control, the processing queue fills up and slam_toolbox lags behind `rosbag play` (this is expected). But what I observe is that when the bag finishes, slam_toolbox also stops updating the map, which caught me by surprise (maybe my observation is not accurate, so I wanted to get your view on this).
Hence, I started using the rate option `--rate 0.5`, which allows the CPU to catch up with the published messages. What I am unsure of is whether this rate factor drops messages that would otherwise produce the "close enough" pose needed to trigger a loop closure. I will experiment with this.
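The queue behaviour described here comes down to simple rate arithmetic: the bag publishes scans at some rate, `--rate` scales that rate, and the backlog grows whenever arrivals outpace what the mapper can drain. A minimal sketch, assuming a constant publish rate and a constant effective processing rate; `process_hz` is a number you would have to measure on your own machine, and slam_toolbox does not report it under that name.

```python
def backlog_after(duration_s: float, bag_rate_hz: float, rate_factor: float,
                  process_hz: float, initial_backlog: float = 0.0) -> float:
    """Queue length after `duration_s` seconds of wall-clock playback.

    Messages arrive at bag_rate_hz * rate_factor (the --rate value) and are
    consumed at process_hz. The queue length cannot go below zero.
    """
    arrival_hz = bag_rate_hz * rate_factor
    return max(0.0, initial_backlog + (arrival_hz - process_hz) * duration_s)


def max_rate_factor(bag_rate_hz: float, process_hz: float) -> float:
    """Largest --rate value at which arrivals do not outpace processing."""
    return process_hz / bag_rate_hz


def drain_time_s(backlog: float, process_hz: float) -> float:
    """Extra wall-clock time needed to empty a backlog once playback stops."""
    return backlog / process_hz
```

With hypothetical numbers of 10 scans/s in the bag and a mapper that drains 6 per second, `backlog_after(300, 10, 1.0, 6)` gives 1200 queued scans after five minutes, `backlog_after(300, 10, 0.5, 6)` gives 0, and `max_rate_factor(10, 6)` is 0.6, so `--rate 0.5` leaves some headroom. `drain_time_s(1200, 6)` is 200 s of extra processing, which only helps if the inputs the mapper needs are still available after playback ends.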
Now, to address the question above about playing with rate control and slam_toolbox stopping processing, I will try the `--queue` and `--keep-alive` options for `rosbag play`.
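A sketch of assembling that invocation, assuming ROS 1 `rosbag play` and its `--clock`, `--rate`, `--queue` and `--keep-alive` options; the bag filename and queue size are placeholders. It only builds the command line. `--clock` matters when the nodes run with `/use_sim_time` set to true, and `--keep-alive` keeps the player process up after the last message but does not play any further data, so TF from the bag stops updating at that point.

```python
import shlex
from typing import List, Optional


def rosbag_play_args(bag: str, rate: float = 1.0, queue: Optional[int] = None,
                     keep_alive: bool = True, clock: bool = True) -> List[str]:
    """Build the argv for a ROS 1 `rosbag play` run (does not execute it)."""
    args = ["rosbag", "play"]
    if clock:
        # Publish /clock from the bag's timestamps for nodes using /use_sim_time.
        args.append("--clock")
    if rate != 1.0:
        # Slow playback down so the mapper's processing queue can keep up.
        args.append(f"--rate={rate}")
    if queue is not None:
        # Size of rosbag play's outgoing publisher queue.
        args.append(f"--queue={queue}")
    if keep_alive:
        # Keep the rosbag process alive after the last message is played.
        args.append("--keep-alive")
    args.append(bag)
    return args


def rosbag_play_command(bag: str, **kwargs) -> str:
    """Shell-quoted string form, handy for pasting into a terminal."""
    return shlex.join(rosbag_play_args(bag, **kwargs))
```

For example, `rosbag_play_command("site_run.bag", rate=0.5, queue=1000)` returns `rosbag play --clock --rate=0.5 --queue=1000 --keep-alive site_run.bag`; the list form from `rosbag_play_args` can be passed directly to `subprocess.run`.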
Thoughts?
> So perhaps your custom install of Ceres could be to blame if it has a bug or for some reason isn't compatible?
Noted. How would I go about exploring this? As in, where do I go hunting?
All this said, this is a great toolbox and FAR more intuitive to use than Cartographer. So, thank you!
> When running offline, if the processing queue gets large, and of course keeps increasing because the CPU is unable to keep up, will all the messages in the queue eventually get processed?
Yes, we store the queue and it gets to them as it can. But I will be honest in saying that I see degradation in map quality if you just throw everything into the system all at once and let it process at max rate.
Part of the reason that I know of is TF2-related. TF2 only keeps so much time in its buffers. So if you don't have TF up and providing the relevant data at the relevant times, then obviously things start to degrade through no fault of SLAM Toolbox. Some of the data is just no longer available, or it might try to use the most recent possible data, which could be waaay into the future relative to that scan.
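To make the TF2 point concrete: a TF buffer only retains a bounded window of history (tf2_ros defaults to 10 seconds), so once the mapper is working on a scan whose timestamp has fallen out of that window, the transform lookup for it cannot succeed. The toy model below is not tf2; it keeps a window of transform timestamps and checks whether a scan stamp is still covered. The class name, the TF publish period, and the lag values are assumptions for illustration.

```python
from collections import deque


class FixedWindowTfCache:
    """Toy stand-in for a TF buffer that keeps only `cache_s` seconds of history."""

    def __init__(self, cache_s: float = 10.0):
        self.cache_s = cache_s
        self.stamps = deque()  # transform timestamps, oldest first

    def insert(self, stamp: float) -> None:
        self.stamps.append(stamp)
        # Evict transforms older than cache_s relative to the newest one.
        while self.stamps[0] < stamp - self.cache_s:
            self.stamps.popleft()

    def can_lookup(self, stamp: float) -> bool:
        """True if `stamp` lies inside the retained window (no extrapolation)."""
        return bool(self.stamps) and self.stamps[0] <= stamp <= self.stamps[-1]


def scan_lookup_ok(scan_stamp: float, newest_tf_stamp: float,
                   cache_s: float = 10.0, tf_period_s: float = 0.02) -> bool:
    """Fill the cache with TF up to `newest_tf_stamp`, then try the scan's stamp.

    newest_tf_stamp - scan_stamp is how far the mapper lags behind playback.
    """
    cache = FixedWindowTfCache(cache_s)
    steps = int(newest_tf_stamp / tf_period_s)
    for i in range(steps + 1):
        cache.insert(i * tf_period_s)
    return cache.can_lookup(scan_stamp)
```

In this model `scan_lookup_ok(5.0, 12.0)` is True (7 s behind), while `scan_lookup_ok(5.0, 20.0)` is False (15 s behind): once processing lags playback by more than the cache length, the transforms for queued scans are gone, which is consistent with the degradation described here.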
> But what I observe is that the bag finishes running and slam_toolbox also stops updating the map, which caught me by surprise
I don't recall that happening from my personal experience, but there might be a rosbag setting you're missing to not bring things down once complete. There will be data missing, though, when it stops (TF, maybe others it relies on like the ROS clock topic). Sorry, this is a case where I've been off of ROS 1 for so long that my memory of my workflows with it is starting to fade. I'm pretty sure the clock is part of it, though.
> Noted. How would I go about exploring this? As in, where do I go hunting?
When you don't see any loop closures, do you still see the Ceres prints?
> All this said, this is a great toolbox and FAR more intuitive to use than Cartographer. So, thank you!
Thanks!
> Yes, we store the queue and it gets to them as they can. But I will be honest in saying that I see performance degradation of the map quality if you just throw everything into the system all at once and let it process at max rate.
>
> Part of the reason that I know of is TF2 related. TF2 only keeps so much of the time in its buffers. So if you don't have TF up and providing the relevant data at the relevant times, then obviously things start to degrade through no fault of SLAM Toolbox. Some of the data is just no longer available or might try to use the most recent possible data which could be waaay into the future relative to that scan.
That has been exactly my finding: there are definitely artifacts like double walls that originally made me think it was a URDF issue, but when I run it slower and ensure that the queue does not build up beyond, say, 10 messages, the "double wall" artifact is gone. So in all candor, it is not slam_toolbox but the nuances of playing large bag files fast on a computer that is not able to keep up.
> I don't recall that happening from my personal experience, but there might be a rosbag setting you're missing to not bring things down once complete. There will be data missing though when it stops (TF, maybe others it relies on like the ros clock topic). Sorry, this is a case where I've been off of ROS 1 for so long that my memory is starting to fade about my workflows with respect to it. I'm pretty sure the clock though is part of it.
I have tentatively circumvented this issue by increasing the queue size on `rosbag play`. Why that adverb? Because I don't know this to be universally true yet.
ROS1 vs ROS2
Most of our framework for everything else is built on ROS 1, so migration is going to be a bit tedious; I am exploring what I can without changing the foundation just yet. But a migration is on the horizon.
> When you don't see any loop closures, do you still see the Ceres prints?
No, Ceres prints don't show up at all when there is no loop closure, so I know there is causality there. That's why I was poking at "how do I know it is a Ceres issue?" Ceres is not even being called, so my working hypothesis is as follows (I assume I am entitled to be categorically wrong given my limited understanding of slam_toolbox and the Karto SDK):
… One way around these two is to increase `loop_search_maximum_distance` from the default 3.0 to a slightly larger value.

Based on item 3 above, one follow-up question is how `loop_search_maximum_distance` and `loop_search_space_dimension` are related to each other. Perhaps I should look at the source instead to save you some headspace!
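On that follow-up question, my reading of the Karto-based mapper is that the two parameters play different roles: `loop_search_maximum_distance` decides which earlier scans are close enough to the current pose to be considered loop-closure candidates, while `loop_search_space_dimension` (together with `loop_search_space_resolution`) sizes the grid the loop-closure scan matcher searches once a candidate is found. The sketch below illustrates only that split, under simplifying assumptions (2D positions, no chain building, no response thresholds); it is not the Karto code, so check the source as suggested. The 3.0 default comes from this thread; 8.0 and 0.05 are assumed to match the slam_toolbox example parameter files.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Pose2D:
    x: float
    y: float


def loop_candidates(current: Pose2D, past: List[Pose2D],
                    loop_search_maximum_distance: float = 3.0) -> List[int]:
    """Indices of earlier poses that pass the distance gate (simplified)."""
    return [
        i for i, p in enumerate(past)
        if math.hypot(p.x - current.x, p.y - current.y) <= loop_search_maximum_distance
    ]


def search_grid_cells(loop_search_space_dimension: float = 8.0,
                      loop_search_space_resolution: float = 0.05) -> int:
    """Cells per side of the square window the loop matcher would search."""
    return int(round(loop_search_space_dimension / loop_search_space_resolution))
```

With `current = Pose2D(10.0, 0.0)` and earlier poses at x = 7.5, 6.5 and 0.0, the default gate of 3.0 keeps only index 0, while raising it to 4.0 also admits the pose 3.5 m away. That is one plausible reason a larger `loop_search_maximum_distance` let loop closures fire: if drift leaves the revisit slightly more than 3 m from the earlier pose, no candidate is produced and Ceres is never called.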
Got it, that's why I asked :smile: If you put logs in the Ceres plugin, do you see it being called at all? I'm just making sure that it's not being called but failing to do anything useful due to invalid inputs, constraints not being reflected, or something else, which would point to the problem not being in the library core.
I'm still unclear as to why it works sometimes and not others for you. That doesn't seem sensible. What if you try doing it with only 1 lidar stream instead of merging them? I'm wondering if some of this oddity is related to the scan combiner producing badly formatted data on its output, which for some reason is harder to correlate correctly. There have been reports historically of RP lidars not working super well due to their cheap-o mechanics and a poorly set up driver. I'm not familiar with the node you're using to merge them or that driver in particular, so that could be related.
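One cheap way to test the merged-scan theory before swapping inputs is to sanity-check the scans the merger publishes. The sketch below uses a plain dataclass that mirrors a subset of the `sensor_msgs/LaserScan` fields (so it runs without ROS) and flags two kinds of malformed output: a `ranges` array whose length disagrees with the angle fields, and a scan where most readings are non-finite or fall outside `[range_min, range_max]`. The 50% threshold is an arbitrary assumption; in a real node you would copy the fields from the message inside a subscriber callback.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scan:
    """Plain stand-in for the sensor_msgs/LaserScan fields checked here."""
    angle_min: float
    angle_max: float
    angle_increment: float
    range_min: float
    range_max: float
    ranges: List[float] = field(default_factory=list)


def scan_problems(scan: Scan, min_usable_fraction: float = 0.5) -> List[str]:
    """Return human-readable problems found in a scan (empty list if none)."""
    if scan.angle_increment == 0:
        return ["angle_increment is zero"]
    problems = []
    # Beam count implied by the angle fields; drivers differ on whether
    # angle_max is inclusive, so tolerate an off-by-one either way.
    expected = int(round((scan.angle_max - scan.angle_min) / scan.angle_increment)) + 1
    if abs(len(scan.ranges) - expected) > 1:
        problems.append(f"expected about {expected} ranges, got {len(scan.ranges)}")
    usable = [r for r in scan.ranges
              if math.isfinite(r) and scan.range_min <= r <= scan.range_max]
    if scan.ranges and len(usable) < min_usable_fraction * len(scan.ranges):
        problems.append(f"only {len(usable)} of {len(scan.ranges)} ranges are usable")
    return problems
```

Running this on `scan_1`, `scan_2` and `scan_merged` side by side would show whether the merged output is structurally different from the raw streams before any SLAM parameters are touched.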
> One way around these two is to increase loop_search_maximum_distance from the default 3.0 to a slightly larger value
And that worked? I'd definitely recommend taking a look at the source code.
> Most of our framework for everything else is built on ROS 1, so migration is going to be a bit tedious; I am exploring what I can without changing the foundation just yet. But a migration is on the horizon.
Let me know if I can be of assistance - ROS 2 Migrations are something Open Navigation offers :smile:
Thanks for your responses
> I'm still unclear as to why it works sometimes and not others for you. That doesn't seem sensible. What if you try doing it with only 1 lidar stream instead of merging them? I'm wondering if some of this oddity is related to the scan combiner resulting in badly formatted data on the output which for some reason is harder to correlate correctly. There have been reports historically from the RP lidars of things not working super well due to the cheap-o mechanics of it and a poorly setup driver. I'm not familiar with the node you're using to merge them or that driver in particular, so that could be related.
This is worth a try. Now that I think I have a working solution, I am going back to old bag files from other large setups to make sure things are kosher with slam_toolbox. Once I wrap that up (assuming it goes well), I will come back to the experiment you are suggesting.
> And that worked? I'd definitely recommend taking a look at the source code.
Yes, interestingly, it did. I will dig into the source code. Thanks again for your time and thoughts. I'll post my findings here.
Open Navigation: Noted! 😄
Hi there @67bug ! Could you please post your findings?
@CraftyCranberry I have a bunch of findings that I will capture here, but I have to admit that none of them fit logic as I look at how the code should work. The repo is really designed for large spaces, but there is something strange in my specific implementation that I can't quite pin down. I suspect that either I am seeing a small motor-odometry drift or a small IMU drift that makes this problem particularly bad for me, or I am dropping messages when I record the rosbag or when I play it back.
But here are my findings:
The only other halfway reasonable thing I can say here, @CraftyCranberry, is that keeping the graphs small-ish helped create great small maps and, consequently, great large maps. This helps me in a few other ways too: the individual tessellated maps change a lot, so redoing a map only requires me to remap that one small area of the overall map.
I am trying out slam_toolbox for mapping and am running into issues with loop closure.
I am running `offline.launch`. The offline YAML file looks like this:
Here is where things get interesting. When I ran this the first time (last night) while I was at the site, a perfect map was generated and I saw a whole bunch of loop closures.
Today, I tried re-running (as I was debugging something else on my side of the code) and I see no loop closures happening -- same bag file, same settings. And obviously the map is not good at all. What could be causing this?
Bag file link
- `odom` is the output of an EKF and is the real odom
- `scan_1` is the right lidar
- `scan_2` is the left lidar

Terminal output for the good runs:
So I know that loop closure is happening, and the result is evident in the generated map. In every other run since, I see zero loop closures.
What would cause this?