SteveMacenski / slam_toolbox

Slam Toolbox for lifelong mapping and localization in potentially massive maps with ROS
GNU Lesser General Public License v2.1

Implement Many-to-Many Multi-Resolution Scan Matching #432

Open LucasJSch opened 3 years ago

LucasJSch commented 3 years ago

Feature description

Currently, slam_toolbox has a single way of doing scan matching, a two-resolution method. This has been a viable solution so far, but improvements can be made. Scan matching is one of the main pillars of the slam_toolbox system, and improving it can bring notable benefits to overall performance, both in the processing time of the scan matching pipeline and in the accuracy of the estimated pose.

The idea for improving the scan matching mechanism is to add support for M3RSM (Many-to-Many Multi-Resolution Scan Matching) as an alternative to the current two-resolution scan matching, based on this paper by Edwin Olson.

Implementation considerations

As a higher-level comment: to support multiple scan matching methods, the simplest approach seems to be declaring an interface for scan matchers and making the current ScanMatcher class implement it. The new scan matcher would also implement that interface, making the two easily interchangeable. Once both implementations are ready to go, the user should be able to select which scan matching method they want to use via a specific flag.
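For concreteness, here is a minimal sketch of what that interface could look like (all names and signatures below are illustrative placeholders, not the existing Karto API):

```cpp
// Illustrative sketch only -- ScanMatcherBase and its signatures are
// hypothetical, not the existing slam_toolbox / Karto API.
#include <vector>

namespace slam_toolbox
{

struct Pose2D
{
  double x{0.0}, y{0.0}, theta{0.0};
};

// Abstract scan matcher: the current correlative matcher and a new M3RSM
// matcher would both implement this, so callers stay agnostic to the method.
class ScanMatcherBase
{
public:
  virtual ~ScanMatcherBase() = default;

  // Match an incoming scan against a set of reference scans; returns a match
  // score/response and writes the best pose estimate into best_pose.
  virtual double MatchScan(
    const std::vector<float> & incoming_scan,
    const std::vector<std::vector<float>> & reference_scans,
    Pose2D & best_pose) = 0;
};

// Concrete matchers, selected by a configuration flag (e.g. a hypothetical
// parameter such as scan_matcher_type: "correlative" or "m3rsm").
class CorrelativeScanMatcher : public ScanMatcherBase { /* current method */ };
class M3RSMScanMatcher : public ScanMatcherBase { /* proposed method */ };

}  // namespace slam_toolbox
```

The concrete matcher would be chosen once at configuration time, so the rest of the pipeline never needs to know which method is in use.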

LucasJSch commented 3 years ago

To get an intuitive understanding of how all these pieces will work together, I've made a sequence diagram of how an estimated pose is computed with this new pipeline.

[Sequence diagram: how an estimated pose would be computed with the proposed pipeline]

SteveMacenski commented 3 years ago

That certainly sounds interesting, but it also sounds a lot more like a new SLAM algorithm than an extension of this one, if my understanding is correct.

The way I think about graph-based SLAM, there are 3 components: scan matching, loop closure, and the graph solver / optimization.

A change like this would keep the graph solver / loop closure logic but be a significant change to the scan matching system. Maybe what we could do is vendorize the matcher system so that matchers can be plugins and users can set which one to use. But flat out replacing it probably wouldn't be possible in this project, since that would dramatically change the behavior of the software.

I might be open to that concept of swapping it out if there was specific proof, from creating an implementation and testing it, that it genuinely outperforms what's there today. Finding techniques that look promising in the literature is one thing; creating them and seeing how they fare in reality is another. Especially since this technique matches based on rasterized maps rather than direct sensor measurements, I somewhat doubt that, all things considered, this would result in better pose estimation and be faster. More on that below.

Generating those occupancy maps is non-trivial in compute cost, and they are not currently computed online in the SLAM process. This system matches based on sensor measurements directly, and then periodically, in another thread, we take the current state of the system and generate an occupancy map that is published for outside consumers. But nowhere internally is that occupancy grid ingested for matcher / SLAM use. So to implement this, you'd have to actually generate the occupancy maps, and at multiple resolutions, to even get to the point where you can match against them online. I think you can see why I am suspicious that this method would actually be a speed up versus matching on a local buffer of laser data that takes advantage of multi-threading (which I implemented here, though I don't believe it's well described in the readme).
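For context on why that's non-trivial: the usual way a multi-resolution raster is built for this kind of matching is by repeatedly max-decimating the full-resolution grid, so that a score computed on a coarse level upper-bounds the score on the fine level. A rough illustrative sketch of one decimation step (not code from this repo):

```cpp
// Illustrative sketch (not slam_toolbox code): building one coarser level of
// a multi-resolution grid pyramid by max-decimation, so that a match score
// computed at the coarse level upper-bounds the score at the fine level.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Grid
{
  std::size_t width{0}, height{0};
  std::vector<double> cells;  // row-major occupancy / correlation values

  double at(std::size_t x, std::size_t y) const { return cells[y * width + x]; }
};

// Decimate by `factor`, keeping the maximum of each factor x factor block.
Grid Decimate(const Grid & fine, std::size_t factor)
{
  Grid coarse;
  coarse.width = (fine.width + factor - 1) / factor;
  coarse.height = (fine.height + factor - 1) / factor;
  coarse.cells.assign(coarse.width * coarse.height, 0.0);

  for (std::size_t y = 0; y < fine.height; ++y) {
    for (std::size_t x = 0; x < fine.width; ++x) {
      std::size_t cx = x / factor, cy = y / factor;
      double & cell = coarse.cells[cy * coarse.width + cx];
      cell = std::max(cell, fine.at(x, y));
    }
  }
  return coarse;
}
```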

The actual localization quality this gives also makes me slightly suspicious, but I'd be totally open to being contradicted on this point. This is an area where you just have to try it and see; there's no way to rationalize your way to an answer. The fact that this paper doesn't actually compare against any other systems makes me suspicious as well -- but that's not a good reason on its own.

Can you tell me a little about the reasons behind the requested changes and why you picked this particular method to discuss? I'd love more SLAM work and improvements in ROS / ROS 2, and will help however I can, even if it turns out this project isn't the right place. Longer term, I would love to be able to dedicate some time to making more generalized localization and SLAM frameworks so people can plug and play different components to build modular but efficient SLAM systems. This is something I started in the Nav2 working group (https://github.com/ros-planning/navigation2/pull/1930) with some volunteers, but it is currently on hold due to lack of time / interest in continuing by the members. It is my full intent to pick up this work at some point in the future -- if that is something you might have interest in. I've just been distracted by some more immediate issues.

hidmic commented 3 years ago

Let me try to answer that on behalf of @LucasJSch, a co-worker of mine.

A change like this would keep the graph solver / loop closure logic but be a significant change to the scan matching system. Maybe what we could do is vendorize the matcher system so that matchers can be plugins and users can set which one to use. But flat out replacing it probably wouldn't be possible in this project, since that would dramatically change the behavior of the software.

We fully agree with that, and we do not intend to substitute the current scan matcher, but to enable a user to explicitly configure and use the M3RSM scan matcher instead of the default one. I think that's what @LucasJSch meant by introducing a scan matcher interface. Having a plugin system in place would be the logical next step if we go down this path.

This system matches based on sensor measurements directly, and then periodically, in another thread, we take the current state of the system and generate an occupancy map that is published for outside consumers. But nowhere internally is that occupancy grid ingested for matcher / SLAM use. So to implement this, you'd have to actually generate the occupancy maps, and at multiple resolutions, to even get to the point where you can match against them online.

I'd say that's partially true. The current scan matcher implementation does have to build an occupancy grid based on one or more reference scans before it can correlate an incoming scan. So while an M3RSM implementation could maintain the full map at various resolutions, it could also build it on demand based on a number of reference scans and their current best pose estimates. In fact, Olson's paper suggests searching over various combinations of incoming scans and reference scans (and thus rasterized maps) as a means to downplay outliers. That's why M3RSM is dubbed a many-to-many scan matcher.
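To make the "build it on demand" part concrete, here is a rough sketch of rasterizing a handful of reference scans into a grid from their current pose estimates (illustrative only, not the actual karto correlation grid code):

```cpp
// Illustrative sketch: rasterize reference scans into a grid using their
// current best pose estimates. Not the actual karto correlation grid code.
#include <cmath>
#include <cstddef>
#include <vector>

struct RefScan
{
  double x, y, theta;          // current best pose estimate of the scan
  std::vector<double> ranges;  // range readings
  double angle_min, angle_increment;
};

struct RasterGrid
{
  double resolution;           // meters per cell
  double origin_x, origin_y;   // world coordinates of cell (0, 0)
  std::size_t width, height;
  std::vector<unsigned char> cells;  // 0 = free/unknown, 100 = hit

  void MarkHit(double wx, double wy)
  {
    int cx = static_cast<int>((wx - origin_x) / resolution);
    int cy = static_cast<int>((wy - origin_y) / resolution);
    if (cx >= 0 && cy >= 0 &&
        cx < static_cast<int>(width) && cy < static_cast<int>(height)) {
      cells[static_cast<std::size_t>(cy) * width + cx] = 100;
    }
  }
};

void Rasterize(const std::vector<RefScan> & reference_scans, RasterGrid & grid)
{
  for (const RefScan & scan : reference_scans) {
    double beam_angle = scan.theta + scan.angle_min;
    for (double range : scan.ranges) {
      // Project each range reading into world coordinates and mark the cell.
      grid.MarkHit(scan.x + range * std::cos(beam_angle),
                   scan.y + range * std::sin(beam_angle));
      beam_angle += scan.angle_increment;
    }
  }
}
```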

Having said that,

I think you can see why I am suspicious that this method would actually be a speed up versus matching on a local buffer of laser data that takes advantage of multi-threading

We fully agree with your sentiments here. It's very hard to say if and when this single-threaded heap search would beat the current multi-threaded grid search. We won't know until it is implemented. Intuitively, I think it's likely it won't for small or mid-sized search spaces but it might for large ones, improving loop closure rates. But that's just a hunch.
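For readers unfamiliar with the approach: the heap search is a best-first branch-and-bound over the resolution pyramid, where candidate pose regions are scored on coarse grids (whose scores upper-bound the fine-grid scores) and refined in priority order. A schematic sketch under those assumptions, with purely illustrative names (not Olson's implementation):

```cpp
// Schematic sketch of the best-first (heap) search used in multi-resolution
// matching: coarse-level scores are upper bounds, so the first candidate
// popped at the finest level is the global optimum. Names are illustrative.
#include <queue>
#include <vector>

struct Candidate
{
  int level;           // 0 = finest resolution
  double upper_bound;  // score upper bound from the coarse grid
  double x, y, theta;  // centre of the pose region this candidate covers
};

struct ByBound
{
  bool operator()(const Candidate & a, const Candidate & b) const
  {
    return a.upper_bound < b.upper_bound;  // max-heap on the bound
  }
};

// `expand` rescores a candidate's children one level finer and returns them;
// the grids themselves are out of scope for this sketch.
template <typename ExpandFn>
Candidate BestFirstSearch(std::vector<Candidate> roots, ExpandFn expand)
{
  std::priority_queue<Candidate, std::vector<Candidate>, ByBound> heap(
    ByBound{}, std::move(roots));

  while (!heap.empty()) {
    Candidate best = heap.top();
    heap.pop();
    if (best.level == 0) {
      return best;  // finest level: the bound is exact, so this is the optimum
    }
    for (const Candidate & child : expand(best)) {
      heap.push(child);
    }
  }
  return Candidate{};  // empty search space
}
```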

Can you tell me a little about the reasons behind the requested changes and why you picked this particular method to discuss? I'd love more SLAM work and improvements in ROS / ROS 2, and will help however I can, even if it turns out this project isn't the right place.

So, as you may have guessed already, @LucasJSch is not completely on his own. Ekumen, the company we work for, is diving deeper into SLAM research, and while we're at it we figured we could give back to the community. We want to put Olson's research to the test because it looks promising. We chose SLAM Toolbox as the recipient because it's a well-coded, actively maintained, open-source SLAM system (not a small feat). Assuming you're onboard with our contribution, of course, and subject to test results -- if it fares horribly we'll be the first to drop it.

I hope that clears up the picture a bit. If not, we're open to f2f chats anytime.

https://github.com/ros-planning/navigation2/pull/1930

Pretty interesting. Do you get together every now and then (like a working group)? Anyhow, we'll take a look.

SteveMacenski commented 3 years ago

Having a plugin system in place would be the logical next step if we go down this path.

I agree, with a degree of hesitation. This is all pretty tied into Karto, and it's pretty easy to screw things up in Karto if you're not careful, so to do this I'm kind of wondering how much rearchitecture / surgery is needed. This codebase should remind you a little of Navfn: both were influenced by Kurt, who was an amazing researcher and thinker and wrote very efficient code, but it is questionably complex and has a ton of subtleties. This is meant to be a standalone SLAM solution, not really a framework for additional modifications to the core algorithm of that size -- the optimizer plugin was just a very natural place to have one since that was vendorized externally anyway (and I was playing with g2o/ceres/etc). I imagined this having more features added on top of it using the complete serialized files and integrations, more than core modifications. But happy to chat about modifications if there's a tangible reason to modify it, which kind of bleeds into my next question.

A better question than "can you?" is "should you?" -- is what's there now not good enough? What requirements do you have that this isn't meeting, requiring modification? Or is this a research project to build some internal knowledge / capability / publications / etc., in the absence of a particular need for improvement?

I think the better place to start is the beginning. From my perspective, I got a ticket proposing major surgery on a method that, as far as I'm aware, works well, to replace it with another method based on a 6 year old paper (with considerably fewer citations than the current method -- not that academic citations mean much on their own, but it's a metric, since this paper didn't actually compare against other methods).

The largest space I've mapped thus far with this is around 240,000 sqft. While I literally don't have access to data sets for spaces larger than that, I looked over my notes and they indicate that I didn't think there was any reason it wouldn't work for up to 500,000 sqft with some practical improvements that I simply didn't make because it wasn't relevant to my application, so I moved onto other problems. 1,000,000 sqft is the number I see from places like Fetch and Locus as the "if we can do this, we can do anything" mark, but promising that is a tough sell considering I don't have data for a space even 1/4 that size, so I can't really make any claims here. Though with offline computation or optimizing for cloud resources... my intuition is it's possible with some more significant updates based on profiling.

So that's why I ask, mostly. I want to understand the problem this is looking to solve.

The current scan matcher implementation does have to build an occupancy grid based on one or more reference scans before it can correlate an incoming scan.

The correlation grids you are referring to that are used with the local scans aren't occupancy grids, but the point is taken that there is something grid-y going on. If those are something you can use (or something similar) then yeah, got it. As long as it's just a local view of things, I can go with you that theoretically one could make this work out of similar compute resources. I would think the pyramid would make it heavier, but there may be games you could play there. I'm on board with you in concept.

Assuming you're onboard with our contribution, of course, and subject to test results -- if it fares horribly we'll be the first to drop it.

Of course :smile: I think a good place to start is just seeing how well that works and whether the compute behind your implementation is practical. If the technique pans out, then it's more meaningful to try to test in this context. If that works, then we can see how hard it is to expose that scan matcher interface to have a plugin-based system. I'm honestly not sure what the answer to that is; I haven't looked to see how hard I think that would be. I suspect quite hard, but I'm open to being totally wrong. Alternatively, depending on your priorities, we could switch that all around and start with seeing how hard it is to swap out the scan matchers; if it's impractically hard and you don't need it that much, maybe it's moot. See my question above about your motivations.
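If the interface route does pan out, exposing it could mirror how the solver plugins are already handled. A rough sketch, assuming a hypothetical slam_toolbox::ScanMatcherBase base class (only the pluginlib calls are real API):

```cpp
// Sketch of plugin-based matcher selection via pluginlib, mirroring how the
// optimizer/solver plugins already work. ScanMatcherBase, the header path,
// and the plugin names are hypothetical; pluginlib::ClassLoader is real API.
#include <memory>
#include <string>

#include <pluginlib/class_loader.hpp>

#include "slam_toolbox/scan_matcher_base.hpp"  // hypothetical header

std::shared_ptr<slam_toolbox::ScanMatcherBase> LoadScanMatcher(
  const std::string & plugin_name)
{
  static pluginlib::ClassLoader<slam_toolbox::ScanMatcherBase> loader(
    "slam_toolbox", "slam_toolbox::ScanMatcherBase");
  // e.g. plugin_name = "slam_toolbox::CorrelativeScanMatcher"
  //   or "slam_toolbox::M3RSMScanMatcher", picked from a ROS parameter.
  return loader.createSharedInstance(plugin_name);
}
```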


I'm always game to help out with open source contributions, especially people willing to put in the leg work and with the competency to actually drive some real change. If it's not here, I'll find a place. It's rare I find other people both interested and competent enough to help in the SLAM and localization realms in ROS. This is by far the weakest aspect of the ROS ecosystem, and while I have big plans long term, I think you know that localization/SLAM is just as complex as all of navigation in and of itself - so it takes me large durations of time to make progress, and that's why you don't see me publishing new localization methods as fast as you see new planners/controllers/capabilities. By no means should you take my challenges here as "no"s; I'm just trying to suss out the reason we're having this conversation to know what the best direction to go is.

If your motivation is as pure and non-specific as "Ekumen is diving deeper into SLAM research and while we're at it we figured we could give back to the community", there may be better directions we can take this that would be more impactful / help the community or your customers more. That ticket would be one such place, but mixed 2D/3D localization methods are another.

Pretty interesting. Do you get together every now and then (like a working group)? Anyhow, we'll take a look.

Why yes, we do :smile: It's almost like you don't read my discourse posts https://discourse.ros.org/t/great-presentations-in-the-navigation-working-group/21934 :laughing: We meet every 2 weeks on Thursdays. But I can set up a separate meeting that works better if we have a subgroup interested in working on a targeted project. There are a few things in Nav2 that have specifically targeted subgroups with regular meetings. We also have a Slack group; feel free to join and introduce yourself on onboarding. It's also a convenient place to PM me for more quick-paced communication when our timezones align.

https://join.slack.com/t/navigation2/shared_invite/zt-uj428p0x-jKx8U7OzK1IOWp5TnDS2rA

hidmic commented 3 years ago

Sorry for the delay to reply. Busy week.

This is all pretty tied into Karto, and it's pretty easy to screw things up in Karto if you're not careful, so to do this I'm kind of wondering how much rearchitecture / surgery is needed.

Fair point. A scan matcher plugin system is not a requirement here, though. Definitely not to begin with. Without getting too much into technicalities: since IIRC only two public karto::ScanMatcher APIs are actually used by caller code (karto::ScanMatcher::Create and karto::ScanMatcher::MatchScan), transitioning from the single karto::ScanMatcher class to, say, karto::ScanMatcher as an interface with karto::CorrelativeScanMatcher and karto::M3RScanMatcher as implementations within the karto SDK itself would do equally well, while being straightforward to implement.
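For concreteness, the split inside the SDK could look roughly like this (a sketch only; the interface name, include path, and MatchScan signature are simplified relative to the real code):

```cpp
// Sketch of the split inside the Karto SDK -- signatures are simplified and
// illustrative relative to the real SDK; the include path may differ.
#include "karto_sdk/Mapper.h"  // assumed location of the current ScanMatcher

namespace karto
{

// ScanMatcher would become an abstract base keeping the entry points callers
// use today (Create and MatchScan); named ScanMatcherBase here only to avoid
// clashing with the existing class in this sketch.
class ScanMatcherBase
{
public:
  virtual ~ScanMatcherBase() = default;

  // Match pScan against rBaseScans; writes the best pose and covariance and
  // returns the match response (simplified signature).
  virtual kt_double MatchScan(
    LocalizedRangeScan * pScan,
    const LocalizedRangeScanVector & rBaseScans,
    Pose2 & rMean,
    Matrix3 & rCovariance) = 0;
};

// The existing correlative matcher and the new M3RSM matcher would both
// derive from the interface (hypothetical class names).
class CorrelativeScanMatcher : public ScanMatcherBase { /* existing code */ };
class M3RScanMatcher : public ScanMatcherBase { /* new M3RSM code */ };

}  // namespace karto
```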

I think the better place to start is the beginning. From my perspective, I got a ticket proposing major surgery on a method that, as far as I'm aware, works well, to replace it with another method based on a 6 year old paper (with considerably fewer citations than the current method -- not that academic citations mean much on their own, but it's a metric, since this paper didn't actually compare against other methods).

Yeap. To be fair, I'd be as wary as you are. But I want to emphasize that, in theory, this should be minor surgery. I will also point out that the current scan matching method is Olson's correlative scan matcher, as mentioned in section V of Konolige et al., and the M3RSM paper does claim to improve performance over his previous work. That may well not be true. Hard to tell without a functional implementation, though.

The largest space I've mapped thus far with this is around 240,000 sqft. While I literally don't have access to data sets for spaces larger than that, I looked over my notes and they indicate that I didn't think there was any reason it wouldn't work for up to 500,000 sqft with some practical improvements that I simply didn't make because it wasn't relevant to my application, so I moved onto other problems. 1,000,000 sqft is the number I see from places like Fetch and Locus as the "if we can do this, we can do anything" mark, but promising that is a tough sell considering I don't have data for a space even 1/4 that size, so I can't really make any claims here.

I see. Is that data publicly available? With decent odometry, in an environment with enough features, I'd also expect the system to work just fine as one scales up. It'd be interesting to see how the loop closure subsystem fares when stressed (e.g. a large diameter ring-like building with slippery floors). In that kind of scenario, I think (hope?) M3RSM may shine. Perhaps only for loop scan matching, keeping the correlative scan matcher for sequential scan matching.

By no means should you take my challenges here as "no"s; I'm just trying to suss out the reason we're having this conversation to know what the best direction to go is.

Absolutely. I actually appreciate you taking the time to do this back and forth. We are now much more aware of what's going on in the SLAM realm in ROS than we were a week ago.

I think a good place to start is just seeing how well that works and whether the compute behind your implementation is practical. If the technique pans out, then it's more meaningful to try to test in this context. If that works, then we can see how hard it is to expose that scan matcher interface to have a plugin-based system.

I can agree with that. @LucasJSch, thoughts? As a side note, the good thing about complying with the karto::ScanMatcher API contract from the get-go, as opposed to developing this in a vacuum, is that it leaves us in a good spot for "quick" integration and system performance evaluation.

If your motivation is as pure and non-specific as "Ekumen is diving deeper into SLAM research and while we're at it we figured we could give back to the community", there may be better directions we can take this that would be more impactful / help the community or your customers more. That ticket would be one such place, but mixed 2D/3D localization methods are another.

Point taken, and indeed it is as pure and non-specific as that. At least for the time being. Long term, we're open to broaden the scope and walk other related paths (subject to internal discussion, resource availability, etc.). I don't see those paths as mutually exclusive with this proposal though.

It's almost like you don't read my discourse posts https://discourse.ros.org/t/great-presentations-in-the-navigation-working-group/21934 :laughing:

There's just so much stuff to keep track of :sweat_smile:

I will definitely join you on Slack, and I'm sure there are a few of us here that'd be happy to join the Nav2 working group meetings. No need to create new meetings just yet. I do appreciate the willingness :smile:

SteveMacenski commented 3 years ago

Got it! Makes sense on performance and scan matcher API changes.

Is that data publicly available?

No. Not because of some perceived value in SLAM data, but simply because I no longer have it, or a way to get permission to release it. That was my last company.

It'd be interesting to see how the loop closure subsystem fares when stressed (e.g. a large diameter ring-like building with slippery floors)

My odom was pretty OK, but definitely not amazing. It was a startup, so it was as good odometry as I could get from about 2 months of work, if memory serves, but then I moved onto other problems once it became generally "reasonable". And it did work well on slippery floors, because the robots were being deployed in retail stores where they polish the floor nightly :laughing: . Can't speak to the circular element.

It's a 'slippery slope' to focus too much on the edge cases of edge cases, if the solutions needed to make it work there make it not work for general use. It's usually a trade-off, but I don't see any technical reason why there couldn't, in concept, be a solution that works well for both. Historically it just hasn't worked out that way. Most of the people that have issues with this package have mistuned it or haven't tuned it at all for their systems. To be fair, a graph-based system like this is going to be far more complex to tune for a random user than a filtering approach or gmapping, but these approaches are just 'better', so it's worth taking the time to learn what's actually happening and what the parameters mean (instead of just randomly trying stuff like many do, then getting frustrated and emailing me because they want a quick solution).

Perhaps only for loop scan matching, keeping the correlative scan matcher for sequential scan matching.

I hadn't thought of using 2 different matchers for those tasks, but it makes sense in retrospect. It would need to be data driven to know which matcher works better for which task, and whether that improvement is 'worth it'.

Point taken, and indeed it is as pure and non-specific as that. At least for the time being. Long term, we're open to broaden the scope and walk other related paths (subject to internal discussion, resource availability, etc.). I don't see those paths as mutually exclusive with this proposal though.

Maybe it's worth us chatting about localization. I hate AMCL and have a roadmap for getting rid of it for something better that could be extended to support 2D and 3D lidars. That would be a high-impact project of a similar caliber of complexity. If that would interest you, let me know. At the end of the day, better SLAM is always good for setup and refinements, but localization is what people use every day, all day. Improving ROS's localization capabilities would have a more pronounced impact and has received significantly less attention -- even when Willow was open.

PS If you're looking for something to do here to get into Karto and understand my localization updates, this ticket would be a great way to dig in and help out: https://github.com/SteveMacenski/slam_toolbox/issues/419. Basically: we updated Ceres to throw out old terms from old scans removed from the scan buffer for localization, and now we need a fix for a crash in AddEdges where it tries to get one of those old scans. It's a straightforward problem that gets right at the heart of the scan buffer localization method, and a somewhat serious bug impacting pure localization mode users if they try to set the initial pose twice in a row without moving the robot in between.
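For whoever picks that up, the shape of the fix might be as simple as guarding edge creation against scans that have already been evicted from the buffer. A purely hypothetical sketch of that guard -- the names below are placeholders, not the real karto / slam_toolbox identifiers:

```cpp
// Purely hypothetical sketch: names are placeholders, not the real karto /
// slam_toolbox identifiers. The idea is just to skip candidate scans that
// were already evicted from the localization scan buffer (and whose Ceres
// terms were removed) instead of dereferencing them when adding edges.
#include <functional>
#include <vector>

struct Scan { int id; /* ranges, pose, ... */ };

void AddEdgesGuarded(
  Scan * new_scan,
  const std::vector<int> & candidate_ids,
  const std::function<Scan *(int)> & get_scan,           // returns nullptr if evicted
  const std::function<void(Scan *, Scan *)> & link_scans)
{
  for (int id : candidate_ids) {
    Scan * old_scan = get_scan(id);
    if (old_scan == nullptr) {
      continue;  // scan already dropped from the buffer: no edge to add
    }
    link_scans(old_scan, new_scan);
  }
}
```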

hidmic commented 3 years ago

Maybe it's worth us chatting about localization. I hate AMCL and have a roadmap for getting rid of it for something better that could be extended to support 2D and 3D lidars. That would be a high-impact project of a similar caliber of complexity. If that would interest you, let me know.

I definitely want to be in the loop. We can sort out how much of a contribution we can make as we go.

nickvaras commented 2 years ago

Do you mind elaborating on your hatred for AMCL Steve? I'm curious.

SteveMacenski commented 2 years ago

It's mostly regarding long-term reliability; there are certainly elements missing from AMCL, and the code is a bit too old-school to be easily modifiable. Plus, in the last 20 years, we've created improvements in localization technology that it doesn't contain.

It's good for a demo, it's good for an MVP, and it's good for startups (maybe even good for environments without many changes over time) -- but I found that in nearly every robot system I worked with professionally, there were issues that made me have to think about writing a new localizer or writing watchdogs to prevent or detect delocalization behavior that disrupted operations.

nickvaras commented 2 years ago

Thank you for your insights. I was curious whether it was about the conceptual approach or the actual implementation. It is very dated indeed, and unlikely to be enough for real-world applications right out of the box.