Map temporarily rotates or flips [rplidar]

NikolasE commented 4 years ago

Required Info:

Operating System:
- 18.04
Installation type:
- binary
ROS Version
- ROS1 melodic
Version or commit hash:
- 1.1.6-1bionic.20200802.004617

Steps to reproduce issue

Start robot close to docking position, use rviz-plugin to deserialize map, let robot snap to correct position, drive around a bit

Expected behavior

Robot stays localized, map stays fixed relative to map-frame

Actual behavior

after a while, the map rotates by 90deg relative to the map frame, so that defined poses (pickup-positions) are completely off

map_frame2 map_frame1

Additional information

Could be related to resizing the map as this line appeared at all times when the issue occured:

[ INFO] [1597750983.798238702]: Resizing costmap to 1240 X 1158 at 0.020000 m/pix

I now also saw that the "/slam_toolbox/karto_graph_visualization" markers do not fit to the map. I just turned on the robot and deserialized a map when I got this in RVIZ: map_frames_graph

I then drove around a bit by manually sending cmd_vel-commands and after a jump, the graph and the map are consistent: map_frames_graph2

SteveMacenski commented 4 years ago

So you're saying you deserialize, you drive around for a bit, add new nodes and get new features, then it messes up the graph, and then re-corrects itself?

Have a dataset to share that with?

Your first few screen shots are too zoomed in for me to see the 90 degree rotation of the map. The resizing may be related, but I kind of doubt it. Is the rotation / mess up in the map or in the costmap? If its the costmap but the map is OK, that's a completely separate issue.

When the graph points are off, is the map itself off? I don't see evidence of that from your images, but it might just be me not able to decipher the first few images.

NikolasE commented 4 years ago

It does not mess up the graph, it flips the map by 90deg relative to the map-frame. In the first two images, you can see my legs (the clutter to the right). After deserializing the map, I'm sitting at (1,-0.5) after driving around, I'm suddenly at (0,5, 1). This of course also messes up the robot movement as my move_base-goals are defined in map and the goal positions suddenly jump.

The weird thing in the lower images is that the slambox publishes a map and the corresponding karto-graph-markers, but the markers do not fit to the map. The jump aligns them and afterwards I don't see jumps anymore. What data would you need? The serialized map and a laser-bag file? I currently use laser-scan-matching as odometry as my test robot does not have encoders.

It seems that the orientation of the map-frame is aligned with the robot after deserializing the map.

SteveMacenski commented 4 years ago

Sorry, that didn't clarify it for me.

You say you drive around "a bit", how much is "a bit"? Does this ever happen out of the gate? Is it off by exactly 90 degrees or are you guestimating? How long until it corrects itself?

When you show the map with the graph nodes not aligned, which is accurate: the map, the nodes, or neither? Is it a quick "snap" for 1 or 2 updates then reverts? Or something more gradual?

The jump aligns them and afterwards I don't see jumps anymore.

The jump aligns them? I thought you said its OK from start and the jump unaligns them? Lets be really specific with your language to describe the problem. That way I can get a mental model of what's happening. A video or something from start til it recorrects would be helpful.

I currently use laser-scan-matching as odometry as my test robot does not have encoders.

Check in odometry frame if you see this "jumping", it could be that your laser scan matcher odometry is the one messing things up. While I don't probably think that is the case, lets just make sure. If you're in map from in rviz, do you see your robot base itself jump the 90ish degrees? If you're in odom frame now, do you see the same thing? If the answer is "yes" then its your odometry. If its "no" then its slam toolbox.

I would need whatever data you're using to create this situation. So if you have some serialized bag and then you have a bag of you running when it happens again, that's what I would need. Also post your config file.

[ INFO] [1597750983.798238702]: Resizing costmap to 1240 X 1158 at 0.020000 m/pix

You mention you think this might be a problem, run SLAM without anything else. No navigation, no nothing. Just odometry, laser, and slam. What I think is happening is you get a map update and you just see that because there's a new map update. The map update itself is the thing messed up, so that costmap resize is just happening at the same time you see the updates. I don't think its related, but lets just find that out right now.

chrisl8 commented 4 years ago

I'm having a similar issue, that I think might be related to this one. I posted a gif and a bag file on ROS Answers, but haven't had any response yet: https://answers.ros.org/question/359654/what-may-be-causing-my-map-to-sometimes-rotates-180-degrees-during-operation/

I'm not sure if the cause of what I'm seeing is in Slam Toolbox or not. I'm trying to log out some data to pinpoint it.

Interestingly, it happens every time eventually, and it happens even if I load a map in Localization mode!

It feels like every may just has to "flip" at some point, and it will do it eventually. If I save out a map before it happens, and then load it, then it will happen after I load it, which is the best way to reproduce it consistently (by reloading the same map over and over).

SteveMacenski commented 4 years ago

Can you reproduce in simulation? I'm curious again if this is a 360 lidar particular issue at play. I've never seen this in my experience so it would be good to have some set up that we can play with settings to see how / why it occurs being able to change important things like optimizer gains and laser types. This is almost certainly a function of the optimization problem, but I can't tell from this info if its an issue with the settings or the inputs.

What parameters, if any, did you change from default?

SteveMacenski commented 4 years ago

@chrisl8 if you can make it happen, here's what I'd recommend:

Take the dataset and show that you can make it happen in some way relatively consistently
Take the dataset and run it again except this time, use a laser filter on the RP lidars data to make it look like a 270 degree laser, and give that filtered output to slam toolbox. Let me know if you can make it happen then.

I'm wondering if this is a function of consistency of the inputs or the 360 lidar itself. If we can not reproduce it with a lower degree amount, then it might be an issue in dealing with 360 - that's something we may be able to work with. If it happens still with a lower degree, it may relate to the inconsistent input data from your RP lidar - still might be able to fix it but I'd be less confident in it. I'd need some reproduction info at that time to start looking into it. Now I would be really interested if someone could reproduce this with literally any other non-hobby grade lidar, regardless of 360/270/180/etc.

SteveMacenski commented 4 years ago

It also just occurred to me that maybe this is a result of the map rendering not aligning with the map updates because we silence the map to odom transform when it fails to update. Do you see this error when it happens? https://github.com/SteveMacenski/slam_toolbox/blob/foxy-devel/src/slam_toolbox_common.cpp#L411 do you have any logs?

chrisl8 commented 4 years ago

I don't have logs, but I can reproduce it repeatedly with a bag file, and no, I never see that error. There aren't any errors when it happens.

I'll try running it with a laser filter to see what happens.

If I had an expensive lidar, the first thing I would do to try to reproduce it is mount it "backwards" on the robot, as that seems like it could be at least a contributing factor according to reports in #198

@NikolasE Can you tell us what model lidar you are using (did you already say and I missed it?) and is it mounted facing "forward" or is it rotated in some way on your robot?

SteveMacenski commented 4 years ago

One other hypothesis I have after re-reading the initial report is regarding the constant parameter blocks https://github.com/SteveMacenski/slam_toolbox/blob/foxy-devel/solvers/ceres_solver.cpp#L179.

I missed in my initial read through that this only happens on continued mapping of previously serialized maps, is that accurate @NikolasE @chrisl8?

This section of the the code is responsible for anchoring the initial pose for the first node so that it can't rotate freely. An optimizer might choose to do something odd if we don't anchor the first node in place so that all future information is relative to it. I did a little looking and it appears to me that this is called on the re-loading of a map, but you could verify this yourself by changing the RCLCPP_DEBUG to a RCLCPP_INFO and looking for the log to make sure it set that parameter block as constant when loading.

I don't think this is likely the issue, since if it was, I think the pose graph and the map would be aligned and you show it not being so. Just something to bring up on the off chance its related.

chrisl8 commented 4 years ago

If you are interested, this bag file produces the map flipping every time, and it is only 2 minutes long: https://www.dropbox.com/s/d3vsgm50i15h9fl/2020-08-22-20-48-31_filtered.bag?dl=1

chrisl8 commented 4 years ago

It happens both when building a new map from scratch and and in "localization" mode (using an existing map, but not updating it).

I have never tried to continue to add to a previously serialized map.

SteveMacenski commented 4 years ago

A video would be helpful. I still don't have a physical understanding of what's happening, is the map image only moving, is the graph points moving, are both? Are they consistent with each other? @NikolasE post just shows that image and the nodes are off, but not what's correct, if any. Are they always off by exactly 90 or 180 degrees, or are you generalizing? I really need answers to these questions.

His images look like the image is correct and the nodes are just wrong (because the laser scan for the robot still matches the map image). If that's the case, I'm not sure you'd see any disruptions but not 100% sure why that's happening off hand.

chrisl8 commented 4 years ago

Here is a video: ROS Map Flipping 2

This was produced with the bag file: https://www.dropbox.com/s/d3vsgm50i15h9fl/2020-08-22-20-48-31_filtered.bag?dl=1

SteveMacenski commented 4 years ago

I got off the phone with some folks and they mentioned that this reminds them of the X axes / data issues that the RPLidar has. Have you actually patched that issue like we discussed at length in https://github.com/SteveMacenski/slam_toolbox/issues/198? Like inverting the ranges msg.ranges = msg.ranges[180:] + msg.ranges[0:180] or X axis so that the driver actually properly implements REP 103?

chrisl8 commented 4 years ago

Yes, I patched the issue in #198

As of this weekend I have rotated my RPLIDAR so that the front faces the robot's front, and this does not happen to me anymore. So I no longer have this issue, and the only practical advice I can give anybody is to ensure your LIDAR's "front" lines up with the front of your robot.

The OP did not mention what hardware they use, so I don't know if this issue is related to mine or not.

SteveMacenski commented 4 years ago

I think that means we can close this as another rplidar axes issue?

We can reopen if not the case.

NikolasE commented 4 years ago

Could we reopen this issue please?

I updated my system to 20.04 and foxy, use a 270deg scanner and still have a flipping map.

I have a small turtle-like robot with a Sick NanoScan and used the nav2_bringup/bringup_launch.py with slam:=True, so that the online_sync_launch.py is started. [I so far only changed the footprint of the robot]

To show the issue I added an Axes-Marker to map and shifted the grid so that the orientation is easier to see. At start, the x-axis is pointing through the opened door (robot started with odom=identity). After driving around a bit with Nav2-goals, the robot entered the corridor and suddenly the mapped rotated by 90deg and also moved the origin a bit (now no axis is pointing through the opened door).

If a nav2-goal is active during this flip, it seems that the navigation is using the new map, but the old coordinates, so a goal-pose can end up anywhere.

start flipped

Scene at start, image rotated by 90deg so that it's aligned with the flipped final map start_90deg

Bag File (with tf, tf_static and scan) is here: https://drive.google.com/file/d/14nC84wdjOL3UGRY73OSUNDbrrQWdHHaE/view?usp=sharing

Other than that, the mapping process worked great :)

https://www.youtube.com/watch?v=U1ogYbWdQbs

SteveMacenski commented 4 years ago

I see this comment, but I can't look into it for the next month or so between IROS/ROS World/other ongoing things I need to finish. I'd be more than happy to review fixes or any additional debugging you've done to isolate the problem, but it'll be mid-November before I can possibly address this fully.

I would start looking at Ceres optimizer and make sure that the first measurement / frame associated with it is properly held static. Maybe there's a bug in Ceres or I didn't do this properly. I'd try to figure out what happens on the update iterations where the map rotates / flips and let me know in those iterations what's happening.

coderkarl commented 3 years ago

First a big thank you to Steve for all the work you have done and continue to do. I do not mean this to be another fix request. I just want to help us resolve the issue. For those who are using rplidar or other hobby lidars, let's continue to try to resolve this ourselves. Should we fork this and continue hobby lidar issues there? I am seeing the same issue with a 90 deg mounted neato xv11 lidar on a create2 robot. base_link x fwd, y left laser x left, y back (w.r.t. base_link) odom started at identity, bad tf comes about 360 seconds from start. tf map to odom is the issue. It jumps 90 deg and somehow the costmap is still correct w.r.t. odom. I've confirmed /map has frame_id = 'map', but other info changes to make it correct w.r.t. odom. Note the costmap /map info changes come after the map to odom tf error (at least in rviz). ros2 foxy, /odom and /scan over wifi from rpi3, xps13 runs the slam node. I have a ros2 bag file if you think that will help. slam_toolbox_map_odom_tf_jumps_90deg

costmap /map before and after error: costmap_before_and_after

Here is a video so you can watch it by stepping frame by frame. This is only the end of the run when the error occurred. The first 5+ minutes of the run is fine. If it can work for 5 minutes with this laser frame 90 deg from base_link, why would the base_link to laser tf be the cause?

https://user-images.githubusercontent.com/21990134/104787636-a1018480-5755-11eb-9260-8b9e6fe4a140.mp4

coderkarl commented 3 years ago

I have observed that the section of code Steve mentioned above... https://github.com/SteveMacenski/slam_toolbox/blob/foxy-devel/solvers/ceres_solver.cpp#L179 ...is executing when the map to odom jump occurs. I simply switched it to ROS_INFO and saw the output with the jump. This section of code only executes one time; when the bad jump occurs. Using Eclipse, I see that CeresSolver::Reset() is never called. I verified the condition (firstnode != nodes_->end() ) is true at start. CeresSolver::Compute() is only called intermittently. The map to odom jump occurs when this is called. I let it keep on running after the scenario similar to the images above, and the second time Compute() is called, the costmap /map becomes an ugly merge of two maps at different orientations. Compute() is called by CorrectPoses() in Mapper.cpp, MapperGraph::TryCloseLoop() when the candidateChain is not empty and coarseResponse is high enough. One temporary idea is to just disable this with a parameter. I will continue to study Compute() and see if we can still get the loop closure to work properly.

coderkarl commented 3 years ago

Focusing on CeresSolver::Compute(), one idea is the constraint for static initial pose needs to consider the transform between base_link and laser, or the scan data brought into the solver needs to be properly transformed beforehand. pose.SetX,Y,Heading may be setting corrections that assume the local laser frame aligns with the base_link frame. Is it possible that Compute() just hasn't been called with high quality lidars? A past issue discussion states: "In the Ceres plugin solver I check if we have a first node and we lock down the pose and orientation of it so that it cannot move as the graph updates." I am pretty sure that is the same section I am checking in CeresSolver::Compute(). I can see the node contains x,y,heading. (firstnode->second(0,1,2) ). One hypothesis is that constraining the first node (0,0,0) does not constrain the other nodes. The other nodes can move freely around the first constrained node without moving or rotating the first node. I will try to constrain the first N nodes instead to test this idea.

coderkarl commented 3 years ago

When I try doing this for multiple nodes (e.g. first 3 nodes added): problem_->SetParameterBlockConstant(&it->second(0)); problem_->SetParameterBlockConstant(&it->second(1)); problem_->SetParameterBlockConstant(&it->second(2));

I get this runtime error: F0116 20:58:34.186025 30736 problem_impl.cc:635] Parameter block not found: 0x7ff1b415e590. You must add the parameter block to the problem before it can be set constant. [sync_slam_toolbox_node-1] Check failure stack trace: This might be related.

It was just an issue with my unordered_map for first_nodelist. If I use a secondnode just like firstnode, I am able to easily add two sets of constraints. However, when Compute() is called, I still see the issue with two constraints: CeresSolver: Setting first node as a constant pose:0.00, 0.00, 0.00. CeresSolver: Setting secondnode as a constant pose:1.66, -0.06, 0.02. Perhaps a third non-collinear node is needed. Pending. The map to odom tf may still be able to flip while keeping the constraints above.

coderkarl commented 3 years ago

To help verify where the issue might be in the code, I simply skipped: ceres::Solve(options, problem, &summary); As well as the summary check. So the original node poses are simply pushed back into corrections. This made me realize that corrections are already corrected poses, not small deltas to add to nodes. With the original nodes unchanged and pushed back to corrections, the results are worse. It looks like many copies of the original map rotated at different angles, all on top of each other. I will have to spend more time understanding the loop closure problem. I believe each x,y,heading node is the origin of a scan. Since the whole map is good with the first corrections, I think I went too deep. I will trust Compute for now and look downstream. Something is likely rotating the whole corrected map elsewhere.

SteveMacenski commented 3 years ago

Hi,

Also, please check if this PR fixes your issue: https://github.com/SteveMacenski/slam_toolbox/pull/326 @NikolasE @coderkarl

@WLwind found something interesting and subtle that this may also relate to. I'm not 100% sure that this will fix your issue (65%), but I am about 85% sure that an issue like this is what is causing your issue as well.

Reopening by request

coderkarl commented 3 years ago

I tested PR #327 on foxy-devel. This appears to fix it for me. I am struggling with ros2 bag play options to test it on a bag, but I have gotten the issue consistently with a short run I repeat around my house. I first repeated the issue on foxy-devel, made the changes from WLwind and did not see the issue. I observed 6-7 loop closures (using added output) as I kept on doing the small lap around my house.

WLwind commented 3 years ago

I tested PR #327 on foxy-devel. This appears to fix it for me. I am struggling with ros2 bag play options to test it on a bag, but I have gotten the issue consistently with a short run I repeat around my house. I first repeated the issue on foxy-devel, made the changes from WLwind and did not see the issue. I observed 6-7 loop closures (using added output) as I kept on doing the small lap around my house.

Thanks for testing on ROS2 !

SteveMacenski commented 3 years ago

@coderkarl just to verify, in your house would you normally have excepted to see it in the time you spent testing this PR? Making sure we don't have a bias testing that it didn't "break" anything vs "solving" the issue.

But good to know the ROS2 version is OK.

coderkarl commented 3 years ago

I normally would have seen the issue in the time I tested the PR. I also output text every time the Loop Closure occurs, because I verified that is when this issue occurs, consistently. In hindsight, I should have loosened the requirements that trigger Loop Closure (and then CeresSolver::Compute() ). Maybe then I would not have to drive two laps to get the issue. Before, the map flipped every time Loop Closure and Compute was executed. With the PR, I printed to the terminal to verify Loop Closure was still executed but did NOT flip the map anymore.

coderkarl commented 3 years ago

Steve, if you or someone else has a robot with the non-hobby lidars that you use to test this, I would recommend the following for validation: Modify your odom published and odom to base_link tf as if the rear of your robot is now the front. Maybe remap odom to temp_odom, then have a simple node subscribe to temp_odom and publish /odom and the tf. Rename the base_link frame to temp_base_link in your original odom publisher node. Then use a static_transform_publisher for base_link to laser tf with 180 deg static yaw.

SteveMacenski commented 3 years ago

Fabulous! This is wicked good news! This solves all of the awful outstanding problems I couldn’t reproduce with my systems.

That was a very subtle problem, I wonder if SRI put some easter eggs like that into their open source version on purpose or if it was really that poorly tested even if you bought the professional version...

major kudos to WLwind! You really dug in there more than most and found a pretty easy to miss bug. Songs from rplidar users should be written in your honor and sung for years to come!

SteveMacenski commented 3 years ago

@coderkarl are you using localization mode as well? I had blocked the PR until we figured out the root cause of the CeresSolver: Ceres could not find a usable solution to optimize. warning @WLwind mentioned in their testing. However, I just did some testing with a small map serialized before the changes and after the changes, and in neither case did I see that error after deserializing in localization mode.

If this is the case - that this fixes the issue and we don't see that warning in localization mode, then I can merge this in and we can close this issue out! I think we're all in agreement this has no issues in mapping mode.

SteveMacenski commented 3 years ago

If I can get a :+1: from another user on these PRs (now for all distros, test one that's easiest for you) for mapping / localization, I'll merge the bunch and do a massive sync to get these out to binaries.

chrisl8 commented 3 years ago

I will try to do some tests on Noetic #325 this weekend, both with bag files and on my robot.

chrisl8 commented 3 years ago

Noetic #325 appears to be an enormous improvement!

I ran the original bag file from Issue #198 through the current main branch a few times and got different results every time, none of which looked reasonable:

slam_toolbox_previous-Mach2021a

slam_toolbox_previous-March2021b

Then I used Noetic #325 many times and obtained this result every time:

slam_toolbox_fixed-March2021

The fact that Noetic #325 not only creates a map that is reasonable, but that it is also 100% consistent on every run for this bag file is wonderful!

Further I set up my robot to make a new map in my home, saved it, restarted ROS, loaded the map, set up about 15 destinations, and let it randomly navigate between them for about 6 hours, and it worked flawlessly the entire time.

SteveMacenski commented 3 years ago

Further I set up my robot to make a new map in my home, saved it, restarted ROS, loaded the map, set up about 15 destinations, and let it randomly navigate between them for about 6 hours, and it worked flawlessly the entire time.

Ooooh ok, @chrisl8 is this in pure localization. or just continued mapping in SLAM mode?

Very excited to have this problem solved thanks to some great sleuthing by @wlwind, just want to make sure we understand the ramifications of these changes if there are issues in localization mode to resolve those as well. Though I suppose even if this does introduce some warnings in localization mode, its an improvement that then SLAM itself works with 360 lidars and it could be a ticket to resolve later.

chrisl8 commented 3 years ago

@SteveMacenski Pure localization.

I always start by using Rviz to send the robot through the unknown area to make a map, but as soon as I've got a full map I save it off ASAP to avoid "corrupting" it. Then I shut down and restart ROS, load the map, and run in pure localization mode. I find this to be most reliable, and I can keep using the same map in pure localization mode for a few days until too many chairs and boxes get moved around the room.

So yes, it feels very stable in localization mode now.

Sorry that I didn't think to look at the log. I usually just set the robot going and ignore it until it gets fully stuck, and then look at what happened, but during my last test it just kept navigating around the map in pure localization mode for hours and hours until the batteries died. I shut it off, put it away, and went to bed. I didn't think to look at the log, and that log is now gone.

I'm happy to run it again and watch for warnings/errors, but it will be a few days before I get to it.

SteveMacenski commented 3 years ago

@SteveMacenski Pure localization.

Ooooooooooh, Ok. Yeah, rerunning it and either glancing at the log or saving it would be good so we can make sure its not flooded with Ceres warnings (even if it doesn't impact the performance). The big thing is just knowing if we introduced some issue, that we know about it.

PGotzmann commented 3 years ago

@SteveMacenski in reply to your request here https://github.com/SteveMacenski/slam_toolbox/issues/355#issuecomment-797743205 I also tested this fix for foxy (https://github.com/SteveMacenski/slam_toolbox/pull/362 ) on an industrial like mobile robot with a professional 360deg LIDAR.

First of all, I have also been effected by this issue since my LIDAR is rotated by 90deg too, which leads to my map rotating by the same angle at the first loop closure or initial localization. I used the latest foxy-devel branch (a2136d5f980ad4c8d0494483751d0c1064ae158a) as a baseline and ran it both in SLAM mode to record a map and in localization mode for comparison reasons. Otherwise I followed the suggested test workflow below:

[ ] Run localization mode using this branch on an existing serialized file, note any warnings out of the ordinary [ ] Run SLAM mode using this branch and save the serialized file [ ] Run localization mode using this branch on this new serialized file, note any warnings out of the ordinary

1) Localization mode with baseline map and foxy-devel When using just the baseline setup without this patch, I see a 90deg map rotation at the first loop closure or after initial localization. The image below shows the original map in the background and the map published by slam_toolbox. After that the map was consistent and provided reasonable localization without any further jumps.

I also occasionally notices some of the Ceres warnings that you mentioned (without the patch):

[slam_toolbox] Solver Summary (v 1.14.0-eigen-(3.3.7)-lapack-suitesparse-(5.7.1)-cxsparse-(3.2.0)-eigensparse-openmp-no_tbb)
[slam_toolbox] 
[slam_toolbox]                                      Original                  Reduced
[slam_toolbox] Parameter blocks                          486                       -1
[slam_toolbox] Parameters                                486                       -1
[slam_toolbox] Residual blocks                           117                       -1
[slam_toolbox] Residuals                                 351                       -1
[slam_toolbox] 
[slam_toolbox] Minimizer                        TRUST_REGION
[slam_toolbox] 
[slam_toolbox] Sparse linear algebra library    SUITE_SPARSE
[slam_toolbox] Trust region strategy     LEVENBERG_MARQUARDT
[slam_toolbox] 
[slam_toolbox]                                         Given                     Used
[slam_toolbox] Linear solver          SPARSE_NORMAL_CHOLESKY   SPARSE_NORMAL_CHOLESKY
[slam_toolbox] Threads                                    50                       50
[slam_toolbox] Linear solver ordering              AUTOMATIC                AUTOMATIC
[slam_toolbox] 
[slam_toolbox] Cost:
[slam_toolbox] Initial                         -1.000000e+00
[slam_toolbox] 
[slam_toolbox] Minimizer iterations                       -2
[slam_toolbox] Successful steps                           -1
[slam_toolbox] Unsuccessful steps                         -1
[slam_toolbox] 
[slam_toolbox] Time (in seconds):
[slam_toolbox] Preprocessor                         0.000031
[slam_toolbox] 
[slam_toolbox]   Residual only evaluation          -1.000000 (-1)
[slam_toolbox]   Jacobian & residual evaluation    -1.000000 (-1)
[slam_toolbox]   Linear solver                     -1.000000 (-1)
[slam_toolbox] Minimizer                           -1.000000
[slam_toolbox] 
[slam_toolbox] Postprocessor                        0.000032
[slam_toolbox] Total                                0.000063
[slam_toolbox] 
[slam_toolbox] Termination:                          FAILURE (ParameterBlock: 0x7f7565f696c8 with size 1 has at least one invalid value.
[slam_toolbox] First invalid value is at index: 0.
[slam_toolbox] Parameter block values:         -nan )
[slam_toolbox] 
[slam_toolbox] 1615898457.871084614: [graph_localization] [WARN]   CeresSolver: Ceres could not find a usable solution to optimize.

Not sure if that is relevant, but just to be complete here, the log also showed some Message filter dropping messagewarnings:

[slam_toolbox] 1615911275.836100518: [graph_localization] [INFO]    Message Filter dropping message: frame 'scan_link' at time 1615911275.780 for reason 'Unknown'

2) Localization mode with baseline map and #362 When running slam_toolbox with #362 and the map from the baseline setup, it first looked like the map was deserialized correctly but the map exploded immediately when the first loop closure/initial localization occurred. (see picture below)

3) Localization mode with new map and #362 Localization with this patch and a newly recorded map works just as expected. I did not see any rotations or unexpected jumps anymore. As with the baseline setup, the log showed some of the Ceres and Message Filter warnings, but I could not see that this warnings had significantly increased, so I do not think this is necessarily related.

Conclusion: In conclusion I can say, that this patch is clearly an improvement and fixes the issue in question. While I saw some Ceres warning, I would consider them independent form this patch, as they also occurred with the baseline setup. However, it looks like this patch might not work with previously serialized maps, as the tested map got corrupted in the first update. From this test alone I can not say whether this is limited just to maps recorded with a rotated LIDAR or not.

Hope that helps :wink:

SteveMacenski commented 3 years ago

Awesome! So it seems like the answer is it fixes things, I'll merge into ros2 directly since that will turn into Galactic.

For existing distributions (foxy, noetic, etc) what do we want to do, since this might invalidate previously serialized files? The best answer I can think of is just having separate foxy-devel-fixed branches with the patch to this so that they are around but not be default enabled to not break current behavior. Or the other way around and update default behavior but store foxy-devel-old-behavior?

chrisl8 commented 3 years ago

My vote is definitely to make this fix the default for the primary branch for each distribution, as well as the code that is released.

I do understand how it is concerning to create a situation where running apt get upgrade will invalidate existing maps, but on the other hand: If you create one more branch for each distribution, will you also ensure that every future PR gets applied to both the "old-behavior" and "fixed" branch of each distribution? It seems like you doubling your own work.

I think, especially with Noetic, that you will continue to have new ROS 1 users arrive with rotated RPLIDAR units installing the released code via apt and running into this issue for some time if this fix isn't the default for new installs.

SteveMacenski commented 3 years ago

I agree, but I don't want to break behavior for N people without a roadmap to letting them continue on for at least the EOL of their given distribution if they want. I'm going to post on discourse about this and see what folks say. That could at least act as an announcement that this change is coming so that no one's surprised if they google.

https://discourse.ros.org/t/request-for-input-potential-existing-slam-toolbox-serialized-file-invalidation/19520 discourse post

SteveMacenski commented 3 years ago

This has been resolved! The patches have been merged. For historical completion, all branches now have an -unfixed variant which is the branch before this patch was applied. Therefore, if you previously had things working and you later have issues, then you can use the <distro>-unfixed branch instead to maintain your existing behavior.

For everyone else, the new devel branches now have the patch and will be released soon. I will be updating the readme files to contain a mention of this issue, its discussion, and resolution for future reference.

SteveMacenski / slam_toolbox