ethz-asl / segmap

A map representation based on 3D segments
BSD 3-Clause "New" or "Revised" License
1.06k stars 394 forks source link

Trajectory Correction for Localization #47

Open vkee opened 7 years ago

vkee commented 7 years ago

Hi,

I think implementing trajectory correction for localization would be a great feature to add to SegMatch.

Currently when running localization with a map generated on a previous run of SegMatch SLAM, SegMatch detects matches very robustly but does not correct its estimated SLAM trajectory. I think implementing this feature would make SegMatch a very robust SLAM system, especially for long term mapping.

I am planning on working on this over the next couple of weeks. My plans are to get the transformation correction from the filterMatches method in segmatch (https://github.com/ethz-asl/segmatch/blob/master/segmatch/src/segmatch.cpp#L487) and then update the factor graph. For the first correction, I am thinking to remove the prior factor and create a new prior at the corrected pose (I think I should be able to reuse some of the code from the estimateAndRemove method in incremental_estimator - https://github.com/ethz-asl/laser_slam/blob/41fdab0076d9282e4e9119c5023f9eef0e86464f/laser_slam/src/incremental_estimator.cpp#L170? but it does seem like a fair amount of work will still need to be done to do exactly what I need). For all future corrections, I am thinking to correct the latest pose by the correction.

Any thoughts / suggestions?

Thanks!

rdube commented 7 years ago

That would indeed be a very interesting feature! I am looking forward to see how that performs!

As a first step you could adapt the IncrementalEstimator::processLoopClosure() function and add an argument like bool set_as_new_prior which would default to false. https://github.com/ethz-asl/laser_slam/blob/41fdab0076d9282e4e9119c5023f9eef0e86464f/laser_slam/src/incremental_estimator.cpp#L63

That could already solve the first localization and could be interesting to see how much the trajectory is actually modified if that is used for every localization. Once that is running we can consider the next step. I'll come back to you with some more info about removing the prior as the code we have now was highly customized for loop closure.

Another idea could be to have a function IncrementalEstimator::processLocalization() which would help to keep the two separate until we experiment a bit and find a clean structure.

rdube commented 7 years ago

Yes I guess a processLocalization() would be more convenient at first as we can then keep the two applications separate and avoid conflicts.

You can have a look here to make a prior factor using the GTSAM expressions https://github.com/ethz-asl/laser_slam/blob/41fdab0076d9282e4e9119c5023f9eef0e86464f/laser_slam/src/laser_track.cpp#L175

And have to keep track of the id of the factor to remove so that you can provide it to iSAM2: https://github.com/ethz-asl/laser_slam/blob/41fdab0076d9282e4e9119c5023f9eef0e86464f/laser_slam/src/incremental_estimator.cpp#L226

Here is an example of how this ID can be retrived from the ISAM2Result returned by the iSAM2 update() function https://github.com/ethz-asl/laser_slam/blob/41fdab0076d9282e4e9119c5023f9eef0e86464f/laser_slam/src/incremental_estimator.cpp#L249

I hope that helps!

vkee commented 7 years ago

I implemented these ideas, with the pose corrections being done in the form of essential prior factors, and got it to all build.

However, once I ran it, when I process a localization, there is a segfault from an out of range error in the laser_mapper node in the segMatchThread function when segmatchworker.update(trajectory) is called to update the trajectory. I am pretty sure this is because I removed the first prior factor.

Do you have an idea of how to fix this issue? Do I need to do anything besides calling isam2_.update(new_factors_to_add, new_values, factor_indices_to_remove) to remove the factor?

Thanks!

rdube commented 7 years ago

Nice! You can use GBD to backtrace this segfault and see if the segfault is happening in your code or in GTSAM. For this you can add the following launch-prefix="xterm -e gdb --args" to the instruction creating the ROS node: https://github.com/ethz-asl/segmatch/blob/master/laser_mapper/launch/laser_mapper.launch#L15

Eg: <node name="laser_mapper" pkg="laser_mapper" type="laser_mapper_node" output="screen" respawn="true" launch-prefix="xterm -e gdb --args">

Then when you launch your launch file, this will open a new xterm where you can type r to run the program and once the segfault occurs, bt to print the stack backtrace. This backtrace should hint you towards a solution. You can copy paste it here if it doesn't help you.

For more information on GBD you can have a look here: http://www.yolinux.com/TUTORIALS/GDB-Commands.html

vkee commented 7 years ago

It doesn't seem to be particularly helpful.

This is what the stack backtrace says:

image

(this is the same out of range error)

rdube commented 7 years ago

Ok if you push your code and send me a link I can try to have a look.

vkee commented 7 years ago

Here are the repos.

https://github.com/vkee/laser_slam/tree/merge_loc_corr https://github.com/vkee/segmatch/tree/merge_loc_corr

The branches are both called merge_loc_corr.

Thanks!

rdube commented 7 years ago

You should add a check before this line to make sure that the size of lasertracks is larger than the id: https://github.com/vkee/laser_slam/blob/merge_loc_corr/laser_slam/src/incremental_estimator.cpp#L82

Otherwise I could not quickly find where a segfault could occur. Can you maybe add some debug information to check if there is a specific line where the segfault occurs? LOG(INFO) << __FILE__ << " " << __LINE__; can be handy here.

vkee commented 7 years ago

The segfault is at line 472 in the updateSegments method of segmented_cloud.cpp (https://github.com/vkee/segmatch/blob/master/segmatch/src/segmented_cloud.cpp#L472).

I uncommented the code that removes the first prior factor but it does not resolve this issue.

rdube commented 7 years ago

Good! It looks like the trajectory was updated. The issue is most likely that the application asks to update the target map which in a localization mode is not really associated with the trajectory.

For a quick step could you comment out the following line: https://github.com/vkee/segmatch/blob/master/segmatch/src/segmatch.cpp#L514

If that solves it we should do that call only when we are in loop-closure mode.

vkee commented 7 years ago

This solved the issue.

However, it is very unstable as you can see in the screenshot of the provided kitti localization demo bag (https://www.dropbox.com/s/buhrnn0sbwlkcyv/first_attempt_segmatch_localization_correction_unstable.mp4?dl=0). In a previous test before this screenshot, it also crashed from an indeterminate system GTSAM error.

I think a quick hack would be to require the correction to be within some threshold after the first localization but I don't think this really fixes the issue. Another hack would be to only correct after moving some threshold distance like is done for segmentation.

Any ideas?

rdube commented 7 years ago

As discussed you should look into the noise models and also make sure that the transformations you are using to create the factors are in the correct frame of reference.

vkee commented 7 years ago

I fixed some bugs in my code and adjusted the noise models a bit (weighing localizations after the first less) and I think it is a bit more stable, particularly when only removing the first prior.

Just to confirm, the transformation should be in the world frame since it is for the prior? (https://github.com/vkee/laser_slam/blob/merge_loc_corr/laser_slam/src/incremental_estimator.cpp#L93). The transformation correction, T_orig_corr, is computed in (https://github.com/vkee/segmatch/blob/merge_loc_corr/segmatch/src/segmatch.cpp#L550) just like the loop closure transformation (I'm not not transforming it with Eq 6 in the latest paper b/c I think it should be in world frame as it is a prior). Do you think these transformations are correct?

I'm a bit confused because it appears like the correction is the inverse of what it should be. See around 10 seconds the correction shifts the trajectory the opposite direction of what it should be (b/c we want to align trajectory to the target map) (https://www.dropbox.com/s/n68kj2gkx8tha38/segmatch_localization_only_remove_first_prior.mp4?dl=0). Note that this run is where we remove only the first prior.

This screencapture is where we remove previous prior (https://www.dropbox.com/s/zv0982c6ovwec3a/segmatch_localization_remove_prev_priors.mp4?dl=0). Around 10 - 15, you can also see the correction also seems to be inverse of the correct correction.

rdube commented 7 years ago

For now I would only consider the second case (where we remove previous prior) as this can better tell us whether the transformation is right.

It indeed looks like there is an issue with the transformation and I would expect this strategy to give us segment matches which are aligned with the z-axis. There could be an issue with the following composition: https://github.com/vkee/laser_slam/blob/merge_loc_corr/laser_slam/src/incremental_estimator.cpp#L93 You might want to have the T_orig_corr being applied on the other side (and or having to invert T_orig_corr). It is a bit difficult to debug from here but I guess that could help.

p/s: You can use the focus camera tool in rviz and click on a point to help navigate around it.

vkee commented 7 years ago

Okay.

I agree with you that the z axes should be aligned (basically the segments should overlap with each other right?). I removed the height offset of the source and the target so that it will be easier to see if they are overlapped (no perspective distortion of alignment).

I think we should try and figure out if the transformations are in the correct frame rather than just trying the different cases but I tried each of the four cases and took screen captures of them: 1) Correction applied to the right of the pose estimate (https://www.dropbox.com/s/adbtju5c3rzzkxv/right_corr.mp4?dl=0). This is the original case. As mentioned before, appears that corrections seem to be applied in the wrong direction from aligning the source and target.

2) Inverted correction applied to the right of the pose estimate (https://www.dropbox.com/s/vv4h4h1nhv3ggm9/right_corr_inverse.mp4?dl=0). I think this one is definitely incorrect. Corrections sometimes seem to be applied in the correct direction but it is pretty unstable. Additionally, the z dimension seems to get messed up quite a bit (I should have rotated the screenshot to show this better).

3) Correction applied to the left of the pose estimate (https://www.dropbox.com/s/7okjnaowb60ahir/left_corr.mp4?dl=0). This one seems to apply the corrections most correctly, but sometimes seems to be overshooting the correction. This does not quite make sense to me as I will discuss below.

4) Inverted correction applied to the left of the pose estimate (https://www.dropbox.com/s/z0u9yjr3hct85jy/left_corr_inverse.mp4?dl=0). I think it is pretty clear this one is incorrect.

Just to confirm, the corrected transformation should be in the world frame since it is for a prior factor (https://github.com/vkee/laser_slam/blob/merge_loc_corr/laser_slam/src/incremental_estimator.cpp#L93) so it is the transformation from the world frame to the pose? The transformation correction, T_orig_corr, is computed in (https://github.com/vkee/segmatch/blob/merge_loc_corr/segmatch/src/segmatch.cpp#L550) just like the loop closure transformation (https://github.com/vkee/segmatch/blob/merge_loc_corr/segmatch/src/segmatch.cpp#L495), so it is the transformation in the world frame from the pose to the corrected pose (I'm not not transforming it with Eq 6 in the paper you sent me b/c I think it should be in world frame?).

Do you think these transformations are correct? If it is, shouldn't that mean that case 1 is the way to do it, since we will want to have the prior be the transformation from the world frame to the corrected pose frame?

Additionally, besides the transformations, does the processLocalization function (https://github.com/vkee/laser_slam/blob/merge_loc_corr/laser_slam/src/incremental_estimator.cpp#L79) look correct? In particular, I'm not sure if I'm using Expressions properly as this is my first time working with them. I'm not sure if I'm adding the priors or removing the previous prior factor correctly.

Thanks!

rdube commented 7 years ago

Yes the segments should overlap. Could you try setting the following parameter to true: https://github.com/ethz-asl/segmatch/blob/master/laser_mapper/launch/kitti/kitti_loop_closure.yaml#L3 this will prevent having the local cloud to contain data accumulated from before and after the correction which causes the misalignment we see in the source cloud eg third video at 0:40. In our latest version we are correcting the source cloud after a loop closure.

I agree that the ideal solution is to write it down and check it and I would encourage you to do that. Another idea would be to write a simple unit test to check (1) that the transformation that PCL is returning is actually from the target map to the source map expressed in the world frame and (2) to check that the correction we are applying makes sense. I hope that helps!

vkee commented 7 years ago

Okay, that seems like a good change and good idea to try and debug.

I found that the PCL functions are used in this PCL tutorial (http://pointclouds.org/documentation/tutorials/correspondence_grouping.php#correspondence-grouping) and confirmed that the transformation is an active transformation from the source to the target (https://github.com/ethz-asl/kitti_to_rosbag/issues/5). Since we are trying to correct the frame, we want the passive transformation which is simply the inverse.

Therefore, T_w_corrected_pose = T_w_est_pose * T_corrected_pose_est_pose.inverse() where T_w_est_pose is the estimated pose at the moment and T_corrected_pose_est_pose is the active transformation that is returned from filterMatches.

However, this still didn't seem to fix the issue. I decided to display the tfs without actually making the correction as well as the delta transformations in the segment matches but this didn't seem to be too illuminating other than ruling out case 1 and 4 as before.

Then I remembered an issue I had before when going between rosbags generated by the tools (https://github.com/ethz-asl/kitti_to_rosbag and https://github.com/tomas789/kitti2bag) where everything seems to be offset by a bit. I realized this is perfect for testing the localization as I could use the target map from SegMatch which was generated with data from the first tool to localize when running the rosbag from the second tool.

From my experiments, it is very clear that case 3 is the only valid one. I am still trying to figure out why is the case. As you can see in the screencapture (https://www.dropbox.com/s/88hxiynybzi9e8r/localization_somewhat_works_but_oscillates.mp4?dl=0), it localizes very nicely and then oscillates between the original prior pose and the correct localized one each time it has a localization correction.

I am still trying to figure out what that is the case and would greatly appreciate any potential ideas on why that is the case.

Thanks!

rdube commented 7 years ago

Hi @vkee it looks like you are almost there! That is a very nice way to test your implementation as it illustrates an issue which is hard to catch when this transformation is identity. Good job! Fyi the parameter force priors can be used to start the trajectory at different global locations. We used this for multi-robot demonstrations: https://github.com/ethz-asl/segmatch/blob/master/laser_mapper/launch/kitti/kitti_localization.yaml#L122

For the localizations which are off the map, can you check what is the x and y components of the transformation of corrected_pose and whether that makes sense when you compare to the other ones which look correct? This would quickly verify whether there is a periodic issue with the transformation or if the issue lays somewhere else.

vkee commented 6 years ago

The transformations are below (I print out the original estimated pose and then the corrected estimated pose). They don't make sense as the correct corrections have a large change which is expected but the incorrect corrections have like no change. I displayed the tfs of the original estimated pose and the corrected estimated pose to try and see what is wrong. They make sense for the first correct but no sense at all for the second correction (around 1:11 in the screencapture, sorry for the shakey zooming and panning, I'm not particularly good at manipulating views in RVIZ https://www.dropbox.com/s/5qw0u72yes985n1/strange_localization_behavior.mp4?dl=0).

Do you have any idea what may be wrong? I wonder if the segments in the source map are transformed to the current estimated frame (it appears like they are at least in the visualization).

These are the transformations: correct [FATAL] [1503993127.659698139, 1317639680.602375038]: ORIG POSE -0.894313 -0.44727 0.0123641 209.064 0.447384 -0.893416 0.0406803 134.637 -0.0071488 0.0419125 0.999096 1.92418 0 0 0 1 [FATAL] [1503993127.659788434, 1317639680.602375038]: CORRECTED POSE -0.0702666 -0.996569 0.0437457 221.654 0.997511 -0.0699431 0.00888361 -110.69 -0.00579342 0.044261 0.999003 0.982338 0 0 0 1

incorrect [FATAL] [1503993153.179117534, 1317639697.294143766]: ORIG POSE -0.895832 -0.444098 -0.0161896 137.159 0.444198 -0.895923 -0.00303227 201.959 -0.013158 -0.00990778 0.999864 0.790153 0 0 0 1 [FATAL] [1503993153.179194770, 1317639697.294143766]: CORRECTED POSE -0.898127 -0.439365 -0.0180715 136.057 0.439586 -0.898137 -0.0106915 201.052 -0.0115332 -0.0175463 0.99978 2.45994 0 0 0 1

correct [FATAL] [1503993211.637867631, 1317639706.942473718]: ORIG POSE 0.472362 -0.88094 0.0286202 126.898 0.881359 0.472419 -0.005158 245.561 -0.00897684 0.0276611 0.999577 0.657122 0 0 0 1 [FATAL] [1503993211.637953289, 1317639706.942473718]: CORRECTED POSE 0.998983 -0.0412481 0.0181817 278.567 0.0416386 0.998898 -0.0216491 13.9208 -0.0172687 0.0223841 0.9996 -1.85998 0 0 0 1

incorrect [FATAL] [1503993250.861310468, 1317639709.777034979]: ORIG POSE 0.473067 -0.881003 0.00644899 143.92 0.880959 0.473109 0.00891062 277.606 -0.0109013 0.00146597 0.99994 0.619981 0 0 0 1 [FATAL] [1503993250.861386828, 1317639709.777034979]: CORRECTED POSE 0.476086 -0.879398 0.00141532 143.251 0.879398 0.476084 -0.00152193 278.713 0.000664572 0.0019692 0.999998 4.7735 0 0 0 1

correct [FATAL] [1503993287.702279032, 1317639712.438750832]: ORIG POSE 0.463406 -0.885031 0.0444487 159.231 0.885853 0.46396 0.00246471 306.159 -0.0228038 0.0382328 0.999009 0.457929 0 0 0 1 [FATAL] [1503993287.702367159, 1317639712.438750832]: CORRECTED POSE 0.997625 -0.0611618 0.0316854 345.218 0.0623241 0.997366 -0.0370938 16.7503 -0.0293332 0.0389804 0.998809 -4.36048 0 0 0 1

incorrect [FATAL] [1503993316.945889873, 1317639724.248407779]: ORIG POSE -0.876055 -0.481299 -0.0296513 117.969 0.481682 -0.876318 -0.00705202 362.72 -0.0225898 -0.0204605 0.999535 0.420279 0 0 0 1 [FATAL] [1503993316.945950956, 1317639724.248407779]: CORRECTED POSE -0.869681 -0.49172 -0.0431925 121.547 0.492412 -0.870339 -0.00643281 368.069 -0.034429 -0.026863 0.999046 1.93774 0 0 0 1

correct [FATAL] [1503993338.423706541, 1317639729.484645740]: ORIG POSE -0.874121 -0.485689 0.00425986 71.4256 0.485708 -0.874094 0.00684428 387.285 0.000399327 0.00805178 0.999968 0.572019 0 0 0 1 [FATAL] [1503993338.423795520, 1317639729.484645740]: CORRECTED POSE -0.0292243 -0.998999 0.0338624 363.056 0.999564 -0.0293516 -0.00326934 136.38 0.00425998 0.0337521 0.999421 -0.173585 0 0 0 1

incorrect [FATAL] [1503993358.396521331, 1317639733.220337537]: ORIG POSE -0.900833 -0.434135 -0.00510101 36.4568 0.434156 -0.900828 -0.00420767 404.914 -0.00276843 -0.00600504 0.999978 0.663792 0 0 0 1 [FATAL] [1503993358.396616126, 1317639733.220337537]: CORRECTED POSE -0.896899 -0.441037 -0.0325285 45.5123 0.441346 -0.897333 -0.00263176 406.867 -0.0280282 -0.0167168 0.999467 -8.03345 0 0 0 1

rdube commented 6 years ago

Aside from the z component which seems to vary a lot and might point to another issue I would actually believe that this behaviour is correct. That is, when previously correctly localized the difference between T_w_orig and T_w_corrected should be close to identity.

FYI a publication to TF is actually happening here: https://github.com/ethz-asl/segmatch/blob/master/laser_mapper/src/laser_mapper.cpp#L54 and this transformation is computed here: https://github.com/ethz-asl/laser_slam/blob/6faea1c86f6b6828ee5dea7fa4efad5be4d7a90b/laser_slam_ros/src/laser_slam_worker.cpp#L176 maybe that can point you towards something!

Otherwise did you keep the same frame names for world_frame, odom_frame and sensor_frame as specified in here : https://github.com/ethz-asl/segmatch/blob/master/laser_mapper/launch/kitti/kitti_loop_closure.yaml#L2

From your video I realize that the segmentation positions might be not be well updated. Are you sure that the following function is properly called: https://github.com/ethz-asl/segmatch/blob/master/segmatch/src/segmatch.cpp#L501 This is also where the source map is being updated. Note that the segments are always expressed in the world_frame (we used "map" in the configuration).

vkee commented 6 years ago

That makes sense, it is confusing why everything jumps way off.

Do you think that this is what is messing up things? I will look into it.

I did not change any frame names.

I think so, I followed the behavior of processing a loop closure. After processing the localization, I call segmatchworker.update(trajectory) (https://github.com/vkee/segmatch/blob/merge_loc_corr/laser_mapper/src/laser_mapper.cpp#L99), which calls that function (https://github.com/vkee/segmatch/blob/master/segmatch_ros/src/segmatch_worker.cpp#L179).

xero0519 commented 6 years ago

Nice job of trajectory correction. Actually, I find that if using the accumulated transformation to correct the trajectory, the program goes to fine. I mean the unstable oscillation of trajectory correction.