PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/
Other
9.97k stars 4.62k forks source link

Line search bug in ndt #4232

Open rsasaki0109 opened 4 years ago

rsasaki0109 commented 4 years ago

The following PR was merged last week.

[registration] Fix bug in NDT step interval convergence criteria https://github.com/PointCloudLibrary/pcl/pull/4181

This PR improves the accuracy of ndt, but there is a bug that occasionally kills the process.

hdl_graph_slam issues 131 https://github.com/koide3/hdl_graph_slam/issues/131

I have also ensured that this fix gives the same error as the above issue on ndt.

kunaltyagi commented 4 years ago

Pinging @koide3

koide3 commented 4 years ago

@rsasaki0109 @kunaltyagi

Some users of my NDT package reported this issue a few days ago. It seems the MT line search code enabled by #4181 has a bug that yields NaN values. I'm digging into the cause and will open a PR to fix this.

koide3 commented 4 years ago

@rsasaki0109 @kunaltyagi I found some bugs in voxel search and parallelization that make the hessian inaccurate and cause segfaults. Fortunately, the buggy codes only exist in my NDT package but not in PCL, and thus I suppose the latest NDT in PCL doesn't cause segfaults. However, in case you observe problems with the latest NDT in PCL, I would be appreciate if you could provide some sample data to reproduce the bug.

BTW, although I haven't seen critical problems, I still have a doubt about computeStepLengthMT(). Maybe it improperly clamps the line search range, leading to inaccurate registration. I'll carefully check the validity of that function.

rsasaki0109 commented 4 years ago

Sorry. I'm also using ndt_omp, and I haven't checked to see if the pcl implementation causes problems.

I've encountered the bug about More Thuente method before (I couldn't fix it then). Ndt is going to sandwich the step size a_t between a_l (lower limit) and a_u (upper limit) in determining a_t. but, the implementation of ndt sometimes makes a_l > a_u in the first trialValueSelectionMT. I think this is the problem. I'm sorry if I'm wrong in my thought because I ran into the bug a long time ago.

updateIntervalMT method https://github.com/PointCloudLibrary/pcl/blob/master/registration/include/pcl/registration/impl/ndt.hpp#L483-L517 Where to use updateIntervalMT https://github.com/PointCloudLibrary/pcl/blob/master/registration/include/pcl/registration/impl/ndt.hpp#L727-L740

koide3 commented 4 years ago

@rsasaki0109 Thanks for sharing your thoughts. I think the condition a_l > a_u itself is OK because the algorithm assumes unordered a_l and a_u (just above Theorem 2.1 in the paper). But, I agree with you that the line search implementation may have some problems.

I suspect that clamping a_t within [step_min, step_max] causes a invalid line search range that results in the condition a_l == a_t when the search direction is not correct, and trialValueSelectionMT yields NaN at the following line. I guess this bug stems from improper hessian calculation in ndt_omp, and the NDT in PCL never meet this condition, but I'm not 100% sure at this moment.

Anyways, even if it doesn't cause segfault, I think the clamping affects the accuracy and should be corrected. https://github.com/PointCloudLibrary/pcl/blob/e6d032a1eed0115542ae9895679b75a5fca06eef/registration/include/pcl/registration/impl/ndt.hpp#L686 https://github.com/PointCloudLibrary/pcl/blob/e6d032a1eed0115542ae9895679b75a5fca06eef/registration/include/pcl/registration/impl/ndt.hpp#L530

rsasaki0109 commented 4 years ago

@koide3 Oh, yes! I remember getting an error on that line when I parsed it, too (in ndt_omp). I see, the problem was with wrong Hessian's calculations...

double z = 3 * (f_t - f_l) / (a_t - a_l) - g_t - g_l; 
stale[bot] commented 4 years ago

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

hiyyg commented 1 year ago

Hi @koide3 @rsasaki0109 , are there any updates on this problem?