Open rsasaki0109 opened 4 years ago
Pinging @koide3
@rsasaki0109 @kunaltyagi
Some users of my NDT package reported this issue a few days ago. It seems the MT line search code enabled by #4181 has a bug that yields NaN values. I'm digging into the cause and will open a PR to fix this.
@rsasaki0109 @kunaltyagi I found some bugs in voxel search and parallelization that make the hessian inaccurate and cause segfaults. Fortunately, the buggy codes only exist in my NDT package but not in PCL, and thus I suppose the latest NDT in PCL doesn't cause segfaults. However, in case you observe problems with the latest NDT in PCL, I would be appreciate if you could provide some sample data to reproduce the bug.
BTW, although I haven't seen critical problems, I still have a doubt about computeStepLengthMT(). Maybe it improperly clamps the line search range, leading to inaccurate registration. I'll carefully check the validity of that function.
Sorry. I'm also using ndt_omp, and I haven't checked to see if the pcl implementation causes problems.
I've encountered the bug about More Thuente method before (I couldn't fix it then).
Ndt is going to sandwich the step size a_t
between a_l
(lower limit) and a_u
(upper limit) in determining a_t
.
but, the implementation of ndt sometimes makes a_l
> a_u
in the first trialValueSelectionMT
.
I think this is the problem. I'm sorry if I'm wrong in my thought because I ran into the bug a long time ago.
updateIntervalMT method https://github.com/PointCloudLibrary/pcl/blob/master/registration/include/pcl/registration/impl/ndt.hpp#L483-L517 Where to use updateIntervalMT https://github.com/PointCloudLibrary/pcl/blob/master/registration/include/pcl/registration/impl/ndt.hpp#L727-L740
@rsasaki0109
Thanks for sharing your thoughts. I think the condition a_l > a_u
itself is OK because the algorithm assumes unordered a_l
and a_u
(just above Theorem 2.1 in the paper). But, I agree with you that the line search implementation may have some problems.
I suspect that clamping a_t
within [step_min, step_max] causes a invalid line search range that results in the condition a_l == a_t
when the search direction is not correct, and trialValueSelectionMT yields NaN at the following line. I guess this bug stems from improper hessian calculation in ndt_omp, and the NDT in PCL never meet this condition, but I'm not 100% sure at this moment.
Anyways, even if it doesn't cause segfault, I think the clamping affects the accuracy and should be corrected. https://github.com/PointCloudLibrary/pcl/blob/e6d032a1eed0115542ae9895679b75a5fca06eef/registration/include/pcl/registration/impl/ndt.hpp#L686 https://github.com/PointCloudLibrary/pcl/blob/e6d032a1eed0115542ae9895679b75a5fca06eef/registration/include/pcl/registration/impl/ndt.hpp#L530
@koide3 Oh, yes! I remember getting an error on that line when I parsed it, too (in ndt_omp). I see, the problem was with wrong Hessian's calculations...
double z = 3 * (f_t - f_l) / (a_t - a_l) - g_t - g_l;
Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.
Hi @koide3 @rsasaki0109 , are there any updates on this problem?
The following PR was merged last week.
[registration] Fix bug in NDT step interval convergence criteria https://github.com/PointCloudLibrary/pcl/pull/4181
This PR improves the accuracy of ndt, but there is a bug that occasionally kills the process.
hdl_graph_slam issues 131 https://github.com/koide3/hdl_graph_slam/issues/131
I have also ensured that this fix gives the same error as the above issue on ndt.