PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/
Other
9.64k stars 4.59k forks source link

[BUG FIX] getCorrespondenceScore must not go out-of-bounds. #6070

Open fghoussen opened 6 days ago

fghoussen commented 6 days ago

[BUG FIX] getCorrespondenceScore must not go out-of-bounds.

I have a quite systematic crash when using pcl::registration::CorrespondenceRejectorMedianDistance or pcl::registration::CorrespondenceRejectorSurfaceNormal.

I use it like so:

   std::unique_ptr<pcl::IterativeClosestPoint<PointNT, PointNT, double>> icp;
   icp = std::make_unique<pcl::IterativeClosestPoint<PointNT, PointNT, double>>();

   pcl::registration::CorrespondenceRejectorMedianDistance::Ptr rejMedDist;
   rejMedDist.reset (new pcl::registration::CorrespondenceRejectorMedianDistance);
   rejMedDist->setInputTarget<PointNT> (cloud_mst_normal);
   rejMedDist->setInputSource<PointNT> (cloud_initial_slv_normal);
   rejMedDist->setSearchMethodTarget<PointNT> (tree_mst_normal);
   rejMedDist->setMedianFactor (medianDistanceFactor);
   icp->addCorrespondenceRejector (rejMedDist);

If any mistake (explaining crashes ?) is obvious, I'd be glad to know. In any cases, the attached patch seem to make things better (less often crash) but does not solve the problem (crash occurs on a regular basis).

The patch is basically "each time I access a member, check the index is consistent with the member size, or, return default value (here maximum value to say the score is as bad as possible AFAIU the code)"

I pushed this to let the PCL community decide if this patch is worth being merged or not.

fghoussen commented 6 days ago

Forgot to mention, at my side, on several different point clouds as inputs, even with this fix, crashes occurs around here: https://github.com/fghoussen/pcl/blob/d257a89130386c3356fba8ba0f04e8e2089704f3/registration/src/correspondence_rejection_median_distance.cpp#L59

Not sure how this is safe when distance are max'ed by bad correspondence score

mvieth commented 6 days ago

CorrespondenceRejectorMedianDistance is successfully tested here: https://github.com/PointCloudLibrary/pcl/blob/master/test/registration/test_registration.cpp#L262 source, target, etc are set within ICP (for the source this is strictly necessary since it changes from iteration to iteration).

In any cases, the attached patch seem to make things better (less often crash) but does not solve the problem (crash occurs on a regular basis).

If you say that it does not really solve the problem, I would rather investigate how the crash actually happens, and then create a fix based on that. Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud, e.g. from the pcl test directory,

fghoussen commented 6 days ago

With my data, following examples you mentioned, I get crash on a regular basis. This patch (with added commits) seems to make thing a lot-lot more stable (does not crash anymore after 15+ successive try): this is only checking bounds before accessing elements in lists.

Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud

Unfortunately not that easy: I can't provide data (not mine - need to check if I can share them) and they are big files (not great to commit in repo IMO), the original code can not be compiled in full debug (= difficult to analyse crash as you do not get all symbols)

mvieth commented 4 days ago

With my data, following examples you mentioned, I get crash on a regular basis. This patch (with added commits) seems to make thing a lot-lot more stable (does not crash anymore after 15+ successive try): this is only checking bounds before accessing elements in lists.

I don't think it is a good idea to merge a change if the bug is not really understood. This seems to treat the symptoms, but potentially does not solve the bug itself. In the worst case, these checks could hide a more severe bug somewhere else in the code.

Can you provide a full program that I can use to easily reproduce the crash? Including source and target cloud

Unfortunately not that easy: I can't provide data (not mine - need to check if I can share them) and they are big files (not great to commit in repo IMO), the original code can not be compiled in full debug (= difficult to analyse crash as you do not get all symbols)

In case you can't share the data, you could try to use a point cloud from the pcl test directory ( https://github.com/PointCloudLibrary/pcl/tree/master/test ) or from the pcl data repo ( https://github.com/PointCloudLibrary/data ). How many points do the point clouds have that you use? I am asking because if it is a certain number (several millions or even billions), this might point to a specific kind of bug. Since CorrespondenceRejectorMedianDistance seems to be used successfully in test_registration.cpp, I think the main question is: what is causing the bug for you? Is it the data you are using? The parameters/settings you are using? Something else? I can only help investigating the bug if you help me reproduce it. If you could share a more complete program to reproduce the bug (starting from loading the point clouds from files to calling align, including all parameters you set), that would already be very helpful.

fghoussen commented 4 days ago

I'll see what I can do... Look closely at the patch (hidding whitespaces): this is only bound checking

I don't think it is a good idea to merge a change if the bug is not really understood.

I understand your point of view. But the problem is now understood: after bound checking, the code is stable. I will likely not be able to go deeper in the analysis: gdb reported index = 26879 and (*input_).size() = 15698 on my data... And I have no idea why?!.. But a least, with this patch, the code doesn't crash anymore

In case you can't share the data

Not sure. Need to check. This will be time consuming too...

what is causing the bug for you? Is it the data you are using?

The problem seems to lie in the data (= point cloud). Reducing point cloud to minimal setting is impossible (I do not know what trigger the problem in the PCL code) and modifying point cloud will be time consuming too (possibly making the problem disappear).

you could share a more complete program to reproduce the bug

Once again, not sure I'll have time for this: I already spent more time than expected. I'll see what I can do...

fghoussen commented 4 days ago

Adding a unit test with data from https://github.com/PointCloudLibrary/pcl/tree/master/test may or may not reproduce the problem. Could you add one on top of this PR? If not, I'll see if I have time... But I can't guarantee

fghoussen commented 4 days ago

The problem is maybe silent in the current test suite: how to run the test suite? cmake -DBUILD_benchmarks=ON -DBUILD_examples=ON -DBUILD_simulation=ON -DBUILD_global_tests=ON ..; make all testdoes not trigger the test suite

mvieth commented 4 days ago

The problem is maybe silent in the current test suite: how to run the test suite? cmake -DBUILD_benchmarks=ON -DBUILD_examples=ON -DBUILD_simulation=ON -DBUILD_global_tests=ON ..; make all testdoes not trigger the test suite

cmake --build . -- tests or make tests

fghoussen commented 3 days ago

The problem is maybe silent in the current test suite

I've just tested: the test suite doesn't see the problem which confirm it seems related to the point clouds used as input