PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/

[segmentation] High-speed Euclidean clustering method on larger neighborhood #4875

Closed WangZP10969 closed 1 year ago

WangZP10969 commented 3 years ago

I am using PCL's Euclidean clustering method, usually to segment several dense point clouds that are separated by larger distances, but PCL's method takes a long time to run when the neighborhood (cluster tolerance) is large.

I am trying a new method: first sample the point cloud using the threshold distance, then perform Euclidean clustering on the sampling points. The clustering condition is checked in two stages. First, two sampling points are candidates for merging when their distance is less than 3 times the distance threshold; this is the farthest case in which their neighborhoods can still contain a point pair exactly at the threshold distance. Second, the closest distance between the neighbor points of the two sampling points is computed, and if it is less than the threshold, the sampling point is added to the cluster.

I tested it on a scripting-language platform. On large neighborhoods the average speed is faster than the traditional method, but on small neighborhoods the constant factor is higher than the traditional method. I am now trying to port it to a PCL-based C++ implementation. I would appreciate suggestions on whether this method is valuable, and on an efficient way to find the closest pair of points between two point clouds.

The goal of Euclidean clustering is to find connected components under a distance threshold: when the distance between two points is less than the threshold, they are considered to belong to the same cluster.

The method adopted by PCL is:

  1. Mark all points as unprocessed
  2. Take out an unprocessed point to generate a cluster
  3. Find the points within the threshold distance and add them to the cluster
  4. Select other points in the cluster and repeat step 3 until the cluster no longer increases
  5. Loop to step 2 until all points are processed
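As a reference, the five steps above can be sketched as a minimal, self-contained C++ program (a brute-force radius search stands in for PCL's KdTree; `Point` and `euclideanCluster` are illustrative names, not PCL API):

```cpp
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

struct Point { double x, y, z; };

static double dist(const Point& a, const Point& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Region growing: each unprocessed point seeds a cluster, which grows by
// repeatedly pulling in all points within `tol` of any cluster member.
// (PCL uses a KdTree for the radius search; brute force is used here only
// to keep the sketch self-contained.)
std::vector<std::vector<std::size_t>>
euclideanCluster(const std::vector<Point>& cloud, double tol) {
    std::vector<bool> processed(cloud.size(), false);   // step 1
    std::vector<std::vector<std::size_t>> clusters;
    for (std::size_t seed = 0; seed < cloud.size(); ++seed) {
        if (processed[seed]) continue;                  // step 5
        std::vector<std::size_t> cluster;               // step 2
        std::queue<std::size_t> frontier;
        frontier.push(seed);
        processed[seed] = true;
        while (!frontier.empty()) {                     // step 4
            std::size_t p = frontier.front();
            frontier.pop();
            cluster.push_back(p);
            for (std::size_t q = 0; q < cloud.size(); ++q)  // step 3
                if (!processed[q] && dist(cloud[p], cloud[q]) < tol) {
                    processed[q] = true;
                    frontier.push(q);
                }
        }
        clusters.push_back(cluster);
    }
    return clusters;
}
```

For example, a cloud with points at x = 0, 0.5, 1.0 and a second group at x = 10, 10.5 yields two clusters with tolerance 1.0.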

This scheme is simple and robust, but consider its time complexity: assuming N points with K neighbors each, computing the neighborhood of every point costs about N·log(N)·K. For a roughly planar point cloud, K grows with the square of the neighborhood radius R, so as the threshold increases the running time grows quadratically, and Euclidean clustering with a large neighborhood becomes very slow.

The method I propose is:

  1. Sampling process
     1.1 Mark all points as unprocessed
     1.2 Take out an unprocessed point and mark it as a sampling point
     1.3 Find the points within the threshold distance of it, record them as the neighbor list of this sampling point, and mark them as processed
     1.4 Loop to step 1.2 until all points are processed
  2. Clustering process
     2.1 Mark all sampling points as unprocessed
     2.2 Take out an unprocessed sampling point to generate a cluster, and give the neighbor list of this sampling point the cluster's label
     2.3 Find the sampling points within 3 times the threshold distance, and compute the closest distance from their neighbor lists to the current sampling point's neighbor list; if it is within the threshold distance, add those sampling points to the cluster and give their neighbor lists the cluster's label
     2.4 Loop to step 2.2 until all sampling points are processed
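If I read the two phases correctly, they can be sketched as a minimal, self-contained C++ program (brute-force searches in place of a KdTree, union-find for the transitive merging of sampling points; all names are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { double x, y, z; };

static double dist(const Point& a, const Point& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

static std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
    return i;
}

// Returns one cluster label per input point.
std::vector<std::size_t>
sampledEuclideanCluster(const std::vector<Point>& cloud, double tol) {
    const std::size_t n = cloud.size();
    // Phase 1: sampling. Every point either becomes a sampling point or
    // lands in the neighbor list of an earlier sampling point.
    std::vector<std::size_t> sampleOf(n);             // owning sample index
    std::vector<bool> processed(n, false);
    std::vector<std::size_t> samples;                 // indices into cloud
    std::vector<std::vector<std::size_t>> neighbors;  // per-sample neighbor lists
    for (std::size_t i = 0; i < n; ++i) {
        if (processed[i]) continue;
        processed[i] = true;
        samples.push_back(i);
        neighbors.push_back({i});
        sampleOf[i] = samples.size() - 1;
        for (std::size_t j = 0; j < n; ++j)
            if (!processed[j] && dist(cloud[i], cloud[j]) < tol) {
                processed[j] = true;
                neighbors.back().push_back(j);
                sampleOf[j] = samples.size() - 1;
            }
    }
    // Phase 2: cluster the sampling points. Only sample pairs closer than
    // 3*tol can own neighbor points within tol of each other.
    std::vector<std::size_t> parent(samples.size());
    std::iota(parent.begin(), parent.end(), 0);
    for (std::size_t a = 0; a < samples.size(); ++a)
        for (std::size_t b = a + 1; b < samples.size(); ++b) {
            if (dist(cloud[samples[a]], cloud[samples[b]]) >= 3.0 * tol) continue;
            double best = 1e300;  // closest distance between the neighbor lists
            for (std::size_t p : neighbors[a])
                for (std::size_t q : neighbors[b])
                    best = std::min(best, dist(cloud[p], cloud[q]));
            if (best < tol)
                parent[findRoot(parent, a)] = findRoot(parent, b);
        }
    std::vector<std::size_t> label(n);
    for (std::size_t i = 0; i < n; ++i)
        label[i] = findRoot(parent, sampleOf[i]);
    return label;
}
```

Union-find handles the transitive case where several sampling points chain into one cluster; the brute-force inner loops are exactly the part that a KdTree or the projection trick discussed below the thread would accelerate.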

In this method, assume the number of points is N, each point has about K neighbor points, and each sampling point has about T sampling points within 3 times the threshold. The sampling process costs (N/K)·log(N)·K = N·log(N). In the clustering process, clustering the sampling points themselves costs about K·log(K)·T, and the worst case for the closest-distance check between the neighbor lists of sampling-point pairs is (N/K)·K·K·T = N·K·T: there are N/K sampling points, and each one computes K·K distances for each of its T candidate partners. In practice, however, a point pair within the threshold is usually found after about K·T distance computations, so the typical cost is (N/K)·K·T = N·T.

PCL method complexity: N·log(N)·K. Sampling-point method complexity: N·log(N) + K·log(K)·T + (N·K·T worst case, N·T typical). T, the number of sampling points within 3 times the threshold of a sampling point, is approximately constant and generally much smaller than K. Therefore, when K is large, the sampling-point method has a clear advantage.

kunaltyagi commented 3 years ago

@mvieth Thoughts?

mvieth commented 3 years ago

Interesting idea! So your algorithm would give the same results as the current euclidean clustering method in PCL, but faster in certain conditions? One thing I don't understand: In step 2.3, why do you search for sampling points within 3 times the threshold distance?

I think there are basically three things to do:

  1. You implement the new algorithm. The function should take the same parameters as the existing extractEuclideanClusters function. Then you would create a pull request with the new implementation
  2. We would need tests to verify that the new algorithm works correctly and gives the same results as the current algorithm (the tests would be in PCL's gtest framework and would be placed here).
  3. To test how much faster the new algorithm is, we would need benchmarks (in PCL's google benchmark framework, placed somewhere here). Ideally, we would benchmark several real-world point clouds, with realistic cluster tolerances. Then we would get a good picture whether users would benefit from the new algorithm. Benchmarks usually tell us more than pure complexity calculations.

Based on the results, we would decide whether the algorithm would fit into PCL, and in which way. Does this sound good to you?

WangZP10969 commented 3 years ago

@mvieth Thank you very much for your reply.

This method is inspired by pyramid-style acceleration: clustering the sampling points of the point cloud as representatives of all points reduces the amount of computation.

However, the closest distance between a pair of sampling points cannot be described directly by the distance between the sampling points themselves, because the neighbor points of each sampling point are distributed inside a sphere whose radius is the threshold distance, and the relevant quantity is the distance between the closest points of the two spheres.

Extracting only the sampling-point pairs closer than 3 times the threshold distance is enough to ensure correctness: if two sampling points are more than 3 times the threshold apart, then by the triangle inequality every pair of their neighbor points must be more than the threshold apart (each neighbor is at most one threshold radius from its sampling point, so the closest neighbor pair is at least the center distance minus two radii).
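This bound can be checked numerically with a small self-contained sketch (illustrative names; every neighbor lies within the threshold radius r of its sampling point, so when the centers are more than 3r apart the closest pair is at least d − 2r > r apart):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct P3 { double x, y, z; };

static double dist(const P3& a, const P3& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

// Brute-force closest distance between two point sets (the neighbor
// lists of two sampling points).
double closestDist(const std::vector<P3>& A, const std::vector<P3>& B) {
    double best = 1e300;
    for (const P3& a : A)
        for (const P3& b : B)
            best = std::min(best, dist(a, b));
    return best;
}
```

With r = 1, neighbor sets around centers 3.5 apart always have closest distance greater than 1, while centers exactly 3 apart can have a closest neighbor pair at exactly the threshold distance.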

The sampling procedure in the first step ensures that the distance between any two sampling points is never less than the threshold, because the neighbor points of each sampling point are marked as processed first, so later sampling points cannot be chosen from among them.

The blue point is the selected sampling point, with a spherical neighborhood of threshold radius; the red points in the neighborhood are already marked and cannot become candidates for new sampling points, so new sampling points can only be generated from the green points. (see the attached figure)

For the spherical neighborhoods of two sampling points, the distance between the sphere centers will therefore never be less than the threshold distance. (see the attached figure)

And in the most extreme case, when the sphere centers are 3 radii apart, the distance between their closest neighbor points can still be exactly the threshold distance. (see the attached figure)

I am not sure about the performance of this method on PCL test cases, but I compared it against an equivalent function on another algorithm platform and got identical results on several test data sets, so I believe the method is equivalent in principle to Euclidean clustering. The other platform's implementation is very fast, especially on large neighborhoods; I am not sure how it is implemented, but it certainly does not run a neighborhood search for every point.

I think this is a different implementation of Euclidean clustering, similar to how Search offers both KdTree and OcTree. Perhaps we could add different interfaces to the Euclidean clustering class to call the different methods, such as sampled Euclidean clustering, or approximate sampled Euclidean clustering (skip the closest-distance computation and directly merge the neighborhoods of sampling points within 3 times the threshold).

I have used PCL's KdTree nearest-neighbor search to complete a first version of this function. Sampling and clustering the sampling points become faster as the threshold distance increases, but computing the closest distance between the neighbor points of sampling-point pairs takes a lot of time, which still makes cluster segmentation under a larger threshold very time-consuming.

If the closest distance of sampling-point pairs is not checked, the method becomes an approximate Euclidean clustering segmentation, which is usable when the dense point clouds really are separated by a large closest-point distance.

I will try to use a KdTree to accelerate the closest-distance search for the current sampling point, so that the worst-case time complexity can be further improved. Let the total number of points be N, the number of neighbor points of a sampling point be K, and the number of sampling points within 3 times the threshold of a sampling point be T (usually 12 to 18).

Linear closest-distance calculation: (number of sampling points, N/K) × (neighbors per sampling point, K) × (sampling points within 3 times the threshold, T) × (neighbors per candidate, K) = (N/K)·K·T·K = N·K·T

KdTree closest-distance calculation: (number of sampling points, N/K) × (cost of one KdTree search, log(K)) × (sampling points within 3 times the threshold, T) × (neighbors per candidate, K) = (N/K)·log(K)·T·K = N·log(K)·T

WangZP10969 commented 3 years ago

I tried to optimize the closest point search, and further reduced some candidate points through the projection distance.


I project all the points onto the line through the two sampling points, and then find the pair with the closest projection distance. (see the attached figure)


I then calculate the spatial distance of that pair of points, and expand the search window on both sides of the projection position by this spatial distance to include all remaining candidate points. (see the attached figure)
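A self-contained sketch of this projection pruning (illustrative names; `dir` is assumed to be the unit vector from one sampling point to the other, so the 3D distance of any pair is never smaller than its gap along the projection axis):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct Pt { double x, y, z; };

static double dist(const Pt& a, const Pt& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

// Closest distance between sets A and B, pruning candidates by their
// projection onto the unit direction `dir` between the two sampling points.
double closestDistProjected(const std::vector<Pt>& A,
                            const std::vector<Pt>& B,
                            const Pt& dir) {
    auto proj = [&dir](const Pt& p) { return p.x*dir.x + p.y*dir.y + p.z*dir.z; };
    // Sort B by projection once.
    std::vector<std::pair<double, const Pt*>> pb;
    for (const Pt& b : B) pb.push_back({proj(b), &b});
    std::sort(pb.begin(), pb.end(),
              [](const auto& l, const auto& r) { return l.first < r.first; });
    double best = 1e300;
    for (const Pt& a : A) {
        const double pa = proj(a);
        // Start at a's projection position in sorted B, then expand
        // outward; stop each side once the projection gap alone already
        // exceeds the best 3D distance found so far.
        auto it = std::lower_bound(pb.begin(), pb.end(), pa,
                                   [](const auto& e, double v) { return e.first < v; });
        for (auto r = it; r != pb.end() && r->first - pa < best; ++r)
            best = std::min(best, dist(a, *r->second));
        for (auto l = it; l != pb.begin();) {
            --l;
            if (pa - l->first >= best) break;
            best = std::min(best, dist(a, *l->second));
        }
    }
    return best;
}
```

The pruning is valid because for a unit `dir`, |proj(a) − proj(b)| ≤ dist(a, b), so any candidate skipped by the projection test could not have beaten the current best.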


For the test I used two kinds of point clouds that I often work with: one is a plane surface with many bumps, and the other is a few objects that are relatively close together. The speed improvement is shown below; with large neighborhoods there is a clear reduction in running time.

Time1 is the new method, Time2 is the traditional method, and the other platform is included as a reference. (see the attached timing screenshots)

The test results show a clear improvement over the traditional method on large neighborhoods, but there is still a big gap compared to the other platform's method. I am trying to optimize further.

The attachments are the test code and test data; any suggestions are very welcome. ConnectionDistance3d.ply.txt ConnectionMap.ply.txt Test.cpp.txt