ericreg / Supercluster.KDTree

An extremely fast, easy to use, KD-Tree optimized for machine learning and big-data applications.
MIT License
98 stars 32 forks source link

RadialSearch not returning correct number of neighbors #4

Open PRoder1 opened 5 years ago

PRoder1 commented 5 years ago

Thanks for this code Eric The public method RadialSearch returns multiple neighbors when I pass the parameter as 1. This method is not using the neighbors parameter properly, and is not returning the specified number of neighbors. Looking at the code, it does an if test on this number, however the code is identical in the two if blocks and the number is not used anywhere.

Here is the code, note that the code is identical in the if statement blocks: ///

/// Searches for the closest points in a hyper-sphere around the given center. /// /// The center of the hyper-sphere /// The radius of the hyper-sphere /// The number of neighbors to return. /// The specified number of closest points in the hyper-sphere public Tuple<TDimension[], TNode>[] RadialSearch(TDimension[] center, double radius, int neighboors = -1) { var nearestNeighbors = new BoundedPriorityList<int, double>(this.Count); if (neighboors == -1) { this.SearchForNearestNeighbors( 0, center, HyperRect.Infinite(this.Dimensions, this.MaxValue, this.MinValue), 0, nearestNeighbors, radius); } else { this.SearchForNearestNeighbors( 0, center, HyperRect.Infinite(this.Dimensions, this.MaxValue, this.MinValue), 0, nearestNeighbors, radius); }

        return nearestNeighbors.ToResultSet(this);
    }
ericreg commented 5 years ago

Wow great catch, I'll take a look at this.

naskew commented 4 years ago

Hi,

I had been hoping to use this but we found that the performance seemed to become very poor when we put it into the acceptance environment. It might be that we are simply not using it for the purpose it was designed so a quick summary.

We have a cloud of points in 3D space (so K=3) and we add them. All we need to know is if a point is within a radius of any point of the cloud. So we call RadialSearch with an origin of the sought point and passing radius squared (another bug that the parameter is called redius and yet has to be radius squared) and a neighboors (sic) of 1.

We anticipated that passing in a count of 1 would return any point within the radius. Obviously this was our assumption and is wrong, it seems to ignore this parameter entirely and the implementation of the RadialSearch uses the neighboors parameter to switch between if ... and else ... but both ...'s are identical code (as Proder1 remarked).

However our biggest issue is that even for a cloud of only 800 points, this code is way outperformed by a simple linear search. I guess the purpose of this was to find any points within the radius, we just need a search for the first point, even if it is not the nearest.

I'll dig a little deeper and see if we can do something to get the performance where we need it to be.

Cheers, Nick

naskew commented 4 years ago

I should explain what I mean by the performance of a linear search is better. Remember we only want to know the existence of a point. If we query a point that is outside the cloud entirely then the KDTree is way quicker than the linear search because the linear search has to search all points.

However once I started searching for points that are within he cloud, it seems that the KDTree is far far slower than the linear search. Now I realise it is a slightly unfair comparison because you are finding me the closest within a radius and my liner search simply accepts the first match. However if I specify a parameter asking the search to return me 1 point then I'd expect KDTree to be faster than a linear search as that is the purpose of the index. I mean I don't actually care if it is the closest but even if you were returning the closes I'd expect you to be quicker.

This suggests that either I've done something wrong in my usage (profiling shows I'm setting up the index once and then hitting it millions of times causing SearchForNearestNeighbours to be called recursively and fanning out) or perhaps because of the relative size of the radius supplied, too many points are being considered and ordered.

I am sure that give that the number of neighbours is being ignored, that this has something to do with it but when I created a version that uses NearestNeighbours but with a restricted max radius, it too was slow.

Ah well another day of digging :-)

johnetc commented 3 years ago

Just in case anyone comes across it in the future..

///

/// Searches for the closest points in a hyper-sphere around the given center. /// /// The center of the hyper-sphere /// The radius of the hyper-sphere /// The number of neighbors to return. /// The specified number of closest points in the hyper-sphere public Tuple<TDimension[], TNode>[] RadialSearch(TDimension[] center, float radius, int neighboors = -1) { var nearestNeighbors = new BoundedPriorityList<int, float>(neighboors < 1? this.Count : neighboors); if (neighboors == -1) { this.SearchForNearestNeighbors( 0, center, HyperRect.Infinite(this.Dimensions, this.MaxValue, this.MinValue), 0, nearestNeighbors, radiusradius); } else { this.SearchForNearestNeighbors( 0, center, HyperRect.Infinite(this.Dimensions, this.MaxValue, this.MinValue), 0, nearestNeighbors, radiusradius); }

        return nearestNeighbors.ToResultSet(this);
    }