Omitting results to points themselves

kskyten commented 7 years ago

I would like to get the k nearest points that are not the querying points when using the same data for building the tree and querying it. Is it possible to do this with the skip function? The skip function seems to take integer arguments corresponding to tree indices, but it is unclear to me what this means.

KristofferC commented 7 years ago

Yes it should be documented better (I didn't add that functionality hehe). skip takes one argument which is the index of a point. If the function returns true, the point is skipped. Maybe https://github.com/KristofferC/NearestNeighbors.jl/blob/master/src/tree_ops.jl#L101 can help you see how it is used.

kskyten commented 7 years ago

Seems to me like the easiest way would be to pass tree.data[inx] and point to the skip function.

KristofferC commented 7 years ago

Yeah, passing the point itself as an argument makes sense. Hmm, not sure it can be done on a backwards compatible way.

kskyten commented 7 years ago

How about checking if the distance is zero? Testing equality of the points seems more robust though.

KristofferC commented 7 years ago

It would be up to the user to define how equality should be checked by constructing the skip function. The package only need to provide the necessary data to that function.

kskyten commented 7 years ago

Adding arguments to the skip function works for omitting the equal points. However, now a wrong number of points are returned and I'm not sure where to fix this.

kskyten commented 7 years ago

What I actually needed was the distances to the k-th nearest neighbours. Here's what I ended up with

k = 3
X = rand(3, 10);
tree = KDTree(X)
ind, dist = knn(tree, X, k+1, true)
kdist = [x[end] for x in dist]

I couldn't figure out how to make a consistent API for the skip function though.

ghyatzo commented 4 months ago

Bumping here, although this could probably be an issue on its own regarding the skip function. I could not find a way to exclude the current point being evaluated when calling the function over multiple points. For example

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
j = 10
_, dists = nn(tree, M[:, j], i -> begin println(i); return false end )

will print out all column indexes, which means that invoking the same function for multiple points will make it impossible to know which point is currently being evaluated.

_, dists = nn(tree, M, i -> ???)

and

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
_, dists = nn(tree, M)

in this case dists == zeros(n) since every point recognizes himself as the closest one, not very helpful. The only workaround I found was

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)

dists = zeros(n)
for j in 1:n
  _ , dist = nn(tree, M[:, j], i -> i == j) 
  dists[j] = dist
end

which makes the multiple points API a bit useless, and requires extra allocations/manipulation by the user to get the same results. So maybe having the skip function pass over also the current index/point being processed when considering multiple points is a good first solution, maybe for 1.0 if it's breaking?

KristofferC / NearestNeighbors.jl

Omitting results to points themselves #49