Open kskyten opened 7 years ago
Yes it should be documented better (I didn't add that functionality hehe). skip
takes one argument which is the index of a point. If the function returns true, the point is skipped. Maybe https://github.com/KristofferC/NearestNeighbors.jl/blob/master/src/tree_ops.jl#L101 can help you see how it is used.
Seems to me like the easiest way would be to pass tree.data[inx]
and point
to the skip function.
Yeah, passing the point itself as an argument makes sense. Hmm, not sure it can be done on a backwards compatible way.
How about checking if the distance is zero? Testing equality of the points seems more robust though.
It would be up to the user to define how equality should be checked by constructing the skip
function. The package only need to provide the necessary data to that function.
Adding arguments to the skip function works for omitting the equal points. However, now a wrong number of points are returned and I'm not sure where to fix this.
What I actually needed was the distances to the k-th nearest neighbours. Here's what I ended up with
k = 3
X = rand(3, 10);
tree = KDTree(X)
ind, dist = knn(tree, X, k+1, true)
kdist = [x[end] for x in dist]
I couldn't figure out how to make a consistent API for the skip function though.
Bumping here, although this could probably be an issue on its own regarding the skip function. I could not find a way to exclude the current point being evaluated when calling the function over multiple points. For example
M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
j = 10
_, dists = nn(tree, M[:, j], i -> begin println(i); return false end )
will print out all column indexes, which means that invoking the same function for multiple points will make it impossible to know which point is currently being evaluated.
_, dists = nn(tree, M, i -> ???)
and
M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
_, dists = nn(tree, M)
in this case dists == zeros(n)
since every point recognizes himself as the closest one, not very helpful.
The only workaround I found was
M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
dists = zeros(n)
for j in 1:n
_ , dist = nn(tree, M[:, j], i -> i == j)
dists[j] = dist
end
which makes the multiple points API a bit useless, and requires extra allocations/manipulation by the user to get the same results. So maybe having the skip function pass over also the current index/point being processed when considering multiple points is a good first solution, maybe for 1.0 if it's breaking?
I would like to get the
k
nearest points that are not the querying points when using the same data for building the tree and querying it. Is it possible to do this with theskip
function? Theskip
function seems to take integer arguments corresponding to tree indices, but it is unclear to me what this means.