JuliaGeometry / KDTrees.jl

KDTrees for julia
Other
25 stars 8 forks source link

Knn Index #17

Open Bootstrap360 opened 9 years ago

Bootstrap360 commented 9 years ago

Hello,

Why does the knn function return the index of the nearest neighbour of the original data as opposed to the data field of the KDTree?

For example: the below code returns false. data = randn(3,100) tree = KDTree(data) indices, distances = knn(tree, data[:,3],1) tree.data[:, indices[1]] == data[:, indices[1]] # false

I know that if you request to not reorder the data, by passing the parameter "reorder=false", then the above code will return true.

It seems redundant to have to keep around your own data variable when it is already stored within the KDTree type.

I suggest an optional parameter on knn called indexof. If indexof == original, then it should return the original indices. if indexof == reordered, then it should return the indices of the k nearest neighbours of the data field.

KristofferC commented 9 years ago

Yes, you have a good point. Maybe we could have a copy flag to the KDTree constructor that would copy the input data before constructing the tree. If the copy flag is true we return the indices of the original data, else indices into the tree's data field.

KristofferC commented 9 years ago

My above comment didn't make so much sense because I forgot that I actually do copy the input data when reorder=true.

It would of course be simple to implement what you say but then the .data field has to be documented somehow or else the indexof parameter doesn't make sense.

If you want to do this right now you could do something like:

data = randn(3,100)
tree = KDTree(data)
indices, distances = knn(tree, data[:,3],1)

orig_to_tree = invperm(tree.indices)
tree_index = orig_to_tree[indices][1]
tree.data[:, tree_index] == data[:, indices[1]] # true

The tree.indices field says how the original data is permutated in the tree so the invperm call inverts this and you can use that to get the tree index and thus no longer need the original data.

Bootstrap360 commented 9 years ago

Oh great. I was using a map function to find the inverse of the indices function.

 invperm()

seems to do the trick.

I added the following function inside of kd_tree.jl

  function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indiciesof = "origional")
    if indiciesof == "origional"
      return knn(tree, point, k, full_rec_dim)
    elseif indiciesof == "KDTree.data"
      indices, distances = knn(tree, point, k, full_rec_dim)
      orig_to_tree = invperm(tree.indices)
      tree_indices = orig_to_tree[indices]
      return (tree_indices, distances)
    else
      error("Possible values for parameter indiciesof include; origional or KDTree.data")
    end

  end

I changed full_rec_dim from being a flag to a keyword. I am not so sure I like the name of the keyword "indiciesof". I also am not sure taht "origional" and "KDTree.data" are the correct choices either.

With regards to documentation, have you seen how julia v0.4 does documentation?

I would suggest updating all your documentation to follow this guide: http://docs.julialang.org/en/release-0.4/manual/documentation/

Bootstrap360 commented 9 years ago

I have branched my git version but I do not know how to push and merge back.

I know I need submit a merge request but not sure how to do so.

Bootstrap360 commented 9 years ago

Final function

"""
`function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indiciesof = "origional")`

Finds the index and distance of k nearest neighbors of the given KDTree ('tree') to a given point ('point').

The index can either be relative to the data used to construct the tree or relative to the internal data structure.

This is changed by setting the indicesof parameter to either "origional" or"KDTree.data"

"""

function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indicesof = "origional")
  if indicesof == "origional"
    return knn(tree, point, k, full_rec_dim)
  elseif indicesof == "KDTree.data"
    indices, distances = knn(tree, point, k, full_rec_dim)
    orig_to_tree = invperm(tree.indices)
    tree_indices = orig_to_tree[indices]
    return (tree_indices, distances)
  else
    error("Possible values for parameter indicesof include; origional or KDTree.data")
  end

end
KristofferC commented 9 years ago

I think it would be easier to just have an if statement around here: https://github.com/JuliaGeometry/KDTrees.jl/blob/1a307e8e178b7afb6169f49c82a7072fc8e0e053/src/kd_tree.jl#L382

And yes, I know about julias way of documenting stuff and I agree it would be good to write some proper docstrings for the exported functions. I will do it in hopefully not too long time,