Open Bootstrap360 opened 9 years ago
Yes, you have a good point. Maybe we could have a copy
flag to the KDTree
constructor that would copy the input data before constructing the tree. If the copy
flag is true we return the indices of the original data, else indices into the tree's data field.
My above comment didn't make so much sense because I forgot that I actually do copy the input data when reorder=true
.
It would of course be simple to implement what you say but then the .data
field has to be documented somehow or else the indexof
parameter doesn't make sense.
If you want to do this right now you could do something like:
data = randn(3,100)
tree = KDTree(data)
indices, distances = knn(tree, data[:,3],1)
orig_to_tree = invperm(tree.indices)
tree_index = orig_to_tree[indices][1]
tree.data[:, tree_index] == data[:, indices[1]] # true
The tree.indices
field says how the original data is permutated in the tree so the invperm
call inverts this and you can use that to get the tree index and thus no longer need the original data.
Oh great. I was using a map function to find the inverse of the indices function.
invperm()
seems to do the trick.
I added the following function inside of kd_tree.jl
function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indiciesof = "origional")
if indiciesof == "origional"
return knn(tree, point, k, full_rec_dim)
elseif indiciesof == "KDTree.data"
indices, distances = knn(tree, point, k, full_rec_dim)
orig_to_tree = invperm(tree.indices)
tree_indices = orig_to_tree[indices]
return (tree_indices, distances)
else
error("Possible values for parameter indiciesof include; origional or KDTree.data")
end
end
I changed full_rec_dim from being a flag to a keyword. I am not so sure I like the name of the keyword "indiciesof". I also am not sure taht "origional" and "KDTree.data" are the correct choices either.
With regards to documentation, have you seen how julia v0.4 does documentation?
I would suggest updating all your documentation to follow this guide: http://docs.julialang.org/en/release-0.4/manual/documentation/
I have branched my git version but I do not know how to push and merge back.
I know I need submit a merge request but not sure how to do so.
Final function
"""
`function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indiciesof = "origional")`
Finds the index and distance of k nearest neighbors of the given KDTree ('tree') to a given point ('point').
The index can either be relative to the data used to construct the tree or relative to the internal data structure.
This is changed by setting the indicesof parameter to either "origional" or"KDTree.data"
"""
function knn{T <: AbstractFloat}(tree::KDTree, point::Vector{T}, k::Int; full_rec_dim::Int = 6, indicesof = "origional")
if indicesof == "origional"
return knn(tree, point, k, full_rec_dim)
elseif indicesof == "KDTree.data"
indices, distances = knn(tree, point, k, full_rec_dim)
orig_to_tree = invperm(tree.indices)
tree_indices = orig_to_tree[indices]
return (tree_indices, distances)
else
error("Possible values for parameter indicesof include; origional or KDTree.data")
end
end
I think it would be easier to just have an if statement around here: https://github.com/JuliaGeometry/KDTrees.jl/blob/1a307e8e178b7afb6169f49c82a7072fc8e0e053/src/kd_tree.jl#L382
And yes, I know about julias way of documenting stuff and I agree it would be good to write some proper docstrings for the exported functions. I will do it in hopefully not too long time,
Hello,
Why does the knn function return the index of the nearest neighbour of the original data as opposed to the data field of the KDTree?
For example: the below code returns false. data = randn(3,100) tree = KDTree(data) indices, distances = knn(tree, data[:,3],1) tree.data[:, indices[1]] == data[:, indices[1]] # false
I know that if you request to not reorder the data, by passing the parameter "reorder=false", then the above code will return true.
It seems redundant to have to keep around your own data variable when it is already stored within the KDTree type.
I suggest an optional parameter on knn called indexof. If indexof == original, then it should return the original indices. if indexof == reordered, then it should return the indices of the k nearest neighbours of the data field.