KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
413 stars 65 forks source link

error with points as vector of points? #85

Open briochemc opened 5 years ago

briochemc commented 5 years ago

Great package! So fast! 😃 Anyway, I thought this MWE (based on the readme example except for using points instead of point) would work (since "points can also be a vector of other vectors where each element in the outer vector is considered a point.") but it throws an error:

julia> using NearestNeighbors

julia> data = rand(3, 10^4);

julia> k = 3 ;

julia> kdtree = KDTree(data) ;

julia> points = [rand(3) for i in 1:10]
10-element Array{Array{Float64,1},1}:
 [0.68304, 0.983004, 0.220862]
 [0.626807, 0.61079, 0.968843]
 [0.192098, 0.157959, 0.300556]
 [0.300309, 0.502549, 0.653381]
 [0.160819, 0.653002, 0.9979]
 [0.809764, 0.0521815, 0.0290807]
 [0.20466, 0.592894, 0.656526]
 [0.370667, 0.0579244, 0.930079]
 [0.593999, 0.934687, 0.555133]
 [0.60504, 0.78062, 0.243998]

julia> idxs, dists = knn(kdtree, points, k, true)
ERROR: MethodError: no method matching length(::Type{Array{Float64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:561
  length(::Base.MethodList) at reflection.jl:801
  length(::Core.MethodTable) at reflection.jl:875
  ...
Stacktrace:
 [1] check_input(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/NearestNeighbors.jl:29
 [2] knn(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}, ::Int64, ::Bool, ::Function) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/knn.jl:16 (repeats 2 times)
 [3] top-level scope at none:0

Maybe I'm doing this wrong?

dkarrasch commented 5 years ago

You could use StaticArrays

using StaticArrays, NearestNeighbors
data = rand(3, 10^4);
k = 3;
kdtree = KDTree(data);
points = [@SVector rand(3) for i in 1:10]
idxs, dists = knn(kdtree, points, k, true)

I think this should be reflected in the README, or otherwise there should be a silent attempt internally to turn a vector of vectors into a vector of static vectors. The overall reason for this requirement is that in a vector of vectors the elements may have different lengths, in contrast to points being a matrix, where each column must have the same length. Differing lengths, however, cannot be inferred from the AbstractVector type, and checking it would probably add to the runtime, I'm not sure, however, by how much.

briochemc commented 5 years ago

Oh I see, that makes sense. (However, I have been using a 3xn array instead for large n.)

I have two questions then:

dkarrasch commented 5 years ago

@KristofferC may help out if I'm saying something wrong, as I haven't written or modified any parts of the code, just scanned quickly through it some time ago. If I am not mistaken, it is always better to have things in the vector of SVector format, because internally, that's what 2D arrays are converted to anyway.

https://github.com/KristofferC/NearestNeighbors.jl/blob/faf05fb2652780d27ce50a5504dc26f0bfcd3311/src/kd_tree.jl#L69-L85

The reason likely is that once you have things in SVector format, the length of vectors and hence the dimension of the space is known to the compiler, which may help performance. But you could check and benchmark, and report here. I'd be interested as well.

KristofferC commented 5 years ago

The key point in the docs is:

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

So you should be able to use MVector as well which is mutable. Sure, we could convert an Array of Vectors and assert that all vectors have the same length.

briochemc commented 5 years ago

Oh I see now. I got confused by the fact that eltype(V) and length(V) are not the same as eltype(v::V) and length(v::V)... May I suggest adding "(e.g., V = SVector{...})" and an example to the docs with SVectors?

mkborregaard commented 4 years ago

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

But that should be true for a vector of vectors, right? Yet I get

julia> a = [rand(1:10, 2) for i in 1:5]
5-element Array{Array{Int64,1},1}:
 [1, 9]
 [6, 3]
 [2, 7]
 [7, 3]
 [9, 7]

julia> KDTree(a)
ERROR: MethodError: no method matching length(::Type{Array{Int64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:593
  length(::Base.MethodList) at reflection.jl:832
  length(::Core.MethodTable) at reflection.jl:906
  ...
Stacktrace:
 [1] NearestNeighbors.TreeData(::Array{Array{Int64,1},1}, ::Int64) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/tree_data.jl:14
 [2] #KDTree#16(::Int64, ::Bool, ::Bool, ::Array{Array{Int64,1},1}, ::Type{KDTree}, ::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:35
 [3] KDTree(::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:33 (repeats 2 times)
 [4] top-level scope at none:0
KristofferC commented 4 years ago

But that should be true for a vector of vectors, right?

No? The error message told you that it wasn't.

mkborregaard commented 4 years ago

Well. In that case I don't understand what's meant by that part of the docs.

mkborregaard commented 4 years ago

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

KristofferC commented 4 years ago

Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V)

For a Vector{Vector{Float64}}, V is Vector{Float64} and length(V) (which is length(Vector{Float64})) is not defined and is what the error message complains about. If instead you had a Vector{SVector{3, Float64}} then V is SVector{3, Float64} and length(V) is 3.

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

A Vector{Vector{Float64}} is an inefficent way of storing a point cloud and users calling NearestNeighbors with such a data structure would have bad performance and blame the library. It is too general of a data structure since the points in such a data structure can have different dimensions (we now need to add extra code to do boundschecks) but the library requires that all points have the same dimension.

mkborregaard commented 4 years ago

Yeah OK I see, the length needs to be part of the type.