KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
413 stars 65 forks source link

Is there a reason sqeuclidean distance is not supported? #151

Closed nraynaud closed 2 years ago

nraynaud commented 2 years ago

Hi, I'm a little bit out of my element in maths, but I think sqeuclidean distances should work as well as euclidean for those data structures. Is there a reason it's not supported?

Broadening a bit the question, the Distances.jl package has defined some types for the various mathematical predicates that can be expected from the various metrics it exposes, is there a reason not to use one of those types to limit what's accessible to BallTree and KDTree?

thank you for your nice library.

dkarrasch commented 2 years ago

The restriction on the metrics for KDTree is of algorithmic type and has nothing to do with the given implementation here. But BallTree and BruteTree should work with pretty much any metric from Distances.jl.

nraynaud commented 2 years ago

Thank you for your answer, I re-read the code and I agree that it should work. I had read too fast the time prior, you made a special case for some metrics in the hypersphere, but there is another code path.

when I change runtest.jl to run the BallTree with the SqEuclidean metric, I get an error:

using NearestNeighbors
using StaticArrays

using Test
using LinearAlgebra

using Distances: Distances, Metric, evaluate, PeriodicEuclidean, SqEuclidean # CHANGED HERE
struct CustomMetric1 <: Metric end
Distances.evaluate(::CustomMetric1, a::AbstractVector, b::AbstractVector) = maximum(abs.(a .- b))
function NearestNeighbors.interpolate(::CustomMetric1,
                                      a::V,
                                      b::V,
                                      x,
                                      d,
                                      ab) where {V <: AbstractVector}
    idx = (abs.(b .- a) .>= d - x)
    c = copy(Array(a))
    c[idx] = (1 - x / d) * a[idx] + (x / d) * b[idx]
    return c, true
end
struct CustomMetric2 <: Metric end
Distances.evaluate(::CustomMetric2, a::AbstractVector, b::AbstractVector) = norm(a - b) / (norm(a) + norm(b))

# TODO: Cityblock()
const metrics = [SqEuclidean()] # CHANGED HERE
const fullmetrics = metrics # CHANGED HERE
const trees = [BallTree] # CHANGED HERE
const trees_with_brute = trees # CHANGED HERE

include("test_knn.jl")
include("test_inrange.jl")
include("test_monkey.jl")
include("test_datafreetree.jl")

@testset "periodic euclidean" begin
    pred = PeriodicEuclidean([Inf, 2.5])
    l = [0.0 0.0; 0.0 2.5]
    S = BallTree(l, pred)
    @test inrange(S,[0.0,0.0], 1e-2, true) == [1, 2]
end

error message:

tree type: Error During Test at /Users/nraynaud/.julia/dev/NearestNeighbors/test/test_knn.jl:7
  Got exception outside of a @test
  MethodError: no method matching BallTree(::Matrix{Float64}, ::SqEuclidean; leafsize=2)
  Closest candidates are:
    BallTree(::AbstractVecOrMat{T}) where T<:AbstractFloat at ~/.julia/packages/NearestNeighbors/VZzTb/src/ball_tree.jl:84 got unsupported keyword argument "leafsize"
    BallTree(::AbstractVecOrMat{T}, ::M; leafsize, storedata, reorder, reorderbuffer) where {T<:AbstractFloat, M<:Metric} at ~/.julia/packages/NearestNeighbors/VZzTb/src/ball_tree.jl:84
  Stacktrace:
    [1] (::var"#test#1"{SqEuclidean, UnionAll})(data::Matrix{Float64})
      @ Main ~/.julia/dev/NearestNeighbors/test/test_knn.jl:9
    [2] macro expansion
      @ ~/.julia/dev/NearestNeighbors/test/test_knn.jl:64 [inlined]

is that expected?

Edit: for context I'm new to julia, so I may completely misread the situation.

dkarrasch commented 2 years ago

Sorry, now I was too quick. From the error message, we learn that BallTree requires a Metric, most likely because it assumes validity of the triangle inequality to build the ball tree. As for the kdtree, yet another, additional assumption is required ("Minkowski metric").

I don't think there is oversight here, this package is very carefully written. It may just be that there is sparse documentation with regard to the underlying assumptions, but I haven't looked at docs and code for a while.

nraynaud commented 2 years ago

I see, I was under the impression that everything the euclidean distance did could be done by the square euclidean distance. I was under the impression that the square root function being monotonous meant that it wouldn't mess with distance comparisons.

Thank you for your time.