KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
413 stars 65 forks source link

bump the default leafsize from 10 to 25 #198

Closed KristofferC closed 2 months ago

KristofferC commented 2 months ago

Since this package was written CPUs have in general become faster (at a rate faster than memory can be fetched). The choice of a good default leafsize might therefore be worth reevaluating. Remember that with a large leafsize we have to traverse fewer nodes in the tree while we have to explicitly check more points for the distance. As CPUs gets faster compared to memory it is generally expected that leafsize can be increased.

The checks / benchmarks below are artificial but in lack of real benchmark data it is the best I have.

First, let's look at the number of flops required for a leafsize of 10 vs 25 (here using a BallTree):

julia> using StaticArrays, NearestNeighbors, StableRNGs

julia> using NearestNeighbors: HyperSphere

julia> V = SVector{3, Float64}

julia> data = rand(StableRNG(1), 3, 10^4);

julia> r = 0.01

julia> v = rand(StableRNG(2), 3);

julia> ball = HyperSphere(convert(V, v), convert(eltype(V), r));

julia> using GFlops

julia> x = BallTree(data; leafsize=10); @count_ops NearestNeighbors.inrange_kernel!(x, 1, v, ball, Int[])
Flop Counter: 1464 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │     438 │
│ sub │     468 │
│ mul │     558 │
└─────┴─────────┘

julia> x = BallTree(data; leafsize=25); @count_ops NearestNeighbors.inrange_kernel!(x, 1, v, ball, Int[])
Flop Counter: 1475 flop
┌─────┬─────────┐
│     │ Float64 │
├─────┼─────────┤
│ add │     434 │
│ sub │     484 │
│ mul │     557 │
└─────┴─────────┘

So the flops are roughly equal. However, instead of traversing through a bunch of nodes in the leafsize=10 case we instead just blast through and evaluate points (which if the tree is reordered are all laying next to each other in memory):

julia> vs = rand(StableRNG(2), data, 3, 10^3);

julia> x = BallTree(data; leafsize=10); @btime inrange(x, vs, 0.05);
  1.237 ms (2074 allocations: 173.05 KiB)

julia> x = BallTree(data; leafsize=25); @btime inrange(x, vs, 0.05);
  975.084 μs (2074 allocations: 173.05 KiB)

The memory occupied by a tree with a larger leaf size is also smaller:

julia> x = BallTree(data; leafsize=25);

julia> Base.summarysize(x) - sizeof(x.data)
105792

julia> x = BallTree(data; leafsize=10);

julia> Base.summarysize(x) - sizeof(x.data)
144192

Also, let's spot check a benchmark for KDTree:

julia> x = KDTree(data; leafsize=10); @btime inrange(x, vs, 0.05);
  532.292 μs (2074 allocations: 173.11 KiB)

julia> x = KDTree(data; leafsize=25); @btime inrange(x, vs, 0.05);
  470.084 μs (2074 allocations: 173.11 KiB)

where the difference is smaller albeit still noticeable.