KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
426 stars 66 forks source link

Implement threading. #5

Open KristofferC opened 9 years ago

KristofferC commented 9 years ago

I started implementing some threading support using the threading branch in master julia. The linked gist below implements threaded kd tree creations. Code is non polished.

https://gist.github.com/KristofferC/15beed394361a325ff33

The speedup is about 2x using 4 cores.

include("thread_tree.jl")
data = rand(5, 3*10^6)
@time KDTree(data);
#  1.009976 seconds (295.92 k allocations: 36.561 MB)
@time NearestNeighbors.KDTree(data; reorder=false);
#  1.963014 seconds (300.02 k allocations: 36.622 MB, 0.42% gc time)

The technique I use is the following: Run a single threaded kd tree builder until the same number of sub trees as the number of threads are created (currently only a power of 2 number of threads is supported). Use the data from the single threaded creator and run a completely normal kd builder in parallel on each sub tree.

It would be cool to see how this scales with even more cores.

Next step is to check out threading the knn and range searches as well as see if this could be implemented into master and have it cleanly fall back to single threaded when threading is not enabled.

KristofferC commented 8 years ago

On hold until Base threading is better.

abhijithch commented 7 years ago

Thanks for this nice package. I was looking for some parallel versions for knn and it led me here. Could you please throw some light on what is the plan and how could any of us help out.

KristofferC commented 7 years ago

I don't have a plan by myself but I can see ways by which parallelization could move forward. I am only talking about shared memory here.

Threaded building of the trees

The gist in the first post should give an idea how it could be done. What I did was that one thread worked on the tree and then more threads take over as the tree is split.

Threaded querying of points

Should be fairly straightforward to just partition the query points and have one thread work on each partition.

dgleich commented 6 years ago

Here's a quick code to do parallel querying in case anyone needs something practical right now!

println("KNN Searches")
function simple_partition(n::Int,k::Int)
  rval = ones(Int,k+1)
  m,r = divrem(n,k)
  fill!(rval, m)
  rval[1] = 1
  for i = 2:k+1
    rval[i] += r .> (i-2)
  end
  cumsum!(rval,rval)
end
function parallel_knn(kdtree, X, k)
    nt = Threads.nthreads()
    idxs_t = Vector{Any}(nt)
    dists_t = Vector{Any}(nt)
    parts = simple_partition(size(X,2),nt)
    Threads.@threads for i=1:nt
        idxs_t[i], dists_t[i] = knn(kdtree, X[:,parts[i]:parts[i+1]-1], k, true) 
    end
    return vcat(idxs_t...), vcat(dists_t...)
end

It passed my simple test on random points as giving the same things as knn on the full dataset. Otherwise, I haven't tested it at all.

Also, you'd probably want to make it pass more arguments to the actual knn function. I was just being lazy for my use.

Haven't tried to measure speedup! Just trying to use this to get some answers for a problem I don't want to wait forever for knn to compute.