Open KristofferC opened 9 years ago
On hold until Base threading is better.
Thanks for this nice package. I was looking for some parallel versions for knn
and it led me here. Could you please throw some light on what is the plan and how could any of us help out.
I don't have a plan by myself but I can see ways by which parallelization could move forward. I am only talking about shared memory here.
The gist in the first post should give an idea how it could be done. What I did was that one thread worked on the tree and then more threads take over as the tree is split.
Should be fairly straightforward to just partition the query points and have one thread work on each partition.
Here's a quick code to do parallel querying in case anyone needs something practical right now!
println("KNN Searches")
function simple_partition(n::Int,k::Int)
rval = ones(Int,k+1)
m,r = divrem(n,k)
fill!(rval, m)
rval[1] = 1
for i = 2:k+1
rval[i] += r .> (i-2)
end
cumsum!(rval,rval)
end
function parallel_knn(kdtree, X, k)
nt = Threads.nthreads()
idxs_t = Vector{Any}(nt)
dists_t = Vector{Any}(nt)
parts = simple_partition(size(X,2),nt)
Threads.@threads for i=1:nt
idxs_t[i], dists_t[i] = knn(kdtree, X[:,parts[i]:parts[i+1]-1], k, true)
end
return vcat(idxs_t...), vcat(dists_t...)
end
It passed my simple test on random points as giving the same things as knn on the full dataset. Otherwise, I haven't tested it at all.
Also, you'd probably want to make it pass more arguments to the actual knn function. I was just being lazy for my use.
Haven't tried to measure speedup! Just trying to use this to get some answers for a problem I don't want to wait forever for knn to compute.
I started implementing some threading support using the threading branch in master julia. The linked gist below implements threaded kd tree creations. Code is non polished.
https://gist.github.com/KristofferC/15beed394361a325ff33
The speedup is about 2x using 4 cores.
The technique I use is the following: Run a single threaded kd tree builder until the same number of sub trees as the number of threads are created (currently only a power of 2 number of threads is supported). Use the data from the single threaded creator and run a completely normal kd builder in parallel on each sub tree.
It would be cool to see how this scales with even more cores.
Next step is to check out threading the knn and range searches as well as see if this could be implemented into master and have it cleanly fall back to single threaded when threading is not enabled.