Closed Datseris closed 4 months ago
Makes sense to have it here in my opinion.
Would be nice to run that benchmark here as well.
They should surely be connected and share as much of the same interface as possible.
However I really like NearestNeighbors.jl
not having a binary dependency...
Maybe a separate extension package would be better here?
Hm, I didn't realize there was any thought of adding any binary dependencies to this repo.
Oh, maybe it was just me. I figured, we would first try a wrapper for the existing C++ implementation. But if someone wants to reimplement the whole thing...
A wrapper should definitely be in a separate package. A native version can live here.
I can attest that nmslib is extremely fast - takes on the order of a few milliseconds to perform a search in millions of samples for a dimensionality >100. That's a couple of orders of magnitude faster than this implementation, which is fast only for a small number of dimensions.
Thanks @Datseris for the links awesome :) Reffering to #69, the simplest (and maybe fastest) approach would be to implement a minimalistic ann directly in Julia ... I will look into this in the coming months if time allows.
I am working on an implementation of the NN Descent algorithm from this paper: https://github.com/dillondaudert/NNDescent.jl . If it's something that people think is appropriate for NearestNeighbors.jl, I can look to merge it into this repo once I have a reasonably efficient implementation done.
There seems to be many pieces of work happening in parallel; I think it is more fitting for a JuliaNeighbors
to be created (hopefully with a better name, as this JuliaProperty org name is getting beyond boring).
@zgornel
I've already invested some time into my own implementation of the hnsw algorithm in Julia.
(Working, but further optimization required )
I plan to put it online within the next few weeks.
@Datseris : This naming convention may be boring but I'd say that they're at least quite telling.
booooooooooooooooooooooooooring. Why don't we name it TheHoodOfJulia
?
@JonasIsensee great, I can help with testing and further optimization if needed.
For now I think this package will be kept to exact searches to not bloat its scope too much.
Hello!
I am guessing @KristofferC already knows this, but there is a "new" approach to nearest neighbors called "approximate nearest neigbhors" for the kNN problem. For many applications these approaches are just as good as the "exact" methods. JuliaDynamics is very interested on this!
From the discussions that lead to #69 , I was told that the "state of the art"/"fastest" algorithm (faster than the one proposed in #69), is the one described here: https://arxiv.org/abs/1603.09320 that uses "Hierarchical Navigable Small World" graphs (
hnsw
). So that if I wanted to implement a faster kNN approach, I should do this one instead of the one in #69 .In addition it is very interesting to share these benchmarks: https://github.com/erikbern/ann-benchmarks which are in python but they have a lot of different algortithms. This gives a nice perspective.
What is the consensus on having the
hnsw
method in this repo? Do we agree on that (because of the "approximate")? I think it fits, but a new repo in JuliaDynamics can also be made if putting it here is a problem.By the way, there is already some code for this method: (C++, header only https://github.com/nmslib/hnsw )
@andyferris I am tagging you as well because you were interested in the discussion in #69 . @JonasIsensee was also mentioned in the proposal of this method.
p.s.: I intend to use some funding I got to get someone contribute a faster kNN method