KristofferC / NearestNeighbors.jl

High performance nearest neighbor data structures (KDTree and BallTree) and algorithms for Julia.
Other
426 stars 66 forks source link

Approximate Nearest Neighbors algorithms #73

Closed Datseris closed 5 months ago

Datseris commented 6 years ago

Hello!

I am guessing @KristofferC already knows this, but there is a "new" approach to nearest neighbors called "approximate nearest neigbhors" for the kNN problem. For many applications these approaches are just as good as the "exact" methods. JuliaDynamics is very interested on this!

From the discussions that lead to #69 , I was told that the "state of the art"/"fastest" algorithm (faster than the one proposed in #69), is the one described here: https://arxiv.org/abs/1603.09320 that uses "Hierarchical Navigable Small World" graphs (hnsw). So that if I wanted to implement a faster kNN approach, I should do this one instead of the one in #69 .

In addition it is very interesting to share these benchmarks: https://github.com/erikbern/ann-benchmarks which are in python but they have a lot of different algortithms. This gives a nice perspective.

What is the consensus on having the hnsw method in this repo? Do we agree on that (because of the "approximate")? I think it fits, but a new repo in JuliaDynamics can also be made if putting it here is a problem.

By the way, there is already some code for this method: (C++, header only https://github.com/nmslib/hnsw )

@andyferris I am tagging you as well because you were interested in the discussion in #69 . @JonasIsensee was also mentioned in the proposal of this method.


p.s.: I intend to use some funding I got to get someone contribute a faster kNN method

KristofferC commented 6 years ago

Makes sense to have it here in my opinion.

KristofferC commented 6 years ago

Would be nice to run that benchmark here as well.

JonasIsensee commented 6 years ago

They should surely be connected and share as much of the same interface as possible. However I really like NearestNeighbors.jl not having a binary dependency... Maybe a separate extension package would be better here?

KristofferC commented 6 years ago

Hm, I didn't realize there was any thought of adding any binary dependencies to this repo.

JonasIsensee commented 6 years ago

Oh, maybe it was just me. I figured, we would first try a wrapper for the existing C++ implementation. But if someone wants to reimplement the whole thing...

KristofferC commented 6 years ago

A wrapper should definitely be in a separate package. A native version can live here.

zgornel commented 6 years ago

I can attest that nmslib is extremely fast - takes on the order of a few milliseconds to perform a search in millions of samples for a dimensionality >100. That's a couple of orders of magnitude faster than this implementation, which is fast only for a small number of dimensions.

zgornel commented 6 years ago

Thanks @Datseris for the links awesome :) Reffering to #69, the simplest (and maybe fastest) approach would be to implement a minimalistic ann directly in Julia ... I will look into this in the coming months if time allows.

dillondaudert commented 6 years ago

I am working on an implementation of the NN Descent algorithm from this paper: https://github.com/dillondaudert/NNDescent.jl . If it's something that people think is appropriate for NearestNeighbors.jl, I can look to merge it into this repo once I have a reasonably efficient implementation done.

Datseris commented 6 years ago

There seems to be many pieces of work happening in parallel; I think it is more fitting for a JuliaNeighbors to be created (hopefully with a better name, as this JuliaProperty org name is getting beyond boring).

JonasIsensee commented 6 years ago

@zgornel
I've already invested some time into my own implementation of the hnsw algorithm in Julia. (Working, but further optimization required ) I plan to put it online within the next few weeks.

@Datseris : This naming convention may be boring but I'd say that they're at least quite telling.

Datseris commented 6 years ago

booooooooooooooooooooooooooring. Why don't we name it TheHoodOfJulia?

zgornel commented 6 years ago

@JonasIsensee great, I can help with testing and further optimization if needed.

KristofferC commented 5 months ago

For now I think this package will be kept to exact searches to not bloat its scope too much.