flann-lib / flann

Fast Library for Approximate Nearest Neighbors
http://people.cs.ubc.ca/~mariusm/flann
Other
2.25k stars 648 forks source link

Low performance of FLANN - Improvements #329

Closed wbrandenburger closed 5 years ago

wbrandenburger commented 7 years ago

Hi,

FLANN is not efficient! But with a few changes you can make it efficient. In kdtree_index.h, kmeans_index.h, hierarchical_clustering_index.h each function getNeighbors(...) calls a heap which has as many elements as the pointcloud has elements. That means that FLANN builds for every query point a structure that has the same size as the pointcloud. Bu this structure should have only knnlog(size(Pointcloud)) elements, reffering to the complexity a search in kdtree O(knnlog(n)). In addition you have to replace the DynamicBitset with a selfbalancing Tree. When you apply both changes you are thousandfold faster when you compute 20 nearest neigbors of every point in a pointcloud which contains 10 000 000 points. The bigger the pointcloud the faster you are in comparison to prevoius implementation of FLANN.

Greets from Munich

mariusmuja commented 7 years ago

Yes, you are correct, this is an issue that needs fixing.

rugrat commented 7 years ago

That is indeed the main problem, but the same goes for domain-distances which is also allocated repeatedly. I've solved this while still maintaining thread-friendliness by initially allocating heaps for the desired number of threads and during query finding a heap not currently in use by other threads (same for domain-distances). Can provide code if someone comes across this and is interested.

Unrelatedly, kmeans-tree does not support addition of new elements after init. Have also patched this (albeit a bit more awkwardly), will be happy to help.

rasitsimsek commented 6 years ago

It is possible to use a sparse matrix for input dataset with a dimension of 1M x 1M. If you initialize such dimension with flann::Matrix it will be not so efficient.