flann-lib / flann

Fast Library for Approximate Nearest Neighbors
http://people.cs.ubc.ca/~mariusm/flann
Other
2.25k stars 647 forks source link

FLANN vs ANN #301

Open ferranroure opened 8 years ago

ferranroure commented 8 years ago

Hi,

I'm working with opencv::flann version, testing alongside with ANN. I notice that, to achieve the same accuracy than ANN, I need to use 45 kdtree's (set up in the Index creation) and more than 32 checks in the knn search. With this setup, Flann is many times slower than ANN. If I use the recommended setup, (from 4 to 16 kdtree's and 16 checks) it runs approximately as ANN, but with less accuracy. What is exactly the meaning of these parameters?

What I'm doing wrong? Maybe is the opencv version? Flann should be faster than ANN isn't?

Thanks you very much in advance!! :D

mariusmuja commented 8 years ago

The optimum parameters depends on many things, especially on the dimensionality of the data. If your data has relative low dimensionality (<10 dimensions or so), you would be better off using the single kd-tree index implementation (https://github.com/mariusmuja/flann/blob/master/src/cpp/flann/algorithms/kdtree_single_index.h) that performs exact or epilon-approximate search (similar to ANN).

For high dimensionlity data, the multiple randomized kd-trees tend to perform better.

cuongvt101 commented 8 years ago

Is there an equation to roughly determine the precision/accuracy based on dimensionality of data, number of tree, and number of checks (in Searchparams)? I am working with SIFT feature (128-d), trying 8 trees and my check is around 150, but all were just guessing. I know there's one option called Autotuned but (1) I can't make it work for my data C++, and (2) - more importantly, based on Matlab code, it tends to lean toward Hierachical K-means and in my experience, Hierarchical K-means is quite slow as compared to (randomized) KD tree.

ferranroure commented 8 years ago

I will try both suggestions. Thanks!

In my case, the dimension is 3: I'm trying different data structures for residue computation in a 3D registration (alignment) problem, which means that I only looking for a NN of each 3D point in the point cloud. As experts on this field, which kdtree implementation should I use? Apparently, at this moment, ANN is the fastest on my tests. Any suggestion?

Thank you very much. :D

mariusmuja commented 8 years ago

Unfortunately there's not equation to determine the precision/accuracy based on the search params other than doing some cross-validation on your data. The optimum search params can be very different for data with different statistics (for example random 128 dimensional data vs SIFT features).

In case of 3D data, the best algorithm to use is the KDTreeSingleIndex.

ferranroure commented 8 years ago

OK!, I gonna try this! Thank you very much! :D

On Mon, 27 Jun 2016 at 19:22 Marius Muja notifications@github.com wrote:

Unfortunately there's not equation to determine the precision/accuracy based on the search params other than doing some cross-validation on your data. The optimum search params can be very different for data with different statistics (for example random 128 dimensional data vs SIFT features).

In case of 3D data, the best algorithm to use is the KDTreeSingleIndex.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mariusmuja/flann/issues/301#issuecomment-228813357, or mute the thread https://github.com/notifications/unsubscribe/AOLp5OK0deaSJog4fymUVhPf6BMh8bL2ks5qQAbPgaJpZM4I8yi4 .

ferranroure commented 8 years ago

Hi again!

I already add the original Flann and I'm using the KDtreeSingleIndex. However, there is an error that I'm not able to solve. This error does not appear always. And this is so strange. Some times fails, and others no. In the non-failing executions, I can see an strange behavior, where the results obtained by flann are different each time.

The error: Assertion index > 0 && index < count' failed.

Pipeline: /usr/include/flann/algorithms/kdtree_single_index.h:492: void flann::KDTreeSingleIndex<Distance>::middleSplit(int*, int, int&, int&, flann::KDTreeSingleIndex<Distance>::DistanceType&, const BoundingBox&) [with Distance = flann::L2<float>; flann::KDTreeSingleIndex<Distance>::DistanceType = float; flann::KDTreeSingleIndex<Distance>::BoundingBox = std::vector<flann::KDTreeSingleIndex<flann::L2<float> >::Interval, std::allocator<flann::KDTreeSingleIndex<flann::L2<float> >::Interval> >]: Assertion index > 0 && index < count' failed.

Comming from the buildIndex() call:

myFlann::myFlann(vector<Point *> *P) {
    cfln = new converterFlann();
    dataPts = cfln->convertArray(P);
    kdtree = new flann::KDTreeSingleIndex<flann::L2<float> >(*dataPts, flann::KDTreeSingleIndexParams());
    kdtree->buildIndex();
}

I tried to find some information about, but I didn't found any relevant information.

Could you help me?

Thank you very much in advance! :D

mlh2000 commented 7 years ago

We have successfully used Flann 1.8.4's radiussearch on Ubuntu 14.04 and 16.04 for just under a year and are quite happy with it.

But in running the same code and data on OS X El Capitan (and Flann1.9.1, the version installed by brew) recently, we're finding that many fewer results are returned by radiussearch queries. We haven't yet tested 1.9.1 on Ubuntu.

Has there been a change in the specified behavior of radiussearch between versions 1.8.4 and 1.9.1? Or perhaps to some relevant parameter? Thanks for your help ...