flann-lib / flann

Fast Library for Approximate Nearest Neighbors
http://people.cs.ubc.ca/~mariusm/flann

Failing when nn > 250 #249

Open hdiethelm opened 9 years ago

hdiethelm commented 9 years ago

Hello,

While using FLANN, I discovered that the results look completely wrong when the nearest neighbor count (nn) is greater than 250. My application needs about 1024 neighbors. Is this a known limitation or a bug?

To reproduce the bug, I use flann_example.c. When the same data is used both to build the index and to search, the nearest neighbor of each query point should be the point itself, so dists[i*nn] should be zero. This holds up to nn = 250 and fails for nn > 250, regardless of the algorithm. Even FLANN_INDEX_LINEAR fails in this case.

Regards, Hannes Diethelm

#include <flann/flann.h>

#include <stdio.h>
#include <stdlib.h>

float* read_points(const char* filename, int rows, int cols)
{
    float* data;
    float *p;
    FILE* fin;
    int i,j;

    fin = fopen(filename,"r");
    if (!fin) {
        printf("Cannot open input file.\n");
        exit(1);
    }

    data = (float*) malloc(rows*cols*sizeof(float));
    if (!data) {
        printf("Cannot allocate memory.\n");
        exit(1);
    }
    p = data;

    for (i=0;i<rows;++i) {
        for (j=0;j<cols;++j) {
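            /* Stop on malformed or truncated input instead of using garbage. */
            if (fscanf(fin,"%g ",p) != 1) {
                printf("Cannot parse input file.\n");
                exit(1);
            }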
            p++;
        }
    }

    fclose(fin);

    return data;
}

void write_results(const char* filename, int *data, int rows, int cols)
{
    FILE* fout;
    int* p;
    int i,j;

    fout = fopen(filename,"w");
    if (!fout) {
        printf("Cannot open output file.\n");
        exit(1);
    }

    p = data;
    for (i=0;i<rows;++i) {
        for (j=0;j<cols;++j) {
            fprintf(fout,"%d ",*p);
            p++;
        }
        fprintf(fout,"\n");
    }
    fclose(fout);
}

int main(int argc, char** argv)
{
    float* dataset;
    float* testset;
    int nn, i;
    int* result;
    float* dists;
    struct FLANNParameters p;
    float speedup;
    flann_index_t index_id;

    int rows = 9000;
    int cols = 128;
    int tcount = 1000;

    /*
     * The files dataset.dat and testset.dat can be downloaded from:
     * http://people.cs.ubc.ca/~mariusm/uploads/FLANN/datasets/dataset.dat
     * http://people.cs.ubc.ca/~mariusm/uploads/FLANN/datasets/testset.dat
     */
    printf("Reading input data file.\n");
    dataset = read_points("dataset.dat", rows, cols);
    printf("Reading test data file.\n");
    testset = read_points("testset.dat", tcount, cols);

    nn = 251;
    result = (int*) malloc(tcount*nn*sizeof(int));
    dists = (float*) malloc(tcount*nn*sizeof(float));
    if (!result || !dists) {
        printf("Cannot allocate memory.\n");
        exit(1);
    }

    p = DEFAULT_FLANN_PARAMETERS;
    p.algorithm = FLANN_INDEX_KDTREE;
    p.trees = 8;
    p.log_level = FLANN_LOG_INFO;
    p.checks = 64;

    printf("Computing index.\n");
    index_id = flann_build_index(dataset, rows, cols, &speedup, &p);
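    /* Query with the first tcount rows of the dataset itself (not testset),
     * so every query point is also contained in the index. */
    flann_find_nearest_neighbors_index(index_id, dataset, tcount, result, dists, nn, &p);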

    for(i=0 ; i < tcount ; i++){
        /* Check whether the first returned neighbour is the query point itself
         * (distance 0). This assumes each result row is sorted by ascending distance. */
        if(dists[i*nn] != 0.0){
            printf("Failed!\n");
            return 1;
        }
    }

    write_results("results.dat",result, tcount, nn);

    flann_free_index(index_id, &p);
    free(dataset);
    free(testset);
    free(result);
    free(dists);

    return 0;
}

hdiethelm commented 9 years ago

I was just able to track down the problem. In nn_index.h, line 319: use_heap = (knn>KNN_HEAP_THRESHOLD)

If use_heap is true, a different search algorithm is used, so the results are no longer ordered by distance, but they are still correct. Is this documented anywhere?
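
If the results really are correct but unordered, as described above, one possible workaround is to sort each result row by distance after the search. The following is only a minimal sketch under that assumption. The names sort_knn_rows and neighbor_t are hypothetical helpers written for this example and are not part of FLANN. It assumes the row-major layout used in the example program, with nn entries per query in both result and dists.

```c
#include <stdlib.h>

/* One (index, distance) pair from a single query's result row.
 * Hypothetical helper type, not part of the FLANN API. */
typedef struct {
    int index;
    float dist;
} neighbor_t;

/* qsort comparator: ascending by distance. */
static int compare_by_dist(const void* a, const void* b)
{
    const neighbor_t* na = (const neighbor_t*) a;
    const neighbor_t* nb = (const neighbor_t*) b;

    if (na->dist < nb->dist) return -1;
    if (na->dist > nb->dist) return 1;
    return 0;
}

/* Sort every row of nn neighbours in place by ascending distance,
 * keeping result[] and dists[] in step with each other.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int sort_knn_rows(int* result, float* dists, int tcount, int nn)
{
    neighbor_t* row;
    int i, j;

    if (tcount <= 0 || nn <= 0) {
        return 0;   /* nothing to sort */
    }

    row = (neighbor_t*) malloc((size_t) nn * sizeof(neighbor_t));
    if (!row) {
        return -1;
    }

    for (i = 0; i < tcount; ++i) {
        int* r = result + (size_t) i * nn;
        float* d = dists + (size_t) i * nn;

        /* Pack the row into pairs so index and distance move together. */
        for (j = 0; j < nn; ++j) {
            row[j].index = r[j];
            row[j].dist = d[j];
        }

        qsort(row, (size_t) nn, sizeof(neighbor_t), compare_by_dist);

        /* Unpack the sorted pairs back into the original arrays. */
        for (j = 0; j < nn; ++j) {
            r[j] = row[j].index;
            d[j] = row[j].dist;
        }
    }

    free(row);
    return 0;
}
```

In the example program this would be called as sort_knn_rows(result, dists, tcount, nn) right after flann_find_nearest_neighbors_index and before the dists[i*nn] check. Whether the check then passes for nn > 250 depends on the results actually being correct, which I have only inferred from the use_heap code path.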