CUDA 5.5 - indices are all zeros

fmorency commented 10 years ago

System: Ubuntu 12.10
CUDA: 5.5 (also tested with 4.9 and had the same issue)
Graphics card: NVIDIA Quadro 2000
NVIDIA driver: 319.37
Compiler: GCC 4.7.2
Dataset: cloud.h5 from the FLANN test data

The following program compiles and runs without console error.

#define FLANN_USE_CUDA
#include <flann/flann.hpp>
#include <flann/io/hdf5.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  int nn = 5;

  flann::Matrix<float> dataset;
  flann::Matrix<float> query;
  flann::load_from_file(dataset, "cloud.h5","dataset");
  flann::load_from_file(query, "cloud.h5","query");
  flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
  flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);

  // construct a kd-tree index for 3-D points on the GPU (CUDA)
  flann::Index<flann::L2<float> > index(dataset, flann::KDTreeCuda3dIndexParams());
  //flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams());
  index.buildIndex();

  // do a knn search with the default search parameters
  index.knnSearch(query, indices, dists, nn, flann::SearchParams());
  flann::save_to_file(indices,"result.h5","result");

  delete [] dataset.ptr();
  delete [] query.ptr();
  delete [] indices.ptr();
  delete [] dists.ptr();

  return 0;
}

The problem is that result.h5 contains only zeros. The program seems to give the right result if I use the CPU implementation (KDTreeIndexParams) instead. Has anyone had a similar issue, or does anyone know how to fix it?
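A quick way to spot this symptom without opening result.h5 is to check the index buffer directly and compare it against the CPU result. The following is only a sketch in plain C++ with no FLANN dependency; `looks_all_zero` and `count_mismatches` are names made up for illustration and are not part of FLANN.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if the buffer is non-empty and every entry is 0,
// which is the symptom described in this report.
inline bool looks_all_zero(const std::vector<int>& indices) {
    if (indices.empty()) return false;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (indices[i] != 0) return false;
    }
    return true;
}

// Hypothetical helper: number of positions where two result buffers differ,
// e.g. indices from the GPU index versus indices from the CPU index.
inline std::size_t count_mismatches(const std::vector<int>& a,
                                    const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++mismatches;
    }
    return mismatches;
}
```

In the program above, the buffer could presumably be copied out with something like `std::vector<int>(indices.ptr(), indices.ptr() + query.rows * nn)` before the `delete []` calls.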

Thanks, -F

clyecao commented 10 years ago

I have the same problem with FLANN 1.8.4.
System: Microsoft Windows 7 64-bit
CUDA: 5.5
Graphics card: GTX 560
NVIDIA driver: 320.57
Compiler: VS2010

fmorency commented 10 years ago

Bump. Can anyone help on this?

taiya commented 10 years ago

I have the same problem too, from what I can see. I am running CUDA 6.5 on a Retina MacBook.

POTENTIAL SOLUTION: use the following to build the index in the code above: flann::KDTreeCuda3dIndex< flann::L2<float> > index(dataset);

#define FLANN_USE_CUDA
#include <flann/flann.hpp>
#include <flann/io/hdf5.h>
#include <stdio.h>
#include <iostream>

int main(int argc, char** argv)
{
  int nn = 1;

  flann::Matrix<float> dataset;
  flann::Matrix<float> query;
  flann::load_from_file(dataset, "cloud.h5","dataset");
  flann::load_from_file(query, "cloud.h5","query");
  flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
  flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);

  // construct the CUDA 3-D kd-tree index directly instead of going through flann::Index
  flann::KDTreeCuda3dIndex< flann::L2<float> > index(dataset);
  // flann::Index<flann::L2<float> > index(dataset, flann::KDTreeCuda3dIndexParams());
  // flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams());
  index.buildIndex();

  // do a knn search with the default search parameters
  index.knnSearch(query, indices, dists, nn, flann::SearchParams());
  flann::save_to_file(indices,"result.h5","result");

  for(int i=0; i<query.rows; i++)
    std::cout << indices[i][0] << std::endl;

  delete [] dataset.ptr();
  delete [] query.ptr();
  delete [] indices.ptr();
  delete [] dists.ptr();

  return 0;
}

djsutherland commented 10 years ago

@drtaglia, you should put triple backticks before and after your code so it all renders as a code block. As it stands, the includes are treated as invalid HTML and don't render.

taiya commented 10 years ago

I am not sure what you are talking about. In markdown you put 4 leading spaces to show code. That's what I do above, and it renders perfectly in Chrome for me. (Never mind! I see that with ```cpp you get highlighting as well, thanks for mentioning this.)

taiya commented 10 years ago

Also, I can confirm the issue above. I (painfully) added gtest to FLANN, then tried the built-in CUDA tests (silly me for not seeing them earlier). Most of them fail, but since the one I care about is Flann_3D_Random_Cloud.Test4NNGpuBuffers, I can happily report that applying the same change as above allows that test to run!

    flann::KDTreeCuda3dIndexParams index_params;
    index_params["input_is_gpu_float4"]=true;
    // flann::Index<L2_Simple<float> > index(data_device_matrix, index_params);
    flann::KDTreeCuda3dIndex< flann::L2<float> > index(data_device_matrix, index_params);
    start_timer("Building kd-tree index...");
    index.buildIndex();
    printf("done (%g seconds)\n", stop_timer());

Snippet of test run

[ RUN      ] Flann_3D_Random_Cloud.Test4NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (3e-06 seconds)
Searching KNN...done (2.30427 seconds)
Building kd-tree index...done (0.017038 seconds)
Searching KNN...done (4e-05 seconds)
[       OK ] Flann_3D_Random_Cloud.Test4NNGpuBuffers (2330 ms)

Otherwise... this happens:

[ RUN      ] Flann_3D_Random_Cloud.Test4NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (3e-06 seconds)
Searching KNN...done (2.3784 seconds)
Building kd-tree index...done (0.016676 seconds)
Searching KNN...The program has unexpectedly finished.
/Users/andrea/Developer/htrack/apps/flann_gpu-build/flann_cuda_test crashed

However, I think I was happy too early. This doesn't look good...

[ RUN      ] Flann_3D_Random_Cloud.Test1NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (6e-06 seconds)
Searching KNN...done (2.37411 seconds)
Building kd-tree index...done (0.023425 seconds)
Searching KNN...done (3.6e-05 seconds)
/Users/andrea/Developer/htrack/apps/flann_gpu/flann_cuda_test.cu:264: Failure
Expected: (precision) >= (0.99), actual: 0.25 vs 0.99
Precision: 0.25
[  FAILED  ] Flann_3D_Random_Cloud.Test1NNGpuBuffers (2483 ms)
[----------] 1 test from Flann_3D_Random_Cloud (2483 ms total)

taiya commented 10 years ago

All right, I think I narrowed it down. @mariusmuja, you might want to take a peek at it. I searched the FLANN source for "input_is_gpu_float4" and found this:

template<typename Distance>
void KDTreeCuda3dIndex<Distance>::uploadTreeToGpu()
{
    // just make sure that no weird alignment stuff is going on...
    // shouldn't, but who knows
    // (I would make this a (boost) static assertion, but so far flann seems to avoid boost
    //  assert( sizeof( KdTreeCudaPrivate::GpuNode)==sizeof( Node ) );
    delete gpu_helper_;
    gpu_helper_ = new GpuHelper;
    gpu_helper_->gpu_points_=new thrust::device_vector<float4>(size_);
    thrust::device_vector<float4> tmp(size_);
    if( get_param(index_params_,"input_is_gpu_float4",false) ) {
        assert( dataset_.cols == 3 && dataset_.stride==4*sizeof(float));
        thrust::copy( thrust::device_pointer_cast((float4*)dataset_.ptr()),thrust::device_pointer_cast((float4*)(dataset_.ptr()))+size_,tmp.begin());

    }
...

In particular, notice that big fat assert:

dataset_.stride==4*sizeof(float)
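To spell out what that condition demands, here is a minimal sketch in plain C++, with no CUDA or FLANN involved. `PaddedPoint` and `matches_float4_layout` are names invented here; `PaddedPoint` only stands in for CUDA's `float4`, and the check simply mirrors the expression inside the assert quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's float4: three coordinates plus one pad.
struct PaddedPoint {
    float x, y, z, w;
};

// Four floats with no extra padding, so one point occupies 4*sizeof(float) bytes.
static_assert(sizeof(PaddedPoint) == 4 * sizeof(float),
              "PaddedPoint is expected to be exactly four floats wide");

// Mirrors the quoted condition: 3 logical columns, and a row stride equal to
// the size of one padded point, measured in bytes.
inline bool matches_float4_layout(std::size_t cols, std::size_t stride_bytes) {
    return cols == 3 && stride_bytes == sizeof(PaddedPoint);
}
```

With `cols == 3` the check passes for a stride of `4 * sizeof(float)` and fails for a stride of `4`. Note that a standard `assert` is compiled out when `NDEBUG` is defined, so a release build would not stop at a failing condition; whether that applies to the runs in this thread is only a guess.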

Now, what puzzled me is that flann_cuda_test.cu, as well as the documentation, says to use a stride of 4. But then why would the code above multiply it by sizeof(float)?! So I attempted the same on the "client" side:

TEST_F(Flann_3D_Random_Cloud, Test1NNGpuBuffers)
{
    thrust::host_vector<float4> data_host(data.rows);
    for( int i=0; i<data.rows; i++ )
    {
        data_host[i]=make_float4(data[i][0],data[i][1],data[i][2],0);
    }
    thrust::device_vector<float4> data_device = data_host;
    thrust::host_vector<float4> query_host(data.rows);
    for( int i=0; i<data.rows; i++ )
    {
        query_host[i]=make_float4(query[i][0],query[i][1],query[i][2],0);
    }
    thrust::device_vector<float4> query_device = query_host;

    // The change: pass the row stride in bytes (4*sizeof(float)) rather than 4.
    flann::Matrix<float> data_device_matrix( (float*)thrust::raw_pointer_cast(&data_device[0]),data.rows,3,4*sizeof(float));
    flann::Matrix<float> query_device_matrix( (float*)thrust::raw_pointer_cast(&query_device[0]),data.rows,3,4*sizeof(float));

    flann::KDTreeCuda3dIndexParams index_params;
    index_params["input_is_gpu_float4"]=true;
#ifdef UNEXPECTEDLY_FINISHED_CRASH
    flann::Index<L2_Simple<float> > index(data_device_matrix, index_params);
#else
    flann::KDTreeCuda3dIndex<L2_Simple<float> /*flann::L2<float>*/ > index(data_device_matrix, index_params);
#endif
    start_timer("Building kd-tree index...");
    index.buildIndex();
    printf("done (%g seconds)\n", stop_timer());

    thrust::device_vector<int> indices_device(query.rows);
    thrust::device_vector<float> dists_device(query.rows);
    flann::Matrix<int> indices_device_matrix( (int*)thrust::raw_pointer_cast(&indices_device[0]),query.rows,1);
    flann::Matrix<float> dists_device_matrix( (float*)thrust::raw_pointer_cast(&dists_device[0]),query.rows,1);

    start_timer("Searching KNN...");
    indices.cols=1;
    dists.cols=1;
    flann::SearchParams sp;
    sp.matrices_in_gpu_ram=true;
    index.knnSearch(query_device_matrix, indices_device_matrix, dists_device_matrix, 1, sp );
    printf("done (%g seconds)\n", stop_timer());

    flann::Matrix<int> indices_host( new int[ query.rows], query.rows, 1 );
    flann::Matrix<float> dists_host( new float[ query.rows], query.rows, 1 );
    thrust::copy( dists_device.begin(), dists_device.end(), dists_host.ptr() );
    thrust::copy( indices_device.begin(), indices_device.end(), indices_host.ptr() );
    // thrust::copy(indices_device.begin(), indices_device.end(), std::ostream_iterator<int>(std::cout, " "));

    // float precision = compute_precision(gt_indices,indices);
    float precision = computePrecisionDiscrete(gt_dists,dists_host, 1e-08);
    EXPECT_GE(precision, 0.99);
    printf("Precision: %g\n", precision);
    delete [] indices_host.ptr();
    delete [] dists_host.ptr();
}

and suddenly... voilà... the precision test goes back to normal :)

[ RUN      ] Flann_3D_Random_Cloud.Test1NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (1e-06 seconds)
Searching KNN...done (2.52645 seconds)
Building kd-tree index...done (0.017331 seconds)
Searching KNN...done (2.5e-05 seconds)
Precision: 1
[       OK ] Flann_3D_Random_Cloud.Test1NNGpuBuffers (2555 ms)
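For anyone reading the precision numbers in these logs: the test compares the returned distances against ground-truth distances with a small tolerance. The sketch below shows one way such a measure could be computed. It is an assumption about the general idea and is not FLANN's `computePrecisionDiscrete`; the name `distance_precision` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical precision measure: the fraction of queries whose returned
// distance matches the ground-truth distance to within `tol`.
inline float distance_precision(const std::vector<float>& gt_dists,
                                const std::vector<float>& found_dists,
                                float tol) {
    assert(gt_dists.size() == found_dists.size());
    if (gt_dists.empty()) return 0.0f;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < gt_dists.size(); ++i) {
        if (std::fabs(gt_dists[i] - found_dists[i]) <= tol) ++hits;
    }
    return static_cast<float>(hits) / static_cast<float>(gt_dists.size());
}
```

Under this definition, a value of 0.25 would mean that only one query in four got a neighbor at the expected distance.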

taiya commented 10 years ago

I just did more checking. I can confirm that the stride is in bytes, not in units of the matrix element type.

thrust::host_vector<float4> queries;
/// fill queries here....
flann::Matrix<float> queries_M((float*) thrust::raw_pointer_cast(&queries[0]), queries.size(), 3 /*3D*/, 4*sizeof(float) /*stride*/);
for (int i = 0; i < 10; ++i)
    printf("queries: %f %f %f %f\n", queries_M[i][0], queries_M[i][1], queries_M[i][2], queries_M[i][3]);

If I don't put sizeof(float), the printout doesn't match the numbers I inserted in the queries vector. (This is not really a CUDA bug, but a documentation and test issue.)
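To make the byte-versus-element distinction concrete without needing a GPU, here is a minimal sketch in plain C++. `StridedView` is a hypothetical stand-in written for this illustration and is not FLANN's `flann::Matrix`; it only assumes, as observed above, that the stride is the distance between consecutive rows in bytes. The sample data and the name `second_row_x` are also made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical strided view over float data; NOT flann::Matrix.
// Assumption: stride_bytes is the distance between consecutive rows in BYTES.
struct StridedView {
    const float* data;
    std::size_t rows;
    std::size_t cols;
    std::size_t stride_bytes;

    // Pointer to the first element of row i, stepping through memory by bytes.
    const float* row(std::size_t i) const {
        assert(i < rows);
        const unsigned char* base =
            reinterpret_cast<const unsigned char*>(data);
        return reinterpret_cast<const float*>(base + i * stride_bytes);
    }
};

// Read the x coordinate of the second point from three padded 3-D points
// stored as (x, y, z, pad), using the given row stride.
inline float second_row_x(std::size_t stride_bytes) {
    static const float points[12] = {
         0.f,  1.f,  2.f, 0.f,
        10.f, 11.f, 12.f, 0.f,
        20.f, 21.f, 22.f, 0.f};
    StridedView view = {points, 3, 3, stride_bytes};
    return view.row(1)[0];
}
```

On a platform where `sizeof(float)` is 4, `second_row_x(4 * sizeof(float))` returns 10, the real x of the second point, while `second_row_x(4)` returns 1, which is the y coordinate of the first point. That is the same kind of mismatch as in the printout described above.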

flann-lib / flann

CUDA 5.5 - indices are all zeros #183