Open fmorency opened 10 years ago
I have the same problem with FLANN 1.8.4. System: Microsoft Windows 7 64-bit, CUDA: 5.5, Graphics card: GTX 560, nVidia driver: 320.57, Compiler: VS2010
Bump. Can anyone help on this?
From what I can see, I have the same problem too. I am running CUDA 6.5 on a Retina MacBook.
POTENTIAL SOLUTION: use the following to build the index in the code above:
flann::KDTreeCuda3dIndex< flann::L2<float> > index(dataset);
#define FLANN_USE_CUDA
#include <flann/flann.hpp>
#include <flann/io/hdf5.h>
#include <iostream>
int main(int argc, char** argv)
{
    int nn = 1;
    flann::Matrix<float> dataset;
    flann::Matrix<float> query;
    flann::load_from_file(dataset, "cloud.h5", "dataset");
    flann::load_from_file(query, "cloud.h5", "query");
    flann::Matrix<int> indices(new int[query.rows*nn], query.rows, nn);
    flann::Matrix<float> dists(new float[query.rows*nn], query.rows, nn);
    // construct the CUDA 3D kd-tree index directly instead of going through flann::Index
    flann::KDTreeCuda3dIndex< flann::L2<float> > index(dataset);
    // flann::Index<flann::L2<float> > index(dataset, flann::KDTreeCuda3dIndexParams());
    // flann::Index<flann::L2<float> > index(dataset, flann::KDTreeIndexParams());
    index.buildIndex();
    // do a knn search with the default search parameters
    index.knnSearch(query, indices, dists, nn, flann::SearchParams());
    flann::save_to_file(indices, "result.h5", "result");
    // print the nearest-neighbour index found for each query point
    for (size_t i = 0; i < query.rows; i++)
        std::cout << indices[i][0] << std::endl;
    delete [] dataset.ptr();
    delete [] query.ptr();
    delete [] indices.ptr();
    delete [] dists.ptr();
    return 0;
}
@drtaglia, you should put triple-backticks before and after your code, so it all renders as a code block. As-is the includes got treated as invalid HTML and don't render.
I am not sure what you are talking about. In markdown you put 4 leading whitespaces to show code. That's what I do above and it renders perfectly in chrome for me. (Nevermind! I see that with ```cpp you get highlighting as well, thanks for mentioning this).
Also, I confirm the issue above. I (painfully) added gtest to flann, then tried the built-in CUDA tests (silly me for not seeing them earlier). Most of them fail, but the one I care about is Flann_3D_Random_Cloud.Test4NNGpuBuffers, and I can happily report that applying the same change I made above allows that test to run!
flann::KDTreeCuda3dIndexParams index_params;
index_params["input_is_gpu_float4"]=true;
// flann::Index<L2_Simple<float> > index(data_device_matrix, index_params);
flann::KDTreeCuda3dIndex< flann::L2<float> > index(data_device_matrix, index_params);
start_timer("Building kd-tree index...");
index.buildIndex();
printf("done (%g seconds)\n", stop_timer());
Snippet of test run
[ RUN ] Flann_3D_Random_Cloud.Test4NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (3e-06 seconds)
Searching KNN...done (2.30427 seconds)
Building kd-tree index...done (0.017038 seconds)
Searching KNN...done (4e-05 seconds)
[ OK ] Flann_3D_Random_Cloud.Test4NNGpuBuffers (2330 ms)
Otherwise... this happens:
[ RUN ] Flann_3D_Random_Cloud.Test4NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (3e-06 seconds)
Searching KNN...done (2.3784 seconds)
Building kd-tree index...done (0.016676 seconds)
Searching KNN...The program has unexpectedly finished.
/Users/andrea/Developer/htrack/apps/flann_gpu-build/flann_cuda_test crashed
However, I think I celebrated too early. This doesn't look good...
[ RUN ] Flann_3D_Random_Cloud.Test1NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (6e-06 seconds)
Searching KNN...done (2.37411 seconds)
Building kd-tree index...done (0.023425 seconds)
Searching KNN...done (3.6e-05 seconds)
/Users/andrea/Developer/htrack/apps/flann_gpu/flann_cuda_test.cu:264: Failure
Expected: (precision) >= (0.99), actual: 0.25 vs 0.99
Precision: 0.25
[ FAILED ] Flann_3D_Random_Cloud.Test1NNGpuBuffers (2483 ms)
[----------] 1 test from Flann_3D_Random_Cloud (2483 ms total)
All right, I think I narrowed it down. @mariusmuja you might want to take a peek at it. I opened the FLANN source looking for "input_is_gpu_float4":
template<typename Distance>
void KDTreeCuda3dIndex<Distance>::uploadTreeToGpu()
{
    // just make sure that no weird alignment stuff is going on...
    // shouldn't, but who knows
    // (I would make this a (boost) static assertion, but so far flann seems to avoid boost)
    // assert( sizeof( KdTreeCudaPrivate::GpuNode)==sizeof( Node ) );
    delete gpu_helper_;
    gpu_helper_ = new GpuHelper;
    gpu_helper_->gpu_points_=new thrust::device_vector<float4>(size_);
    thrust::device_vector<float4> tmp(size_);
    if( get_param(index_params_,"input_is_gpu_float4",false) ) {
        assert( dataset_.cols == 3 && dataset_.stride==4*sizeof(float));
        thrust::copy( thrust::device_pointer_cast((float4*)dataset_.ptr()),thrust::device_pointer_cast((float4*)(dataset_.ptr()))+size_,tmp.begin());
    }
    ...
In particular, notice that big fat assert:
dataset_.stride==4*sizeof(float)
Now, what puzzled me is that flann_cuda_test.cu, as well as the documentation, says to use a stride of 4. But why would the code above multiply it by sizeof(float), then? So I tried the same on the "client" side:
TEST_F(Flann_3D_Random_Cloud, Test1NNGpuBuffers)
{
    thrust::host_vector<float4> data_host(data.rows);
    for( int i=0; i<data.rows; i++ )
    {
        data_host[i]=make_float4(data[i][0],data[i][1],data[i][2],0);
    }
    thrust::device_vector<float4> data_device = data_host;
    thrust::host_vector<float4> query_host(data.rows);
    for( int i=0; i<data.rows; i++ )
    {
        query_host[i]=make_float4(query[i][0],query[i][1],query[i][2],0);
    }
    thrust::device_vector<float4> query_device = query_host;
    flann::Matrix<float> data_device_matrix( (float*)thrust::raw_pointer_cast(&data_device[0]),data.rows,3,4*sizeof(float));
    flann::Matrix<float> query_device_matrix( (float*)thrust::raw_pointer_cast(&query_device[0]),data.rows,3,4*sizeof(float));
    flann::KDTreeCuda3dIndexParams index_params;
    index_params["input_is_gpu_float4"]=true;
#ifdef UNEXPECTEDLY_FINISHED_CRASH
    flann::Index<L2_Simple<float> > index(data_device_matrix, index_params);
#else
    flann::KDTreeCuda3dIndex<L2_Simple<float> /*flann::L2<float>*/ > index(data_device_matrix, index_params);
#endif
    start_timer("Building kd-tree index...");
    index.buildIndex();
    printf("done (%g seconds)\n", stop_timer());
    thrust::device_vector<int> indices_device(query.rows);
    thrust::device_vector<float> dists_device(query.rows);
    flann::Matrix<int> indices_device_matrix( (int*)thrust::raw_pointer_cast(&indices_device[0]),query.rows,1);
    flann::Matrix<float> dists_device_matrix( (float*)thrust::raw_pointer_cast(&dists_device[0]),query.rows,1);
    start_timer("Searching KNN...");
    indices.cols=1;
    dists.cols=1;
    flann::SearchParams sp;
    sp.matrices_in_gpu_ram=true;
    index.knnSearch(query_device_matrix, indices_device_matrix, dists_device_matrix, 1, sp );
    printf("done (%g seconds)\n", stop_timer());
    flann::Matrix<int> indices_host( new int[ query.rows], query.rows, 1 );
    flann::Matrix<float> dists_host( new float[ query.rows], query.rows, 1 );
    thrust::copy( dists_device.begin(), dists_device.end(), dists_host.ptr() );
    thrust::copy( indices_device.begin(), indices_device.end(), indices_host.ptr() );
    // thrust::copy(indices_device.begin(), indices_device.end(), std::ostream_iterator<int>(std::cout, " "));
    // float precision = compute_precision(gt_indices,indices);
    float precision = computePrecisionDiscrete(gt_dists,dists_host, 1e-08);
    EXPECT_GE(precision, 0.99);
    printf("Precision: %g\n", precision);
    delete [] indices_host.ptr();
    delete [] dists_host.ptr();
}
and suddenly... voilà... the precision test goes back to normal :)
[ RUN ] Flann_3D_Random_Cloud.Test1NNGpuBuffers
creating random point cloud (10000 points)...done
Building linear index...done (1e-06 seconds)
Searching KNN...done (2.52645 seconds)
Building kd-tree index...done (0.017331 seconds)
Searching KNN...done (2.5e-05 seconds)
Precision: 1
[ OK ] Flann_3D_Random_Cloud.Test1NNGpuBuffers (2555 ms)
I just did more checking. I confirm the stride is in bytes, and not templated by matrix type.
thrust::host_vector<float4> queries;
/// fill queries here....
flann::Matrix<float> queries_M((float*) thrust::raw_pointer_cast(&queries[0]), queries.size(), 3 /*3D*/, 4*sizeof(float) /*stride*/);
for (int i = 0; i < 10; ++i)
    printf("queries: %f %f %f %f\n", queries_M[i][0], queries_M[i][1], queries_M[i][2], queries_M[i][3]);
If I leave out sizeof(float), the printout doesn't match the numbers I inserted in the queries vector. (So this is not really a CUDA bug, but a documentation and test issue.)
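To illustrate the byte-stride point without FLANN or CUDA, here is a minimal sketch in plain C++. `row_at` and `make_points` are hypothetical helpers, not FLANN functions; the sketch only assumes that a row is located at `base + i * stride` with the stride counted in bytes, which is what the printout experiment above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of FLANN): returns the start of row `i`
// when consecutive rows are `stride_bytes` apart. The stride is counted
// in bytes, not in elements.
inline const float* row_at(const float* base, std::size_t i, std::size_t stride_bytes)
{
    assert(base != nullptr);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
    return reinterpret_cast<const float*>(bytes + i * stride_bytes);
}

// Two 3D points padded to four floats each, mimicking a float4 layout:
// (1, 2, 3, pad) and (4, 5, 6, pad).
inline std::vector<float> make_points()
{
    const float raw[8] = {1.f, 2.f, 3.f, 0.f, 4.f, 5.f, 6.f, 0.f};
    return std::vector<float>(raw, raw + 8);
}
```

With a stride of `4 * sizeof(float)`, row 1 starts at the second point, so its first component is 4. With a stride of plain `4`, row 1 starts only one float into the buffer, so its first component is the y value of the first point (2). That kind of shifted read would explain a printout that no longer matches the inserted values.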
System: Ubuntu 12.10, CUDA: 5.5 (also tested with 4.9 and had the same issue), Graphics card: nVidia Quadro 2000, nVidia driver: 319.37, Compiler: GCC 4.7.2, Dataset: cloud.h5 from the FLANN test data
The following program compiles and runs without console error.
The problem is that result.h5 contains only zeros. The program seems to give the right result if I use the KDTreeIndexParams CPU implementation. Has anyone had a similar issue, or does anyone know how to fix it?
Thanks, -F