Open caibf opened 4 years ago
Could someone with specialized knowledge help me jump out of trouble? Urgently.
Both functions on L64 are valid. The issue needs more debugging on your end. For future notice, always assume the following answers:
Could someone with specialized knowledge help me jump out of trouble?
Maybe
Urgently
Mostly no. People with specialized knowledge/equipment are rare, and volunteers are rarer.
I have dug into the source code. The following code (PCL\gpu\octree\src\cuda\octree_host.cu, line 66) has a problem: both variables, bin and ptx, always come back as 0.
void pcl::device::OctreeImpl::get_gpu_arch_compiled_for(int& bin, int& ptx)
{
    cudaFuncAttributes attrs;
    //cudaSafeCall( cudaFuncGetAttributes(&attrs, get_cc_kernel) );
    cudaFuncGetAttributes(&attrs, get_cc_kernel);
    bin = attrs.binaryVersion;
    ptx = attrs.ptxVersion;
    std::ofstream logFile("pcl.log");
    logFile << "binary architecture version: " << bin << ", PTX virtual architecture version: " << ptx << std::endl;
    logFile.close();
}
Could you test a simple program (without PCL) to get the attributes and verify this?
Yes. I just wrote a simple example to check this, and its result looks fine:
#include "device_launch_parameters.h"
#include "cuda_runtime.h"
#include <iostream>

__global__ void get_cc_kernel(int *data)
{
    data[threadIdx.x + blockDim.x * blockIdx.x] = threadIdx.x;
}

int main()
{
    int device;
    cudaGetDevice(&device);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    if (prop.major < 2)
    {
        std::cerr << "This code requires devices with compute capability >= 2.0" << std::endl;
        return -1;
    }
    cudaFuncAttributes attrs;
    //cudaSafeCall( cudaFuncGetAttributes(&attrs, get_cc_kernel) );
    cudaFuncGetAttributes(&attrs, get_cc_kernel);
    int bin = attrs.binaryVersion;
    int ptx = attrs.ptxVersion;
    std::cout << "binary architecture version: " << bin << ", PTX virtual architecture version: " << ptx << std::endl;
    return 0;
}
binary architecture version: 35, PTX virtual architecture version: 35
I think the CUDA Toolkit version may be the biggest problem.
@kunaltyagi Do you know what's the best version of CUDA Toolkit for PCL 1.9.1?
Sorry, but no.
I just tested the example and I get 52 and 52, with my GTX980m. Also Win10 using vs2019. CUDA toolkit 10.2.
You wrote it worked well in debug - or?
I just tested the example and I get 52 and 52, with my GTX980m. Also Win10 using vs2019. CUDA toolkit 10.2.
Good job. Are you sure PCL 1.9.1 is compatible with CUDA Toolkit 10.2? A few days ago, PCL failed to compile with 10.2 using VS2017, so I downgraded to CUDA Toolkit 10.0. After that, the CMake configuration and the PCL build went fine. I then used the sample code to test PCL's CUDA features, and the issue described above occurred.
You wrote it worked well in debug - or?
It seems that this sample runs well in Debug but takes a long time to execute.
I'll try to look into this more at some point, but it might be a while. I'm not sure whether 1.9.1 is compatible with CUDA 10.2, sorry.
@larshg Could you tell me more about your environment (PCL version and so on)?
I simply commented out the version check in octree.cpp (PCL\gpu\octree\src\octree.cpp):
pcl::gpu::Octree::Octree() : cloud_(0), impl(0)
{
    Static<sizeof(PointType) == sizeof(OctreeImpl::PointType)>::check();
    int device;
    cudaSafeCall( cudaGetDevice( &device ) );
    cudaDeviceProp prop;
    cudaSafeCall( cudaGetDeviceProperties( &prop, device) );
    if (prop.major < 2)
        pcl::gpu::error("This code requires devices with compute capability >= 2.0", __FILE__, __LINE__);
    int bin, ptx;
    OctreeImpl::get_gpu_arch_compiled_for(bin, ptx);
    /*if (bin < 20 && ptx < 20)
        pcl::gpu::error("This must be compiled for compute capability >= 2.0", __FILE__, __LINE__);*/
    impl = new OctreeImpl();
    built_ = false;
}
Then I ran it in Release mode.
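For readers following along, the condition that was commented out above can be sketched in plain C++. This is only an illustration under stated assumptions: decode_arch and fails_arch_check are hypothetical names, not PCL or CUDA functions. It relies on the CUDA runtime convention that binaryVersion and ptxVersion are encoded as major * 10 + minor, which matches the 35 and 52 values reported in this thread.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of PCL or the CUDA runtime.
struct ComputeCapability {
    int major_version;
    int minor_version;
};

// binaryVersion and ptxVersion are reported as major * 10 + minor,
// so 35 means compute capability 3.5 and 52 means 5.2.
ComputeCapability decode_arch(int version) {
    assert(version >= 0);
    return ComputeCapability{version / 10, version % 10};
}

// Mirrors the commented-out condition in the constructor:
// the error fires only when BOTH values are below 20.
bool fails_arch_check(int bin, int ptx) {
    return bin < 20 && ptx < 20;
}
```

With the values from this thread, fails_arch_check(0, 0) is true, which is why the PCL build that reported 0 and 0 hit the error, while fails_arch_check(35, 35) and fails_arch_check(52, 52) are false for the standalone test programs. Commenting out the check hides the symptom but does not explain why both values read as 0 in the first place.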
Hey @caibf
I may have found the culprit for the runtime crash in Release mode. Can you try commenting out this line in the main CMakeLists.txt file:
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")
I have added an issue at https://github.com/thrust/thrust/issues/1127.
I'm not sure whether it helps with the "invalid device function" error, though.
Wonderful @larshg
With your help, I finally solved this issue. I simply commented out line 152:
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")
Then I reconfigured with CMake and rebuilt the solution with MSVC. Everything works. There is no need to modify <PCL_Source>\gpu\octree\src\cuda\octree_host.cu at all.
The simple application ran well in both Debug and Release mode.
Thanks @larshg, that was a great help. I am new to GPU (CUDA) programming. Could you describe how you tracked this problem down?
@caibf please keep the issue open until we have found a proper fix to PCL.
I'll write a short description later.
@larshg Okay. I think PCL should do more with GPU algorithms. The relevant content on the official website is outdated and needs to be updated.
Hi all,
Recently, I ran a test with the example from the PCL source at the following path.