[gpu][windows] invalid device function gpu/octree/src/cuda/octree_host.cu:64

caibf commented 4 years ago

Hi all,

I recently tested the example from the PCL source tree at the following path:

/gpu/examples/segmentation/src/seg.cpp

Here is the testing environment:

- Operating System: Windows 10 (1909)
- IDE: Visual Studio 2017 Community
- PCL: 1.9.1 (GPU and CUDA enabled)
- CMake: 3.11
- CUDA Toolkit: 10.0

I configured the source code and built it with VS2017 Community, and everything built fine. I then ran the example in Debug and Release mode. Debug mode ran fine, but Release mode failed with the exception `Error: invalid device function F:/04_Software/3rdParty/PCL/pcl-pcl-1.9.1/gpu/octree/src/cuda/octree_host.cu:64`, as shown in the following screenshot.

![Untitled 1587346327](https://user-images.githubusercontent.com/4711529/79705904-d756c400-82e9-11ea-963e-62c9ddb023b5.png)

caibf commented 4 years ago

Could someone with specialized knowledge help me out of this trouble? Urgently.

kunaltyagi commented 4 years ago

Both functions on L64 are valid. The issue needs more debugging on your end. For future reference, always assume the following answers:

> Could someone with specialized knowledge help me out of this trouble?

Maybe.

> Urgently

Mostly no. People with specialized knowledge or equipment are rare, and volunteers are rarer.

caibf commented 4 years ago

I dug into the source code. The problem is in the following code (PCL\gpu\octree\src\cuda\octree_host.cu, line 66): both variables, bin and ptx, always come back as 0.

void  pcl::device::OctreeImpl::get_gpu_arch_compiled_for(int& bin, int& ptx)
{
    cudaFuncAttributes attrs;
    //cudaSafeCall( cudaFuncGetAttributes(&attrs, get_cc_kernel) );  
    cudaFuncGetAttributes(&attrs, get_cc_kernel);
    bin = attrs.binaryVersion;
    ptx = attrs.ptxVersion;

    std::ofstream logFile("pcl.log");
    logFile << "binary architecture version: " << bin << ", PTX virtual architecture version: " << ptx << std::endl;
    logFile.close();
}

kunaltyagi commented 4 years ago

Could you test a simple program (without PCL) to get the attributes and verify this?

caibf commented 4 years ago

> Could you test a simple program (without PCL) to get the attributes and verify this?

Yes. I just wrote a simple example to check this, but its result is fine...

#include "device_launch_parameters.h"
#include "cuda_runtime.h"
#include <iostream>

__global__ void get_cc_kernel(int *data)
{
    data[threadIdx.x + blockDim.x * blockIdx.x] = threadIdx.x;
}

int main()
{
    int device;
    cudaGetDevice(&device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);

    if (prop.major < 2)
    {
        std::cerr << "This code requires devices with compute capability >= 2.0" << std::endl;
        return -1;
    }

    cudaFuncAttributes attrs;
    //cudaSafeCall( cudaFuncGetAttributes(&attrs, get_cc_kernel) );  
    cudaFuncGetAttributes(&attrs, get_cc_kernel);
    int bin = attrs.binaryVersion;
    int ptx = attrs.ptxVersion;

    std::cout << "binary architecture version: " << bin << ", PTX virtual architecture version: " << ptx << std::endl;

    return 0;
}

Output

binary architecture version: 35, PTX virtual architecture version: 35
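For readers unfamiliar with these values: CUDA encodes `binaryVersion` and `ptxVersion` as major * 10 + minor, so 35 here means the kernel was compiled for compute capability 3.5. A minimal sketch of that decoding in plain C++ follows; `decode_arch` is a hypothetical helper name used only for illustration and is not part of CUDA or PCL.

```cpp
#include <string>

// Hypothetical helper, not part of CUDA or PCL.
// cudaFuncAttributes::binaryVersion and ::ptxVersion are encoded as
// major * 10 + minor, so 35 stands for compute capability 3.5.
std::string decode_arch(int encoded)
{
    return std::to_string(encoded / 10) + "." + std::to_string(encoded % 10);
}
```

Under this encoding, a value of 0 decodes to "0.0", which is not a real architecture and more likely means the attributes were never filled in.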

caibf commented 4 years ago

I think the CUDA Toolkit version may be the biggest problem.

caibf commented 4 years ago

@kunaltyagi Do you know which CUDA Toolkit version works best with PCL 1.9.1?

kunaltyagi commented 4 years ago

Sorry, but no.

larshg commented 4 years ago

I just tested the example and I get 52 and 52 with my GTX980m, also on Win10, using VS2019 and CUDA Toolkit 10.2.

You wrote that it worked well in Debug, right?

caibf commented 4 years ago

> I just tested the example and I get 52 and 52 with my GTX980m, also on Win10, using VS2019 and CUDA Toolkit 10.2.

Good job. Are you sure PCL 1.9.1 is compatible with CUDA Toolkit 10.2? A few days ago, I failed to compile PCL with 10.2 using VS2017, so I downgraded to CUDA Toolkit 10.0. After that, the CMake configuration and the PCL build went fine. I then used the sample code to test the CUDA features of PCL, and the issue described above occurred.

> You wrote that it worked well in Debug, right?

It seems that the sample runs fine in Debug but takes a long time to execute.

larshg commented 4 years ago

I'll try to look into this more at some point, but it might be a while. I'm not sure whether 1.9.1 is compatible with CUDA 10.2, sorry.

caibf commented 4 years ago

@larshg Could you share the details of your environment, including the PCL version?

caibf commented 4 years ago

I simply commented out the version check in octree.cpp (PCL\gpu\octree\src\octree.cpp).

pcl::gpu::Octree::Octree() : cloud_(0), impl(0)
{
    Static<sizeof(PointType) == sizeof(OctreeImpl::PointType)>::check();

    int device;
    cudaSafeCall( cudaGetDevice( &device ) );

    cudaDeviceProp prop;
    cudaSafeCall( cudaGetDeviceProperties( &prop, device) );

    if (prop.major < 2)
        pcl::gpu::error("This code requires devices with compute capability >= 2.0", __FILE__, __LINE__);

    int bin, ptx;
    OctreeImpl::get_gpu_arch_compiled_for(bin, ptx);

    /*if (bin < 20 && ptx < 20)
        pcl::gpu::error("This must be compiled for compute capability >= 2.0", __FILE__, __LINE__);*/    

    impl = new OctreeImpl();        
    built_ = false;
}

Then I ran it in Release mode. [Screenshot: WeChat image 20200422092831]
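A note on the check that was commented out: it only fires when both `bin` and `ptx` are below 20. Earlier in the thread, with the `cudaSafeCall` wrapper removed, both values came back as 0, which suggests the attribute query itself failed rather than that the library was built for an old architecture. The sketch below separates the two cases in plain C++. It is an illustration under that assumption, not PCL code: `check_compiled_arch` is a hypothetical name, and `query_ok` stands in for "the call to `cudaFuncGetAttributes` returned `cudaSuccess`".

```cpp
#include <string>

// Hypothetical helper, not PCL code.
// query_ok: assumed to mean cudaFuncGetAttributes returned cudaSuccess.
std::string check_compiled_arch(bool query_ok, int bin, int ptx)
{
    // If the query failed, bin and ptx carry no information,
    // so report that instead of a misleading architecture error.
    if (!query_ok)
        return "query failed";

    // Same condition as the check in octree.cpp.
    if (bin < 20 && ptx < 20)
        return "compiled for compute capability < 2.0";

    return "ok";
}
```

With the values from this thread, `check_compiled_arch(false, 0, 0)` reports the failed query, while the standalone test's result (35, 35) passes the check.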

larshg commented 4 years ago

Hey @caibf

I may have found the culprit for the runtime crash in Release mode. Can you try commenting out this line in the main CMakeLists file:

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")

I have opened an issue at https://github.com/thrust/thrust/issues/1127.

I'm not sure whether it helps with the invalid device function error, though.

caibf commented 4 years ago

Wonderful @larshg

With your help, I finally solved this issue. I simply commented out line 152 in <PCL_Source>\CMakeLists.txt:

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")

Then I reconfigured with CMake and rebuilt the solution with MSVC. Everything is OK. There is no need to modify <PCL_Source>\gpu\octree\src\cuda\octree_host.cu at all.

The simple application ran well in both Debug and Release mode.

Thanks for the great help, @larshg. I am new to GPU (CUDA) programming. Could you describe how you tracked this problem down?

larshg commented 4 years ago

@caibf please keep the issue open until we have found a proper fix for PCL.

I'll write a short description later.

caibf commented 4 years ago

@larshg Okay. I think PCL should do more with GPU algorithms. The relevant content on the official website is too old and needs to be updated.

PointCloudLibrary / pcl, issue #3951
