Cuda kernel failed. Error: invalid device function

caijinlong commented 10 years ago

I get errors like this when running the code. How can I handle this problem?

F0221 16:54:21.855986 11564 im2col.cu:49] Cuda kernel failed. Error: invalid device function
*** Check failure stack trace: ***
    @ 0x7f2556cc1b4d google::LogMessage::Fail()
    @ 0x7f2556cc5b67 google::LogMessage::SendToLog()
    @ 0x7f2556cc39e9 google::LogMessage::Flush()
    @ 0x7f2556cc3ced google::LogMessageFatal::~LogMessageFatal()
    @ 0x463bf2 caffe::im2col_gpu<>()
    @ 0x452031 caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0x41288f caffe::Layer<>::Forward()
    @ 0x41c9be caffe::ConvolutionLayerTest_TestSimpleConvolution_Test<>::TestBody()
    @ 0x43becd testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x42dab1 testing::Test::Run()
    @ 0x42db97 testing::TestInfo::Run()
    @ 0x42dcd7 testing::TestCase::Run()
    @ 0x432bdf testing::internal::UnitTestImpl::RunAllTests()
    @ 0x43ba7d testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x42d0da testing::UnitTest::Run()
    @ 0x40f774 main
    @ 0x318ae1ecdd (unknown)
    @ 0x40f4c9 (unknown)
/bin/sh: line 1: 11564 Aborted (core dumped) $testbin 0

Yangqing commented 10 years ago

You might not have the GPU correctly set up, since the kernel call is saying invalid device function.

Yangqing


caijinlong commented 10 years ago

Thanks Yangqing. The problem has been solved. It was a GPU setting.

Jinlong

nickjacob commented 10 years ago

@caijinlong would you mind posting what GPU settings were causing the problem?

Or @Yangqing are there any features (e.g., compute mode, persistence mode) that I should be aware of when configuring the GPU?

I'm having the same issue running on a K20; any code that runs a kernel gives an "Invalid Device Function" error.

Thanks! Nick

shelhamer commented 10 years ago

Can you run any CUDA demo, such as the NVIDIA-bundled samples? When in doubt, updating one's CUDA driver is worth a shot.

nickjacob commented 10 years ago

I can run the samples included in CUDA 5.5, and my driver is at 319.37, which from reading other issues here seems to be correct. Here's the output of deviceQuery (I'm running a K20 on AWS, so I don't get access to fan speed, for example):

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GRID K520"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  ( 8) Multiprocessors, (192) CUDA Cores/MP:     1536 CUDA Cores
  GPU Clock rate:                                797 MHz (0.80 GHz)
  Memory Clock rate:                             2500 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 3
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GRID K520
Result = PASS

and this is the output of nvidia-smi -a:


==============NVSMI LOG==============

Timestamp                           : Mon Mar 10 08:41:44 2014
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:00:03.0
    Product Name                    : GRID K520
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-f1f63fae-f245-3463-b8cd-2446df9fd1f3
    VBIOS Version                   : 80.04.D4.00.04
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x00
        Device                      : 0x03
        Domain                      : 0x0000
        Device Id                   : 0x118A10DE
        Bus Id                      : 0000:00:03.0
        Sub System Id               : 0x101410DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons         : N/A
    Memory Usage
        Total                       : 4095 MB
        Used                        : 9 MB
        Free                        : 4086 MB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        Gpu                         : 27 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 35.32 W
        Power Limit                 : 125.00 W
        Default Power Limit         : 125.00 W
        Enforced Power Limit        : 125.00 W
        Min Power Limit             : 85.00 W
        Max Power Limit             : 130.00 W
    Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 797 MHz
        SM                          : 797 MHz
        Memory                      : 2500 MHz
    Compute Processes               : None

Thanks so much for the help!

sguada commented 10 years ago

Can you try the device_query included in caffe/tools? It seems that you are using a GRID K520 on AWS. I have never tried that, so I'm not sure whether this will be helpful: http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html

nickjacob commented 10 years ago

Thanks. I'm applying the information in the Netflix blog post, although I think most of their issues came from direct calls to the NVIDIA Performance Primitives library, whereas Caffe gets stuck on custom CUDA kernel calls for me. This is the output of the Caffe device_query. I really appreciate the help!

Device id:                     0
Major revision number:         3
Minor revision number:         0
Name:                          GRID K520
Total global memory:           4294770688
Total shared memory per block: 49152
Total registers per block:     65536
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     1024
Maximum dimension of block:    1024, 1024, 64
Maximum dimension of grid:     2147483647, 65535, 65535
Clock rate:                    797000
Total constant memory:         65536
Texture alignment:             512
Concurrent copy and execution: Yes
Number of multiprocessors:     8
Kernel execution timeout:      No

ailzhang commented 10 years ago

@caijinlong Hi, could you please share which GPU settings you changed? I had exactly the same error, but I can run the CUDA samples successfully. I have no idea how to solve this. Thank you!

eendebakpt commented 10 years ago

On my system (GeForce GTX 750 Ti) I could solve the error by modifying the Makefile.config by changing

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35

into

CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50
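Why adding the compute_50 line helps can be sketched in a few lines of Python. This is a simplified model, not Caffe or NVIDIA code: it assumes that binary code built for sm_XY runs only on devices with the same major version and an equal or higher minor version, and that embedded PTX (code=compute_XY) can be JIT-compiled for any device of equal or higher capability. The function names `parse_gencode` and `is_covered` are made up for illustration, and the GTX 750 Ti is assumed to report compute capability 5.0, as the added line implies.

```python
import re


def parse_gencode(cuda_arch):
    """Collect the code= targets of a CUDA_ARCH string.

    Returns two sets of (major, minor) tuples: binary targets (sm_XY)
    and PTX targets (compute_XY).
    """
    sass, ptx = set(), set()
    # A code= value is either a single target or a bracketed list.
    for value in re.findall(r"code=(\[[^\]]*\]|\S+)", cuda_arch):
        for kind, digits in re.findall(r"(sm|compute)_(\d+)", value):
            cap = (int(digits[:-1]), int(digits[-1]))
            (sass if kind == "sm" else ptx).add(cap)
    return sass, ptx


def is_covered(cuda_arch, device_cap):
    """Return True if a build with these flags contains code the device can run."""
    sass, ptx = parse_gencode(cuda_arch)
    major, minor = device_cap
    # Assumption: binaries are compatible within one major version only.
    if any(m == major and n <= minor for m, n in sass):
        return True
    # Assumption: PTX can be JIT-compiled for equal or newer devices.
    return any(cap <= device_cap for cap in ptx)


old = ("-gencode arch=compute_20,code=sm_20 -gencode arch=compute_20,code=sm_21 "
       "-gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35")
new = old + " -gencode arch=compute_50,code=sm_50"

print(is_covered(old, (5, 0)))  # no sm_5x binary and no PTX in the old flags
print(is_covered(new, (5, 0)))  # the added sm_50 binary matches the device
```

Under these assumptions, a build that covers no architecture the device can run is exactly the situation in which a kernel launch reports "invalid device function".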

zimenglan-sysu commented 10 years ago

@eendebakpt Hi, could you tell me how to find the compute capability of my GPU? I don't know how to add '-gencode arch=compute_50,code=sm_50'.
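One way to answer this from information already in the thread: the deviceQuery output posted above contains a "CUDA Capability Major/Minor version number" line, and the digits of that value are the XY in compute_XY and sm_XY. The following Python sketch extracts them and builds the matching flag. The helper names are hypothetical, and the only input format assumed is the deviceQuery line as it appears in this thread.

```python
import re


def capability_from_device_query(output):
    """Extract (major, minor) from the text printed by the deviceQuery sample."""
    match = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", output
    )
    if match is None:
        raise ValueError("no compute capability line found")
    return int(match.group(1)), int(match.group(2))


def gencode_line(cap):
    """Build the -gencode flag for one (major, minor) compute capability."""
    arch = f"{cap[0]}{cap[1]}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


# Line copied from the GRID K520 deviceQuery output earlier in the thread.
sample = "  CUDA Capability Major/Minor version number:    3.0"
print(gencode_line(capability_from_device_query(sample)))
```

For a GPU reporting 5.0, `gencode_line((5, 0))` yields the flag eendebakpt added; it goes on a new line at the end of CUDA_ARCH in Makefile.config, with a trailing backslash on the previous line.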

zimenglan-sysu commented 10 years ago

@caijinlong Hi, I have the problem below:

Solver scaffolding done.
I0611 18:38:49.181289 26648 solver.cpp:49] Solving XXXNet
F0611 18:38:49.206163 26648 im2col.cu:54] Cuda kernel failed. Error: invalid device function
*** Check failure stack trace: ***
    @ 0x7f7a643d8b7d google::LogMessage::Fail()
    @ 0x7f7a643dac7f google::LogMessage::SendToLog()
    @ 0x7f7a643d876c google::LogMessage::Flush()
    @ 0x7f7a643db51d google::LogMessageFatal::~LogMessageFatal()
    @ 0x45a59c caffe::im2col_gpu<>()
    @ 0x455857 caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0x4325aa caffe::Net<>::ForwardPrefilled()
    @ 0x425568 caffe::Solver<>::Solve()
    @ 0x40e9b5 main
    @ 0x7f7a61d5076d (unknown)
    @ 0x41018d (unknown)
Aborted (core dumped)
Done.

How can I handle this problem? Thanks.

ihsanafredi commented 10 years ago

Hi, I even changed it to -gencode arch=compute_50,code=sm_50, but even then I received the error below. Can anybody help with this?

....
debug: (top_id, top_data_id, blob_id, feat_id)=0,119,0,119
[  FAILED  ] PowerLayerTest/1.TestPowerGradientGPU, where TypeParam = double (1737 ms)
[----------] 20 tests from PowerLayerTest/1 (5441 ms total)

[----------] 5 tests from ConcatLayerTest/1, where TypeParam = double
[ RUN      ] ConcatLayerTest/1.TestSetupNum
[       OK ] ConcatLayerTest/1.TestSetupNum (0 ms)
[ RUN      ] ConcatLayerTest/1.TestGPUGradient
[       OK ] ConcatLayerTest/1.TestGPUGradient (102 ms)
[ RUN      ] ConcatLayerTest/1.TestCPUGradient
[       OK ] ConcatLayerTest/1.TestCPUGradient (48 ms)
[ RUN      ] ConcatLayerTest/1.TestSetupChannels
[       OK ] ConcatLayerTest/1.TestSetupChannels (0 ms)
[ RUN      ] ConcatLayerTest/1.TestCPUNum
[       OK ] ConcatLayerTest/1.TestCPUNum (0 ms)
[----------] 5 tests from ConcatLayerTest/1 (150 ms total)

[----------] 3 tests from PaddingLayerUpgradeTest
[ RUN      ] PaddingLayerUpgradeTest.TestSimple
[       OK ] PaddingLayerUpgradeTest.TestSimple (1 ms)
[ RUN      ] PaddingLayerUpgradeTest.TestTwoTops
[       OK ] PaddingLayerUpgradeTest.TestTwoTops (1 ms)
[ RUN      ] PaddingLayerUpgradeTest.TestImageNet
[       OK ] PaddingLayerUpgradeTest.TestImageNet (1 ms)
[----------] 3 tests from PaddingLayerUpgradeTest (3 ms total)

[----------] 1 test from GaussianFillerTest/0, where TypeParam = float
[ RUN      ] GaussianFillerTest/0.TestFill
[       OK ] GaussianFillerTest/0.TestFill (0 ms)
[----------] 1 test from GaussianFillerTest/0 (0 ms total)

[----------] 4 tests from TanHLayerTest/1, where TypeParam = double
[ RUN      ] TanHLayerTest/1.TestGradientCPU
[       OK ] TanHLayerTest/1.TestGradientCPU (3 ms)
[ RUN      ] TanHLayerTest/1.TestForwardGPU
F0723 15:33:51.379904 10297 tanh_layer.cu:30] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
    @ 0x2b626d617b7d google::LogMessage::Fail()
    @ 0x2b626d619c7f google::LogMessage::SendToLog()
    @ 0x2b626d61776c google::LogMessage::Flush()
    @ 0x2b626d61a51d google::LogMessageFatal::~LogMessageFatal()
    @ 0x64092e caffe::TanHLayer<>::Forward_gpu()
    @ 0x48cd82 caffe::TanHLayerTest_TestForwardGPU_Test<>::TestBody()
    @ 0x58d25d testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x585081 testing::Test::Run()
    @ 0x585166 testing::TestInfo::Run()
    @ 0x5852a7 testing::TestCase::Run()
    @ 0x5855fe testing::internal::UnitTestImpl::RunAllTests()
    @ 0x58cddd testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x5846de testing::UnitTest::Run()
    @ 0x4434dd main
    @ 0x2b626f98b76d (unknown)
    @ 0x4481ad (unknown)
make: *** [runtest] Aborted (core dumped)

lireagan commented 9 years ago

@eendebakpt Thank you, your answer also helped me figure out another problem in the DeepNet toolkit.

empty16 commented 8 years ago

@caijinlong Could you share the solution? I have tried two settings in "Makefile.config": -gencode arch=compute_50,code=sm_50 alone, and -gencode arch=compute_50,code=sm_50 together with -gencode arch=compute_50,code=compute_50. Even then I received the error below. Can anybody help with this?

[----------] 9 tests from ConvolutionLayerTest/1, where TypeParam = double
[ RUN      ] ConvolutionLayerTest/1.TestGPUGradient
F1021 11:33:59.305110 3138 im2col.cu:54] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
    @ 0x2b8793543daa (unknown)
    @ 0x2b8793543ce4 (unknown)
    @ 0x2b87935436e6 (unknown)
    @ 0x2b8793546687 (unknown)
    @ 0x5f4e90 caffe::im2col_gpu<>()
    @ 0x5df0e3 caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0x41b110 caffe::Layer<>::Forward()
    @ 0x4296ca caffe::GradientChecker<>::CheckGradientExhaustive()
    @ 0x476831 caffe::ConvolutionLayerTest_TestGPUGradient_Test<>::TestBody()
    @ 0x547a63 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x53e547 testing::Test::Run()
    @ 0x53e5ee testing::TestInfo::Run()
    @ 0x53e6f5 testing::TestCase::Run()
    @ 0x541a38 testing::internal::UnitTestImpl::RunAllTests()
    @ 0x541cc7 testing::UnitTest::Run()
    @ 0x412ac0 main
    @ 0x2b8797095ec5 (unknown)
    @ 0x417d57 (unknown)
    @ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)

Could you help me figure out this problem?

empty16 commented 8 years ago

@ihsanafredi Have you figured out this problem?

ihsanafredi commented 8 years ago

My GPU was old.

dragontas commented 8 years ago

I did a simple:

rm -r ./build
mkdir build
cd build
cmake ..
make

The problem was a changed GPU; the sources needed to be rebuilt.

hongzhenwang commented 8 years ago

I have solved the same problem. It occurs when the CUDA version doesn't match the Caffe configuration. The trick lies in Makefile.config:

# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

If your CUDA version is below 6.0, comment out the last two lines.
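The version rule can be written down as a small filter. The Python sketch below encodes only the two rules that appear as Makefile.config comments in this thread: drop the *_50 lines for CUDA < 6.0, and drop the *_20 and *_21 lines for CUDA >= 9.0 (the second rule is quoted in a later comment). The name `cuda_arch_for` and the tuple representation of the toolkit version are assumptions for illustration; other toolkit versions may have limits that are not modelled here.

```python
# (virtual arch, code target) pairs copied from the Makefile.config lines above.
ARCH_PAIRS = [
    ("compute_20", "sm_20"),
    ("compute_20", "sm_21"),
    ("compute_30", "sm_30"),
    ("compute_35", "sm_35"),
    ("compute_50", "sm_50"),
    ("compute_50", "compute_50"),
]


def cuda_arch_for(cuda_version, pairs=ARCH_PAIRS):
    """Return a CUDA_ARCH assignment with unsupported lines left out.

    cuda_version is a (major, minor) tuple such as (5, 5).
    """
    kept = []
    for arch, code in pairs:
        suffix = code.rsplit("_", 1)[1]
        if cuda_version < (6, 0) and suffix == "50":
            continue  # "For CUDA < 6.0, comment the *_50 lines"
        if cuda_version >= (9, 0) and suffix in ("20", "21"):
            continue  # "For CUDA >= 9.0, comment the *_20 and *_21 lines"
        kept.append(f"-gencode arch={arch},code={code}")
    return "CUDA_ARCH := " + " \\\n        ".join(kept)


print(cuda_arch_for((5, 5)))
```

Calling it with `(5, 5)` prints the four-line configuration without the *_50 entries; after any change to CUDA_ARCH, the project has to be rebuilt from scratch for the new flags to take effect.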

Jumabek commented 8 years ago

@dragontas's solution worked for me as well.

loretoparisi commented 7 years ago

I'm running into this error with

$ docker run -ti caffe:gpu caffe --version
libdc1394 error: Failed
caffe version 1.0.0-rc3

and

$ nvidia-smi
Tue Oct 25 15:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
|  0%   48C    P8     7W / 200W |     62MiB /  8105MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   38C    P8     7W / 200W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1241    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

kevinhuang06 commented 7 years ago

@dragontas's solution worked for me as well!

vamsus commented 7 years ago

I am facing a similar error, using the latest CUDA version (8.0) with an Nvidia GeForce 820M GPU. How do I change the CUDA arch?

[ RUN      ] TanHLayerTest/2.TestTanH
F0310 07:19:41.605973 3025 tanh_layer.cu:26] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
    @ 0x7f5cb33b75cd google::LogMessage::Fail()
    @ 0x7f5cb33b9433 google::LogMessage::SendToLog()
    @ 0x7f5cb33b715b google::LogMessage::Flush()
    @ 0x7f5cb33b9e1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f5cb162f2aa caffe::TanHLayer<>::Forward_gpu()
    @ 0x481379 caffe::Layer<>::Forward()
    @ 0x7b1320 caffe::TanHLayerTest<>::TestForward()
    @ 0x8e1cb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x8db2ca testing::Test::Run()
    @ 0x8db418 testing::TestInfo::Run()
    @ 0x8db4f5 testing::TestCase::Run()
    @ 0x8dc7cf testing::internal::UnitTestImpl::RunAllTests()
    @ 0x8dcaf3 testing::UnitTest::Run()
    @ 0x46693d main
    @ 0x7f5cb0d3b830 __libc_start_main
    @ 0x46dfd9 _start
    @ (nil) (unknown)
Makefile:532: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)

balloch commented 7 years ago

Has anyone with CUDA 8.0 solved this problem?

lhk commented 7 years ago

I'm having problems with CUDA 8, too:

F0506 09:17:07.199545 19219 parallel.cpp:130] Check failed: error == cudaSuccess (10 vs. 0)  invalid device ordinal
*** Check failure stack trace: ***
    @     0x7f6db75a15cd  google::LogMessage::Fail()
    @     0x7f6db75a3433  google::LogMessage::SendToLog()
    @     0x7f6db75a115b  google::LogMessage::Flush()
    @     0x7f6db75a3e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f6db7e6d75d  caffe::DevicePair::compute()
    @     0x7f6db7e73480  caffe::P2PSync<>::Prepare()
    @     0x7f6db7e73f8e  caffe::P2PSync<>::Run()
    @           0x40ada0  train()
    @           0x407590  main
    @     0x7f6db6512830  __libc_start_main
    @           0x407db9  _start
    @              (nil)  (unknown)
Aborted (core dumped)

vamsus commented 7 years ago

@lhk @balloch I solved it for my CUDA 8.0 installation by disabling cuDNN support. The Nvidia 820M has compute capability 2.1, and cuDNN needs compute capability 3.0 or higher.

You can check your GPU's compute capability at https://developer.nvidia.com/cuda-gpus. Disable cuDNN by commenting out the line in the makefile.

If you still face the same error, follow this installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz4ajfl49uf

balloch commented 7 years ago

Thanks Sai!


-- Jonathan Balloch B.S. Physics, Mathematics M.S.E. Robotics

sharoseali commented 6 years ago

I have an Nvidia NVS 5200M, Windows 10 Pro 64-bit, and CUDA version 8.0. When I run darknet\x64\darknet_web_cam_voc, I get this error: "CUDA Error: invalid device function CUDA Error: invalid device function: No error". What is the issue? Please reply.

eyildiz-ugoe commented 6 years ago

I have the following done already:

# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH :=    -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_52,code=sm_52 \
        -gencode arch=compute_60,code=sm_60 \
        -gencode arch=compute_61,code=sm_61 \
        -gencode arch=compute_61,code=compute_61

And I still have the problem, CUDA 9.0. Couldn't find any solution. Extremely frustrating.

dilipv09 commented 4 years ago

Thanks Yangqing. The problem has been solved. It is GPU's setting.

Jinlong

Hi Jinlong, good day. Could you please tell me which GPU settings you changed, and how?

HasanBank commented 4 years ago

I have a similar error. How did you solve it? Error: Check failed: error == cudaSuccess (98 vs. 0) invalid device function

yaofahua commented 2 years ago

I hit a similar problem. The error is: F0907 15:41:09.264920 202420 im2col.cu:61] Check failed: error == cudaSuccess (8 vs. 0) invalid device function

I solved it by changing --generate-code=arch=compute_20,code=sm_20 to --generate-code=arch=compute_20,code=[compute_20,sm_20] in .\cmake\Cuda.cmake:

  # Tell NVCC to add binaries for the specified GPUs
  foreach(__arch ${__cuda_arch_bin})
    if(__arch MATCHES "([0-9]+)\\(([0-9]+)\\)")
      # User explicitly specified PTX for the concrete BIN
      # list(APPEND __nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=sm_${CMAKE_MATCH_1})
      list(APPEND __nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=[compute_${CMAKE_MATCH_2},sm_${CMAKE_MATCH_1}])
      list(APPEND __nvcc_archs_readable sm_${CMAKE_MATCH_1})
    else()
      # User didn't explicitly specify PTX for the concrete BIN, we assume PTX=BIN
      # list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=sm_${__arch})
      list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=[compute_${__arch},sm_${__arch}])
      list(APPEND __nvcc_archs_readable sm_${__arch})
    endif()
  endforeach()

  # Tell NVCC to add PTX intermediate code for the specified architectures
  foreach(__arch ${__cuda_arch_ptx})
    list(APPEND __nvcc_flags -gencode arch=compute_${__arch},code=compute_${__arch})
    list(APPEND __nvcc_archs_readable compute_${__arch})
  endforeach()

For more information, see: https://github.com/yaofahua/InvalidDeviceFunction
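The string logic of that CMake change can be mirrored in Python to see which flag each architecture entry produces. This is only a sketch of the modified `foreach` loop shown above, not part of Caffe; the function name `gencode_with_ptx` is hypothetical. The idea, as far as the change suggests, is that each binary target also embeds PTX for its virtual architecture, so the driver has something to JIT-compile when no binary matches the GPU.

```python
import re


def gencode_with_ptx(arch_spec):
    """Build one -gencode flag the way the modified Cuda.cmake loop does.

    arch_spec is either "BIN", e.g. "35", or "BIN(PTX)", e.g. "21(20)".
    """
    match = re.fullmatch(r"(\d+)\((\d+)\)", arch_spec)
    if match:
        # PTX architecture given explicitly for this binary target.
        bin_arch, ptx_arch = match.group(1), match.group(2)
    else:
        # No explicit PTX architecture, so assume PTX = BIN as the CMake comment says.
        bin_arch = ptx_arch = arch_spec
    return (
        f"-gencode arch=compute_{ptx_arch},"
        f"code=[compute_{ptx_arch},sm_{bin_arch}]"
    )


for spec in ["20", "21(20)", "35"]:
    print(gencode_with_ptx(spec))
```

The trade-off is a larger binary, since PTX is stored next to every compiled target, in exchange for a fallback on GPUs that the binary list does not cover.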

BVLC / caffe

Cuda kernel failed. Error: invalid device function #138
