ChenYingpeng / caffe-yolov3

A real-time object detection framework of Yolov3/v4 based on caffe

Running detectnet fails with the error "no kernel image is available for execution on the device" at forward_yolo_layer_gpu #9

Open anhuipl2010 opened 6 years ago

anhuipl2010 commented 6 years ago

I0809 17:26:14.176947 22577 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
num_inputs is 1
num_outputs is 3
I0809 17:26:14.237879 22577 detectnet.cpp:78] Input data layer channels is 3
I0809 17:26:14.237900 22577 detectnet.cpp:79] Input data layer width is 416
I0809 17:26:14.237920 22577 detectnet.cpp:80] Input data layer height is 416
output blob1 shape c= 255, h = 13, w = 13
output blob2 shape c= 255, h = 26, w = 26
output blob3 shape c= 255, h = 52, w = 52
blobs.size()=3
0-step1
0-step2
0-step3
forward_yolo_layer_gpu 1
43095 a8240000 63800000 a826a15c 6382a15c
forward_yolo_layer_gpu 2
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: Resource temporarily unavailable

anhuipl2010 commented 6 years ago

Which version of Caffe did you use? Can you share a link to the right version?

ChenYingpeng commented 6 years ago

This may be a problem with your CUDA setup. Please check it.

anhuipl2010 commented 6 years ago

This error occurs at the first line, 'copy_gpu(l.batch*l.inputs,(float*)input,1,l.output_gpu,1);', in the function "forward_yolo_layer_gpu" in the file yolo_layer.cpp. I added some prints to this function and found it.

    void forward_yolo_layer_gpu(const float* input,layer l)
    {
        printf("before 11111\n");
        // copy the network output into the layer's GPU buffer (the failing call)
        copy_gpu(l.batch*l.inputs,(float*)input,1,l.output_gpu,1);
        printf("after 11111\n");
        int b,n;
        for(b = 0;b < l.batch;++b){
            for(n = 0;n < l.n;++n){
                // logistic activation on the first two entries of each anchor
                int index = entry_index(l,b,n*l.w*l.h,0);
                activate_array_gpu(l.output_gpu + index,2*l.w*l.h,LOGISTIC);
                // logistic activation on entry 4 and the l.classes entries after it
                index = entry_index(l,b,n*l.w*l.h,4);
                activate_array_gpu(l.output_gpu + index,(1 + l.classes)*l.w*l.h,LOGISTIC);
            }
        }
        // copy the activated output back to host memory
        cuda_pull_array(l.output_gpu,l.output,l.batch*l.outputs);
    }
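The index arithmetic in the function above can be checked on the CPU without any CUDA code. The following is a minimal sketch, not the project's implementation: `YoloShape`, `entries_per_anchor`, `outputs_per_image`, and `entry_offset` are hypothetical names, and the values of 3 anchors per cell and 80 classes are assumptions chosen because they reproduce the 255 channels reported in the log. The offset formula assumes the layer follows darknet's yolo layer layout (per anchor: 4 box entries, 1 objectness entry, then the class entries, each stored as a w*h plane).

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the layer struct: only the fields
// needed for the index arithmetic are kept.
struct YoloShape {
    int w;        // grid width
    int h;        // grid height
    int n;        // anchors per cell (assumed to be 3 here)
    int classes;  // number of classes (assumed to be 80 here)
};

// Entries per anchor: 4 box coordinates + 1 objectness + class scores.
constexpr int entries_per_anchor(const YoloShape& s) {
    return 4 + 1 + s.classes;
}

// Number of floats the layer produces for one image.
constexpr int outputs_per_image(const YoloShape& s) {
    return s.n * entries_per_anchor(s) * s.w * s.h;
}

// Offset of the first element of `entry` for `anchor` in image `batch`.
// This corresponds to calling entry_index(l, b, n*l.w*l.h, entry) in the
// function above, under the assumed darknet-style layout.
int entry_offset(const YoloShape& s, int batch, int anchor, int entry) {
    assert(anchor >= 0 && anchor < s.n);
    assert(entry >= 0 && entry < entries_per_anchor(s));
    const int plane = s.w * s.h;  // one value per grid cell
    return batch * outputs_per_image(s)
         + anchor * entries_per_anchor(s) * plane
         + entry * plane;
}
```

With `YoloShape{13, 13, 3, 80}`, `outputs_per_image` gives 255 * 13 * 13 = 43095, which matches the element count printed after "forward_yolo_layer_gpu 1" in the first log. The first `activate_array_gpu` call starts at `entry_offset(s, b, n, 0)` and covers two planes; the second starts at `entry_offset(s, b, n, 4)` and covers `1 + classes` planes, so entries 2 and 3 are left unactivated.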

After running detectnet, I got the following error:

I0824 18:42:00.235325 16998 detectnet.cpp:76] Input data layer channels is 3
I0824 18:42:00.235347 16998 detectnet.cpp:77] Input data layer width is 416
I0824 18:42:00.235353 16998 detectnet.cpp:78] Input data layer height is 416
output blob1 shape c= 255, h = 13, w = 13
output blob2 shape c= 255, h = 26, w = 26
output blob3 shape c= 255, h = 52, w = 52
before 11111
CUDA Error: no kernel image is available for execution on the device
detectnet: /home/LiuQiang/ext_work/caffe-yolov3/cuda.cpp:30: void check_error(cudaError_t): Assertion `0' failed.
Aborted (core dumped)

anhuipl2010 commented 6 years ago

This code was cloned from your GitHub caffe-yolov3 project, with no changes or fixes.

ChenYingpeng commented 5 years ago

Maybe something is wrong with the CUDA compute capability (compute_/sm_) settings for your PC? Please check that they match your GPU. For example, this is what I have in my CMakeLists.txt:
`
# setup CUDA
find_package(CUDA)

set(
    CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};
    -O3
    -gencode arch=compute_53,code=sm_53 # tegra tx1
    -gencode arch=compute_61,code=sm_61 # gtx 1060
    -gencode arch=compute_62,code=sm_62 # tegra tx2
)
`
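To see why a missing `-gencode` entry leads to "no kernel image is available", it may help to model the check in plain C++. This is a rough sketch under stated assumptions, not code from the project or the CUDA runtime: `ComputeCapability`, `gencode_flag`, and `has_kernel_image` are hypothetical names, and the matching rule is a simplification of CUDA's binary compatibility (a binary built for X.y is expected to run on X.z devices with z >= y; no PTX fallback is modeled, since `code=sm_XX` entries like the ones above embed machine code only, and Tegra versus desktop differences are ignored). In a real program the device values would come from `cudaGetDeviceProperties`, which reports them in the `major` and `minor` fields.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Compute capability split into its two digits, e.g. {6, 1} for sm_61.
struct ComputeCapability {
    int major_version;
    int minor_version;
};

// Builds an nvcc flag in the same form as the entries in the list above.
std::string gencode_flag(ComputeCapability cc) {
    assert(cc.major_version >= 1);
    assert(cc.minor_version >= 0 && cc.minor_version <= 9);
    const std::string v =
        std::to_string(cc.major_version) + std::to_string(cc.minor_version);
    return "-gencode arch=compute_" + v + ",code=sm_" + v;
}

// Simplified rule (an assumption, see the text above): a device can run the
// binary if some built architecture has the same major version and a minor
// version that is not newer than the device's.
bool has_kernel_image(const std::vector<ComputeCapability>& built,
                      ComputeCapability device) {
    for (const ComputeCapability& cc : built) {
        if (cc.major_version == device.major_version &&
            cc.minor_version <= device.minor_version) {
            return true;
        }
    }
    return false;
}
```

For the three architectures in the list (5.3, 6.1, 6.2), `has_kernel_image` returns true for a 6.1 device and false for, say, a 7.5 device; in the latter case `gencode_flag({7, 5})` produces the extra line that would have to be added before rebuilding.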

anhuipl2010 commented 5 years ago

Yes, I checked it, and now it runs OK. Thanks. But the speed is slower than the original.

ChenYingpeng commented 5 years ago

Yes, Caffe is slower than Darknet. I suggest speeding it up with TensorRT.

PiyalGeorge commented 5 years ago

Hi @anhuipl2010, how much speed did you get for this? I mean FPS.

luthfianto commented 4 years ago

Hi @anhuipl2010, How much FPS did you get for this?