alvarocollet / gpu_turbopixels

Superpixel computation (using Levinshtein's Turbopixels) on the GPU, achieving 2-3 fps on 640x480 images.

Crash in do_reduction_count #2

Open UDoenni opened 10 years ago

UDoenni commented 10 years ago

do_reduction_count can crash for certain input sizes, for example size 64. I'm assuming the code is from http://docs.nvidia.com/cuda/cuda-samples/#cuda-parallel-reduction, where it says: "Note, this kernel needs a minimum of 64*sizeof(T) bytes of shared memory. In other words if blockSize <= 32, allocate 64*sizeof(T) bytes. If blockSize > 32, allocate blockSize*sizeof(T) bytes." So for the shared memory size, I used int smemSize = max(64, threads) * sizeof(T); everywhere, which works.
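The sizing rule above can be sketched as a small host-side helper. The name reduction_smem_bytes is hypothetical and does not appear in the repository; the formula is the max(64, threads) * sizeof(T) expression from this comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of gpu_turbopixels): number of bytes of
// shared memory to request for a reduction block of `threads` threads,
// never less than 64 elements of type T.
template <typename T>
std::size_t reduction_smem_bytes(int threads) {
    assert(threads > 0);
    return static_cast<std::size_t>(std::max(64, threads)) * sizeof(T);
}
```

The returned value would be passed as the third kernel launch-configuration argument, in place of a plain threads * sizeof(T).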

There's also a small bug in the reduction: if (blockSize >= 2) { smem[tid] += smem[tid + 2]; EMUSYNC; } should be if (blockSize >= 2) { smem[tid] += smem[tid + 1]; EMUSYNC; }
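To see why the offset matters, here is a simplified host-side emulation of the last six reduction steps over 64 elements. It is only a sketch: the function name is made up, and it assumes that only threads with tid below the current offset are active in each step, which is a cleaner model than the real warp-synchronous kernel.

```cpp
#include <cassert>
#include <vector>

// Simplified emulation of the final tree-reduction steps on 64 elements.
// last_offset is the offset used in the very last step:
//   1 corresponds to the corrected code, 2 reproduces the reported bug.
int emulate_tail_reduction(std::vector<int> smem, int last_offset) {
    assert(smem.size() == 64u);
    const int offsets[] = {32, 16, 8, 4, 2, last_offset};
    const int active[]  = {32, 16, 8, 4, 2, 1};  // threads active per step
    for (int step = 0; step < 6; ++step) {
        for (int tid = 0; tid < active[step]; ++tid) {
            smem[tid] += smem[tid + offsets[step]];
        }
    }
    return smem[0];  // the block's partial sum ends up in element 0
}
```

With 64 ones as input, an offset of 1 gives the full sum of 64, while an offset of 2 skips smem[1] and adds a stale smem[2] instead, so the result is wrong.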

In general, the reduction code is hard to read and maintain. I suggest using Thrust (a template library that ships with CUDA), which simplifies things a lot:

#include "thrust/transform_reduce.h"
#include "thrust/device_ptr.h"
#include "thrust/functional.h"

// Counts the non-zero entries of the device array in[0..size).
int do_reduction_count_thrust(int* in, int size) {
    thrust::device_ptr<int> _inBegin = thrust::device_pointer_cast(in);
    thrust::device_ptr<int> _inEnd = thrust::device_pointer_cast(in + size);

    // Map each element to (element != 0), then sum the results starting from 0.
    int gpu_result = thrust::transform_reduce(_inBegin, _inEnd, thrust::placeholders::_1 != 0, 0, thrust::plus<int>());

    return gpu_result;
}
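When swapping in a new reduction, it can help to compare against a plain CPU count on a copy of the data. The following is a minimal sketch; count_nonzero_reference is a hypothetical name, and it assumes the array has already been copied back to host memory.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU reference: counts the non-zero entries of a host array,
// i.e. the same quantity the Thrust version computes on the device.
int count_nonzero_reference(const int* in, int size) {
    assert(size >= 0);
    return static_cast<int>(
        std::count_if(in, in + size, [](int v) { return v != 0; }));
}
```

Running both versions on sizes such as 64 that triggered the crash would be a reasonable first check.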

ScutZB commented 9 years ago

Hey, I ran into a similar problem with the input size. My runtime environment is Ubuntu 14.10 64-bit, CUDA 6.5, and OpenCV 2.4.10. If I use an image whose width isn't 16*n, the program crashes after tpix.process(). I solved this by changing the pre-allocated memory size, on line 820 of turbopix.cu. I changed it from img_pitch = (img_width + 31) & (~31); to img_pitch = img_width; and everything works fine. I'm not sure whether this is caused by do_reduction_count. Will this change cause any other problems? I hope someone can give me an answer. Thanks
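For reference, the original expression rounds the width up to the next multiple of 32. A minimal sketch of what it computes, with padded_pitch as a hypothetical wrapper name around the expression quoted above:

```cpp
#include <cassert>

// Hypothetical wrapper around the original line-820 expression:
// rounds img_width up to the next multiple of 32.
int padded_pitch(int img_width) {
    assert(img_width >= 0);
    return (img_width + 31) & (~31);
}
```

A width of 640 stays at 640, while 641 becomes 672 and 48 becomes 64, so replacing the expression with img_pitch = img_width only changes behaviour for widths that are not multiples of 32.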