ComputationalRadiationPhysics / stencil_filter_on_GPU

A GPU-accelerated stencil for filtering and smoothing using numba
GNU Lesser General Public License v3.0

Question about parallel sorting algorithm in numba.cuda #12

Open: chrisHuxi opened this issue 4 years ago

chrisHuxi commented 4 years ago

Hi,

We are trying to implement a parallel sorting algorithm to improve the median filter.

So each thread needs to sort its own array, and if we want that sorting itself to run in parallel, each thread would have to launch a sub-kernel to do it.
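For context, the per-thread work described above can be sketched as follows. This is a plain-Python sketch that runs on the CPU, not the repository's code: each output pixel is computed independently, as one GPU thread per pixel would, and the window is sorted serially inside that "thread" with insertion sort, so no sub-kernel is involved. The helper names, the 3x3 window (`radius=1`) and the clamp-at-border policy are illustrative assumptions.

```python
# Sketch of a median filter with one independent unit of work per output pixel.
# Assumptions (not from the original issue): clamped borders, hypothetical helper names.

def insertion_sort(buf, n):
    """Sort the first n entries of buf in place (ascending)."""
    for i in range(1, n):
        key = buf[i]
        j = i - 1
        while j >= 0 and buf[j] > key:
            buf[j + 1] = buf[j]
            j -= 1
        buf[j + 1] = key


def median_pixel(img, y, x, radius):
    """Work of one hypothetical thread: median of the window around (y, x)."""
    h, w = len(img), len(img[0])
    size = 2 * radius + 1
    buf = [0] * (size * size)  # stands in for a fixed-size per-thread local array
    n = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
            xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
            buf[n] = img[yy][xx]
            n += 1
    insertion_sort(buf, n)  # serial sort inside the thread, no sub-kernel
    return buf[n // 2]


def median_filter(img, radius=1):
    # On a GPU each (y, x) pair would map to one thread; here we simply loop.
    return [[median_pixel(img, y, x, radius) for x in range(len(img[0]))]
            for y in range(len(img))]


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 100, 6],
           [7, 8, 9]]
    print(median_filter(img))  # prints [[2, 3, 3], [4, 6, 6], [7, 8, 9]]
```

The outlier `100` in the centre is replaced by `6`. For small windows a serial insertion sort per thread is often acceptable because the window size, and hence the sort cost, is fixed and small.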

Both the quicksort approach (https://stackoverflow.com/questions/14068263/cuda-quicksort-in-kernel-recall) and the odd-even sort approach (https://devtalk.nvidia.com/default/topic/394362/faulty-sort-algorithm-please-help-33-odd-even-sort/B) need to launch a sub-kernel from inside a kernel. This feature is called "Dynamic Parallelism".

Unfortunately, Numba doesn't support it: https://numba.pydata.org/numba-doc/dev/cuda/kernels.html#kernel-declaration
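As a side note on odd-even sort, the transposition variant consists of `n` phases of compare-exchanges on disjoint pairs. The pairs inside one phase are independent, so one possible mapping (an assumption, not tested here) is one thread per pair inside a single block, with a barrier such as Numba's `cuda.syncthreads()` between phases instead of a sub-kernel launch. The plain-Python sketch below only makes that phase structure explicit; the function names are illustrative.

```python
# Odd-even transposition sort with the per-phase independence made explicit.
# Function names are hypothetical; this is a CPU sketch, not a Numba kernel.

def compare_exchange(a, i):
    """Order the pair (a[i], a[i + 1]) ascending."""
    if a[i] > a[i + 1]:
        a[i], a[i + 1] = a[i + 1], a[i]


def odd_even_transposition_sort(a):
    """Sort list a in place using n phases of disjoint compare-exchanges."""
    n = len(a)
    for phase in range(n):  # n phases are sufficient for n elements
        start = phase % 2   # even phase: pairs (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        # Every iteration below touches a different pair, so on a GPU each
        # could be its own thread; a barrier would separate consecutive phases.
        for i in range(start, n - 1, 2):
            compare_exchange(a, i)
    return a
```

For example, `odd_even_transposition_sort([5, 1, 4, 2, 8, 0, 2])` returns `[0, 1, 2, 2, 4, 5, 8]`.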

So, are there any other ideas for a parallel sorting algorithm?

PrometheusPi commented 4 years ago

I would go for a data/read-inefficient, static parallelization, but this might not be faster in the end:
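The comment does not spell out the scheme, so the following is only one possible reading (an assumption): a rank (counting) sort, in which each element's final position is computed independently by scanning the whole window. Each element could then get its own thread with a fixed amount of work, so nothing is launched dynamically, at the cost of O(n²) reads per window. For a median filter only the element of rank `n // 2` is needed, so the sort itself can even be skipped. Names are illustrative.

```python
# Rank (counting) sort: a static, read-heavy alternative to recursive sorting.
# Assumption: this is an illustration of "static parallelization", not the
# commenter's confirmed method. Function names are hypothetical.

def rank_of(a, i):
    """Number of elements that must precede a[i] in the sorted output."""
    r = 0
    for j in range(len(a)):
        # Strictly smaller values come first; ties are broken by input index,
        # so the ranks form a permutation of 0..n-1.
        if a[j] < a[i] or (a[j] == a[i] and j < i):
            r += 1
    return r


def rank_sort(a):
    out = [None] * len(a)
    for i in range(len(a)):  # independent iterations: one thread each on a GPU
        out[rank_of(a, i)] = a[i]
    return out


def rank_median(a):
    """Element at sorted index len(a) // 2, found without sorting."""
    k = len(a) // 2
    for i in range(len(a)):
        if rank_of(a, i) == k:
            return a[i]
```

For example, `rank_median([7, 1, 100, 3, 5])` returns `5`. Whether this beats a serial per-thread sort depends on window size and memory traffic, which matches the caveat that it might not be faster.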