HPCE / hpce-2016-cw5


GPU Recursion #26

a-camuto opened this issue 7 years ago

a-camuto commented 7 years ago

Is it possible to use GPU computing for recursive processes, even if the kernel itself isn't being called recursively? I keep getting a segmentation fault when trying to write from my output buffer to my output vector...

m8pple commented 7 years ago

Generally speaking you can't do recursion on GPUs as they don't have a stack (although this is less true of newer GPUs, especially NVIDIA ones).

However, it sounds like only the thing calling the kernel is recursive, which should be completely fine. It's probably a more normal bug.

Are you absolutely sure that your buffer is of the correct size, and that your kernel is not reading/writing off the end of the buffer?
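A common cause of exactly that symptom is a byte/element mix-up on the read-back - for example (a minimal host-side sketch with made-up names, not your actual code):

```cpp
// Sketch only: names and structure are illustrative, not from the coursework.
#include <vector>
#include <CL/cl.hpp>

void readBack(cl::CommandQueue &queue, cl::Buffer &outBuf, size_t count)
{
    std::vector<float> output(count);   // must hold at least `count` floats

    // If the buffer was created with size `count` instead of
    // count*sizeof(float), or the host vector is smaller than the
    // region being read, this read-back corrupts memory or segfaults.
    queue.enqueueReadBuffer(outBuf, CL_TRUE, 0,
                            count * sizeof(float), &output[0]);
}
```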

a-camuto commented 7 years ago

The sizing seems correct, but let's say the kernel calls a recursive function within its body - is that still feasible?

m8pple commented 7 years ago

Ah - the ban on recursion within the kernel covers both direct and indirect recursion. So within the kernel you can call any number of functions, as long as none of them calls itself, either directly or indirectly.
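For example (a rough sketch, not from the coursework code):

```c
// Fine: the kernel calls a plain helper function, with no recursion anywhere.
float scale(float x, float k) { return x * k; }

__kernel void apply_scale(__global const float *in,
                          __global float *out,
                          float k)
{
    uint i = get_global_id(0);
    out[i] = scale(in[i], k);   // non-recursive call, effectively inlined
}

// Not allowed in standard OpenCL C: a helper that calls itself.
// float bad(float x) { return x > 1.0f ? bad(x * 0.5f) : x; }
```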

If you remember the rough sketch of how the GPU does parallelism from the lectures, each work-item maintains its state using registers. As long as there is no recursion this works fine, as the compiler will effectively inline all the functions called by the kernel (it won't always inline, but the effect is the same).

Once you have any kind of recursion, then there needs to be some kind of stack. However, that stack is then per work-item, which means you can end up with huge numbers of memory reads/writes for each function call in the kernel, and you need somewhere to store all those stacks. So the original approach in OpenCL was to ban recursion. However, a lot of OpenCL drivers appear to allow recursion and then just crash at run-time - I'm not sure whether that behaviour is in-spec or not; probably they should give an error when you try to compile the kernel.
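If you do need something recursive inside the kernel, the usual workaround is to rewrite it as a loop, keeping whatever state you need in plain variables (or a small private array) rather than on a call stack. As a rough sketch with a made-up example:

```c
// Recursive form (not allowed in the kernel):
//   uint fib(uint n) { return n < 2 ? n : fib(n-1) + fib(n-2); }
//
// Iterative rewrite that a work-item can run using just registers:
uint fib_iter(uint n)
{
    uint a = 0, b = 1;
    for (uint i = 0; i < n; i++) {
        uint t = a + b;
        a = b;
        b = t;
    }
    return a;
}

__kernel void fib_kernel(__global uint *out)
{
    uint i = get_global_id(0);
    out[i] = fib_iter(i);
}
```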

Note that if you use CUDA, rather than OpenCL, then recursion is available to you, as the newer GPUs can support it. This is the general tradeoff of platform-specific versus general-purpose APIs, where you end up limited to the lowest-common-denominator feature-set.

The AWS part does support CUDA, though, so if you want to go in that direction I wouldn't mind - it just requires refining the environment spec a bit.