That is a tutorial by itself.
You start by removing the CUDA-only qualifiers, such as __global__, __device__, and __device__ __host__.
Then you need to move the code out of the CUDA kernels and into for-loops. (A CUDA kernel is like the body of a for-loop whose iterations run in parallel, but you can also run those iterations in sequence, in an ordinary for-loop.)
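To make those two steps concrete, here is a minimal sketch in plain C++. The kernel is a made-up example (saxpy and axpy are hypothetical names, not taken from the code being discussed), and it assumes every thread works independently; kernels that use shared memory or __syncthreads() need more care than this.

```cpp
#include <cstddef>

// Hypothetical CUDA original, shown only for comparison:
//
//   __device__ float axpy(float a, float x, float y) { return a * x + y; }
//
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) y[i] = axpy(a, x[i], y[i]);
//   }

// Step 1: drop the __device__ qualifier; the function body stays the same.
inline float axpy(float a, float x, float y) { return a * x + y; }

// Step 2: the kernel body becomes the body of a for-loop.
// The loop counter i takes the place of the computed thread index,
// and the loop bound replaces the "if (i < n)" guard.
void saxpy_cpu(std::size_t n, float a, const float* x, float* y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = axpy(a, x[i], y[i]);
    }
}
```

On the calling side, a plain call such as saxpy_cpu(n, a, x, y) would take the place of the kernel launch saxpy<<<blocks, threads>>>(n, a, x, y), and ordinary host arrays would replace the buffers set up with cudaMalloc and cudaMemcpy.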
The whole process is too long to explain in a single post.
If I didn't know CUDA and wanted to run this without it, I'd learn enough CUDA to understand what the previous paragraphs mean, and then port the code to (say) CPU-only code. It's not super difficult, but you do need to understand a bit of CUDA.