How can I write it without CUDA?

That is a tutorial by itself. You start by removing the CUDA-only qualifiers, like __global__, __device__, __device__ __host__, etc. Then you need to move the code out of the CUDA kernels and into for-loops. (A CUDA kernel is like the body of a for-loop whose iterations run in parallel, one per thread, but you can also run those iterations in sequence, in an ordinary for-loop.)
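To make that a bit more concrete, here's a minimal sketch of the idea in plain C++. It is not code from this repo: `scale`, `scale_kernel`, and `scale_cpu` are made-up names for illustration, and the CUDA version appears only in comments for comparison.

```cpp
#include <cstddef>
#include <vector>

// CUDA original (hypothetical, shown for comparison only):
//
//   __device__ float scale(float x, float k) { return k * x; }
//
//   __global__ void scale_kernel(float* data, float k, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
//       if (i < n) data[i] = scale(data[i], k);
//   }
//
//   scale_kernel<<<(n + 255) / 256, 256>>>(data, k, n);

// CPU port of the helper: the __device__ qualifier is simply dropped.
float scale(float x, float k) { return k * x; }

// CPU port of the kernel: the per-thread index i becomes a loop counter,
// and the bounds check (i < n) becomes the loop condition.
void scale_cpu(std::vector<float>& data, float k) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = scale(data[i], k);
    }
}
```

You'd call `scale_cpu(data, 2.0f)` where the CUDA code launched the kernel. Device memory calls such as cudaMalloc and cudaMemcpy would likewise turn into ordinary host allocations and copies, since there is no separate device memory anymore.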

It's too long to explain in a simple post. If I didn't know CUDA and wanted to run this without CUDA, I'd learn enough CUDA to understand what the previous paragraph means and then port it to (say) CPU-only code. It's not super difficult, but you do need to understand a bit of CUDA.

etale-cohomology / evert-cuda
