fjarri / reikna

Pure Python GPGPU library
http://reikna.publicfields.net/
MIT License

Planning arbitrary Python functions #41

robertmaxton42 opened 6 years ago

robertmaxton42 commented 6 years ago

It'd be nice if we could add to a plan a scheduled call to a Python function, to be executed only at runtime. The particular use case I have in mind is NVTX events and ranges, but now that I think about it, it seems like something that might be generally useful.

(By the way, while kernel_call says that it defaults to fast_math=False, for some reason whenever I hit a C++ error the nvcc command has the -use_fast_math flag set...)

-- Actually, related to that last point: it'd be cool if we could require at planning time that our kernel be compiled with particular options, -std=c++11 in particular. I strongly suspect that would cut down on usage errors.

fjarri commented 6 years ago

Hm, I agree that it would be useful. The question is the specific API, though. I suppose such a function would need a Thread object added to its argument list at runtime. Also, perhaps some kind of (optional) conversion of numpy arrays to GPU arrays and back? The functions that you want to call: what do they take as arguments, and what do they return? Just to have a concrete example.
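Something along these lines, maybe (callback_call is just a name I'm making up for illustration, not an existing reikna API):

# Purely hypothetical sketch: schedule a Python function in _build_plan
# that receives the Thread (and the actual arrays) at execution time.
def _build_plan(self, plan_factory, device_params, output, input_):
    plan = plan_factory()
    plan.callback_call(
        lambda thr, arr: print(type(thr), arr.shape),  # runs at call time
        [input_])
    # ... followed by the usual kernel_call entries ...
    return plan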

(By the way, while kernel_call says that it defaults to fast_math=False, for some reason whenever I hit a C++ error the nvcc command has the -use_fast_math flag set...)

That's weird; I tried a simple example and it doesn't happen for me. Can you make an MRE?

robertmaxton42 commented 6 years ago

NVTX, while it's the use case I had in mind, may be a bad general example: the calls are really just profiler annotations, so they take very simple inputs (a C string) and return nothing. If I were writing a Computation "by hand", instead of using a plan, I'd just do

import ctypes
nvtx = ctypes.CDLL("/usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0")

at the top and then call

nvtx.nvtxRangePushA(ctypes.c_char_p(b"scan"))
<some Computation>(arr, ...)
nvtx.nvtxRangePop()

and then the duration of the <some Computation> call shows up in the Visual Profiler as a side effect.
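(For what it's worth, those raw calls are easy enough to wrap. A minimal sketch around the ctypes handle above, with <some Computation> again standing in for a compiled computation:)

from contextlib import contextmanager

@contextmanager
def nvtx_range(name):
    # Push an NVTX range on entry and pop it on exit, so the enclosed
    # region shows up as a named interval in the Visual Profiler.
    nvtx.nvtxRangePushA(ctypes.c_char_p(name.encode()))
    try:
        yield
    finally:
        nvtx.nvtxRangePop()

with nvtx_range("scan"):
    <some Computation>(arr, ...)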

That's weird; I tried a simple example and it doesn't happen for me. Can you make an MRE?

Sure, gimme a bit.

robertmaxton42 commented 6 years ago

... Okay, after experimenting, it was actually an error on my end, though it's a very weird one. If you forget to specify the keyword in .compile, as in .compile(thr, ["-std=c++11"]), for some reason compile interprets that as "use fast math".

So, nevermind, I guess...

fjarri commented 6 years ago

Are you literally writing .compile(thr, ["-std=c++11"])? compiler_options is a keyword parameter that goes after fast_math, so this call will be interpreted as fast_math=["-std=c++11"], which evaluates to True. But then if you forget to specify it, fast_math should be False...
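In other words, a sketch of the two calls, assuming a signature like compile(thread, fast_math=False, compiler_options=None) as described above, with comp standing for a Computation instance:

comp.compile(thr, ["-std=c++11"])                   # binds the list to fast_math, which is truthy
comp.compile(thr, compiler_options=["-std=c++11"])  # passes the options as intended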

robertmaxton42 commented 6 years ago

Ah, I was looking at the wrong documentation. Yes, fast_math is the second positional argument, and a non-empty list is truthy. That makes sense. Whoops >.>. Sorry about that.

robertmaxton42 commented 6 years ago

Coming back to this: I've run into a more interesting use case, namely the entirety of existing CUDA library code. In particular, there are a number of cuBLAS and cuSPARSE functions I'd like to be able to use in my code. Planning arbitrary functions would let us use ctypes and the like to invoke those inside planned computations, and more generally would let us define and call __host__ functions from Python.
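For example, here's a rough sketch of calling cuBLAS SAXPY (y = alpha*x + y) on device arrays through ctypes. It assumes the PyCUDA backend, where int(arr.gpudata) gives the raw device pointer, and omits library path details and error checking:

import ctypes
import numpy as np

cublas = ctypes.CDLL("libcublas.so")

handle = ctypes.c_void_p()
cublas.cublasCreate_v2(ctypes.byref(handle))

n = 1024
alpha = ctypes.c_float(2.0)
x_dev = thr.to_device(np.ones(n, np.float32))
y_dev = thr.to_device(np.ones(n, np.float32))

# cublasSaxpy_v2(handle, n, &alpha, x, incx, y, incy); alpha is read
# from host memory under the default pointer mode.
cublas.cublasSaxpy_v2(handle, n, ctypes.byref(alpha),
                      ctypes.c_void_p(int(x_dev.gpudata)), 1,
                      ctypes.c_void_p(int(y_dev.gpudata)), 1)

cublas.cublasDestroy_v2(handle)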