robertmaxton42 opened this issue 6 years ago (status: Open)
Hm, I agree that it would be useful. The question is the specific API, though. I suppose that function will need a `Thread` object added to its argument list at runtime. Also, perhaps some kind of (optional) conversion of numpy arrays to GPU arrays and back? Those functions that you want to call, what do they take as arguments, and what do they return? Just to have some concrete example.
> (By the way, while `kernel_call` says that it defaults to `fast_math=False`, for some reason whenever I have a C++ error the `nvcc` command has the `-use-fast-math` flag set...)
That's weird, I tried a simple example, and it doesn't happen for me. Can you make an MRE?
NVTX, while it's the idea I had in mind, may be a bad general example - since they're really just profiler markers, they take very simple inputs (a C string) and return nothing. If I were doing a Computation "by hand", instead of using a plan, I'd just do

```python
import ctypes

nvtx = ctypes.CDLL("/usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0")
```

at the top and then call

```python
nvtx.nvtxRangePushA(ctypes.c_char_p(b"scan"))
<some Computation>(arr, ...)
nvtx.nvtxRangePop()
```

and then the duration of the `<some Computation>` shows up in the Visual Profiler as a side effect.
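For what it's worth, the push/pop pair can be wrapped in a context manager so the range is always popped even if the computation raises. A minimal sketch (the `lib` argument stands in for the `ctypes.CDLL` handle above; the helper's name is my own invention, not part of any API):

```python
import ctypes
from contextlib import contextmanager

@contextmanager
def nvtx_range(lib, name):
    # Push a named NVTX range on entry, and always pop it on exit,
    # even if the wrapped computation raises.
    lib.nvtxRangePushA(ctypes.c_char_p(name.encode()))
    try:
        yield
    finally:
        lib.nvtxRangePop()
```

With this, the example above becomes `with nvtx_range(nvtx, "scan"): <some Computation>(arr, ...)`.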
> That's weird, I tried a simple example, and it doesn't happen for me. Can you make an MRE?
Sure, gimme a bit.
... Okay, after experimenting, it was actually an error on my end, though it's very weird. If you forget to specify a keyword in `.compile`, like `.compile(thr, ["-std=c++11"])`, for some reason `compile` interprets that as "use fast math." So, never mind, I guess...
Are you literally writing `.compile(thr, ["-std=c++11"])`? `compiler_options` is a keyword parameter that goes after `fast_math`, so this call will be interpreted as `fast_math=["-std=c++11"]`, which evaluates to `True`. But then if you forget to specify it, `fast_math` should be `False`...
Ah, I was looking at the wrong documentation. Yes, `fast_math` is the second positional argument and a non-empty list is truthy. That makes sense. Whoops >.>. Sorry about that.
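The mixup can be reproduced in plain Python. The signature below merely mirrors the positional/keyword ordering described above; it is a toy stand-in, not reikna's actual `compile`:

```python
def compile(thr, fast_math=False, compiler_options=None):
    # Toy stand-in: the second positional slot is fast_math,
    # so a list passed positionally lands there.
    return {"fast_math": bool(fast_math), "compiler_options": compiler_options}

# Intended: pass the options by keyword.
ok = compile("thr", compiler_options=["-std=c++11"])

# Mistake: the list binds to fast_math, and a non-empty list is truthy.
oops = compile("thr", ["-std=c++11"])
```

Here `ok` keeps `fast_math` at `False`, while `oops` silently turns it on and drops the compiler options.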
Coming back to this - I've run into a more interesting use case, namely the entirety of existing CUDA library code. In particular, there are a number of cuBLAS and cuSPARSE functions I'd like to be able to use in my code. Planning arbitrary functions would let us use ctypes and similar to invoke those in planned computations - and similarly, would let us define and call `__host__` functions from Python in full generality.
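As a rough illustration of what "planning an arbitrary function" could mean, here is a toy scheduler - my own sketch, not reikna's API - that records kernel launches and host-side calls (ctypes or otherwise) in order and replays them only at execution time:

```python
class ToyPlan:
    # Minimal sketch: a plan is just an ordered list of callables
    # (kernel launches, ctypes calls, host functions) executed later.
    def __init__(self):
        self._steps = []

    def add_call(self, func, *args, **kwargs):
        # Record the call now; nothing runs until execute().
        self._steps.append((func, args, kwargs))

    def execute(self):
        # Replay every recorded call in planning order.
        return [func(*args, **kwargs) for func, args, kwargs in self._steps]
```

The NVTX use case would then be three `add_call`s: push, the computation, pop.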
It'd be nice if we could add to a plan a scheduled call to a function, to be executed only at runtime. The particular use case I have in mind is using NVTX events and ranges, but now that it occurs to me it seems like something that might be generally useful.

(By the way, while `kernel_call` says that it defaults to `fast_math=False`, for some reason whenever I have a C++ error the `nvcc` command has the `-use-fast-math` flag set...)

Actually, related to that last point, it'd be cool if we could require at planning time that our kernel be compiled with particular options - `-std=c++11` in particular. I strongly suspect it would cut down on usage errors in that case.
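One shape such a requirement could take, purely as a sketch and not an existing reikna interface: the plan collects required options at planning time and merges them into every compilation it later triggers:

```python
class ToyPlanner:
    # Hypothetical sketch: options demanded at planning time are
    # merged into whatever options each kernel is compiled with.
    def __init__(self, required_options=()):
        self.required_options = list(required_options)

    def compile_options(self, extra=()):
        # Required options come first; duplicates from `extra` are dropped.
        merged = list(self.required_options)
        merged += [opt for opt in extra if opt not in merged]
        return merged
```

So a plan built with `ToyPlanner(["-std=c++11"])` would guarantee the flag on every kernel, regardless of what the caller passes per-kernel.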