Closed by robertmaxton42 6 years ago
(Possibly relevant: I'm running this on a rather old 755M. CC 3.0.)
Note that you are passing a numpy array to `tr` as the second argument: `tr(out, arr)`. If you replace it with `arrgpu`, the error disappears. I am not sure what is causing the underlying error, though.
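For what it's worth, a small guard like the one below would turn this mistake into a readable error before the launch. This is only a sketch: `check_no_host_arrays` is a hypothetical helper, not part of any library mentioned here, and it assumes that device arrays are not instances of `numpy.ndarray`.

```python
import numpy as np


def check_no_host_arrays(*args):
    """Raise TypeError if any argument is a host-side numpy array.

    Hypothetical helper: assumes device arrays are NOT numpy.ndarray instances.
    """
    for position, arg in enumerate(args):
        if isinstance(arg, np.ndarray):
            raise TypeError(
                f"argument {position} is a host numpy.ndarray; "
                "transfer it to the device before calling the kernel"
            )
    # Return the arguments unchanged so the call can be chained.
    return args
```

It could be used as `tr(*check_no_host_arrays(out, arrgpu))`, which would have rejected `arr` immediately.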
... Okay, on the one hand, that's a silly whoops on my part, apologies. On the other, that's a really weird way for it to go wrong, though.
So, I looked into it, and the reason is that a numpy array passed as an argument to a kernel results in its whole contents (that is, not a pointer) being attached to the argument list (see `driver.py:_build_arg_buf()` in PyCUDA). Since the array is pretty large, it results in the error from CUDA.
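To give a feel for the difference in size, here is a rough sketch using only numpy and the standard library. The array shape is made up for illustration (the real `arr` is not shown in this thread), and it assumes a device array contributes one pointer-sized value to the argument list.

```python
import struct

import numpy as np

# Hypothetical stand-in for the host array; the actual shape is an assumption.
arr = np.zeros((64, 64, 64), dtype=np.uint8)

# A device array contributes a single pointer-sized integer to the argument list.
pointer_arg_bytes = struct.calcsize("P")

# A host numpy array packed by value contributes its entire contents.
by_value_arg_bytes = arr.nbytes

print("pointer argument:", pointer_arg_bytes, "bytes")
print("by-value argument:", by_value_arg_bytes, "bytes")
```

The by-value size grows with the array, while the pointer stays the same size no matter how big the data is.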
Consider the code:
gets me a nice big red box that ends with `LaunchError: cuLaunchKernel failed: too many resources requested for launch`. Checking `outtype` in a separate cell gives `Type(uint8, shape=(8, 7, 8), strides=(70, 10, 1), offset=71, nbytes=700)`, so clearly the system isn't actually running out of memory or something. Googling the error leads me to guess that `Transpose` is asking for too many threads per block internally, but I can't be sure without better familiarity with the internals...
Thanks for all the help!