Closed robertmaxton42 closed 6 years ago
Custom offsets/strides are kind of an experimental feature, and not all built-in computations support them (partially because I have not decided on a uniform way of specifying them).
In your code the correct line would be the first one, because it requests a temporary array with the offset equal to that of the output parameter of `Scan`, as it should. Since you're scanning over an outer axis, `Scan` calls three kernels: transpose, the scan itself, and transpose again. The `Transpose` computation only takes the input array as a parameter, and its expected output array is generated based only on the input's shape and dtype. Thus the second transpose expects something with `offset=0` as an output, but it gets passed an array with `offset != 0` in the `_build_plan()` method of `Scan` (because `Scan` uses exactly the same array as the input and the output, including the strides and the offset), and that's where the error is thrown.
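To make the mismatch concrete, here is a minimal host-side sketch of the situation described above. It does not use the library's actual classes: `ArraySpec`, `transpose_output_spec`, and `check_argument` are hypothetical names invented for illustration, and the shape and offset values are arbitrary. The only point is that an expected output derived from shape and dtype alone ends up with a zero offset, so an array carrying a nonzero offset fails the comparison.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical stand-in for an array type: shape, dtype, strides, offset."""
    shape: tuple
    dtype: np.dtype
    strides: tuple
    offset: int = 0


def default_strides(shape, dtype):
    """C-contiguous strides, in bytes, for the given shape and dtype."""
    step = np.dtype(dtype).itemsize
    strides = []
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose_output_spec(input_spec):
    """Derive the expected output from shape and dtype only.

    The input's strides and offset are deliberately ignored, which mirrors
    the behaviour described in the comment above (an assumption of this sketch).
    """
    out_shape = tuple(reversed(input_spec.shape))
    return ArraySpec(out_shape, input_spec.dtype,
                     default_strides(out_shape, input_spec.dtype), offset=0)


def check_argument(expected, actual):
    """Raise if the array passed in does not match the expected spec exactly."""
    if expected != actual:
        raise TypeError(f"expected {expected}, got {actual}")


dtype = np.dtype(np.float32)

# Output array of the scan: carries a nonzero offset (16 bytes, arbitrary).
scan_output = ArraySpec((4, 8), dtype, default_strides((4, 8), dtype), offset=16)

# Temporary holding the transposed, scanned data; fed to the second transpose.
transposed_scanned = ArraySpec((8, 4), dtype, default_strides((8, 4), dtype))

expected_output = transpose_output_spec(transposed_scanned)

try:
    check_argument(expected_output, scan_output)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True  # offset=0 was expected, offset=16 was passed
```

In this sketch the check passes as soon as the expected spec is built with the same offset as the array that is actually passed in; whether the real fix takes that form is not stated in the thread.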
Now this part is easy to fix (I will push a commit shortly). The second problem is that `GPUArray` does not support custom offsets. There seems to be a way to still do it, by passing a modified memory pointer to it, but I am not 100% sure it won't cause problems somewhere else; I need to test it. If your code can be safely switched to OpenCL, you can test offsets there.
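The "modified memory pointer" idea can be illustrated on the host with plain NumPy. This is only an analogy and not the CUDA code path: it shows that an array described by a base buffer plus a byte offset addresses the same memory as a zero-offset array whose buffer starts at the shifted position. The buffer size and the offset are arbitrary values chosen for the example.

```python
import numpy as np

# Backing buffer of 16 float32 values: 0.0, 1.0, ..., 15.0
buf = np.arange(16, dtype=np.float32)
offset_bytes = 4 * buf.itemsize  # skip the first four elements

# Option 1: keep the base buffer and describe the array with an explicit offset.
with_offset = np.ndarray(shape=(3, 4), dtype=np.float32,
                         buffer=buf, offset=offset_bytes)

# Option 2: shift the start of the buffer and use a zero offset instead.
shifted = np.ndarray(shape=(3, 4), dtype=np.float32, buffer=buf[4:])

# Both descriptions point at the same address and see the same data.
same_address = with_offset.ctypes.data == buf.ctypes.data + offset_bytes
same_data = np.array_equal(with_offset, shifted)
```

The second option is the one that works for a container with no notion of an offset. The open question raised above is whether anything else relies on the array starting at the beginning of its allocation.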
BTW, in your `Dummy` computation, perhaps you meant to write
${k_out.store_idx}(i, j, ${k_arr.load_idx}(i, j));
Actually it seems that supporting offsets in CUDA arrays is easier than I expected. Could you check if your code works now?
Indeed I did. And... yup, it works now! Thanks!
No problem, closing the issue then.
Hopefully a more intelligent question this time!
When called in a plan on an array with a nonzero offset, `Scan` gives inconsistent requirements for its output array. If given an output array with the same offset, it fails at `plan.computation_call(transpose_from, output, transposed_scanned)`, which apparently expects `offset=0`; if given an output with zero offset, then it fails at `argnames = self._process_computation_arguments(signature, args, kwds)`, where it's expecting an offset equal to the original array's. A minimal reproduction follows: