Currently, rolling functions return an array on the CPU regardless of the type of the input array. For a GPU input, this means the result must be copied back to the GPU before it can be used there, which is a slow operation.
I guess returning a GPU array for a GPU input must be possible, since the Base `diff` function (equivalent to `rolling(-,a,2)`) supports CuArrays. But this may require good knowledge of CUDA.jl.
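
To illustrate why `diff` can stay on the device, here is a minimal sketch of a window-2 difference written as a broadcast over two shifted views. `rolling_diff2` is a hypothetical name used only for illustration and is not part of any package. The assumption is that a rolling function expressed through broadcasting, rather than element-by-element indexing, would keep the input's array type.

```julia
# Hypothetical helper, for illustration only: window-2 rolling difference
# expressed as a broadcast over two shifted views (no scalar indexing).
function rolling_diff2(a::AbstractVector)
    lo, hi = firstindex(a), lastindex(a)
    return view(a, lo+1:hi) .- view(a, lo:hi-1)
end

x = Float32[1, 4, 9, 16]
@assert rolling_diff2(x) == diff(x)   # matches Base.diff on a CPU array

# Assumption, not run here: with CUDA.jl loaded and a GPU available,
# the same generic code should return a CuArray for a CuArray input.
# using CUDA
# rolling_diff2(CuArray(x)) isa CuArray
```

This only covers the fixed `-` with a window of 2. A general `rolling(f, a, n)` with an arbitrary `f` would likely need a different approach, such as a custom kernel.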