Closed lkdvos closed 5 months ago
Note that the current implementation changes the CuTensorDescriptor function quite drastically, in order to expose the internal constructor.
It's simple enough of a change that we should carry this to enable better integration with external packages.
simply changing
alignment=sizeof(eltype(A))
That's an underestimation, which could hurt performance. It is probably also better to use Base.datatype_alignment
for the minimal guaranteed alignment. However, for performance it's better to track the alignment of the buffer from where it was allocated, or determine it at run time. CUDA.jl doesn't currently do this either, which could result in misaligned inputs since we hard-code an alignment of 128 bytes right now (e.g., when using a view and passing it to cuTENSOR).
Let me know if you still want to make changes to this PR, otherwise I'd be OK with merging it as is.
For me this is good to go, I am going to have to experiment with the alignment anyways for the StridedView implementation, but this is now accessible through the CuTensorDescriptor
constructor so that should work.
Thanks!
This PR migrates the type restrictions for the cutensor operations from the definitions of
*operation*_execute!
andplan_*operation*
to the constructor of theCuTensorDescriptor
. This implementation makes it such that custom types, which wrapCuArray
objects can specialize just theCuTensorDescriptor
constructor while still being able to re-use most of the codebase.Some questions/remarks I am still having:
Note that the current implementation changes the
CuTensorDescriptor
function quite drastically, in order to expose the internal constructor. I could think of alternative ways to achieve this goal as well by renaming the inner constructor to_CuTensorDescriptor
, which handles only the finalizer etc.This change allows cuTENSOR to work almost trivially with
StridedCuArray
s as well. I played around with this a bit, and the only issue that shows up by changingDenseCuArray
toStridedCuArray
is that the data alignment can no longer be guaranteed. I have to admit that I do not have enough experience with this myself, but simply changingalignment=sizeof(eltype(A))
seems to work. Does anyone know if there are any issues associated with this approach? Should I be mindful of severe performance pitfalls? I have not added this because of these reasons, but if anyone can reassure me that this is fine, I can open up a different PR for implementingStridedCuArray
as well.Any suggestions are definitely more than welcome.