Closed lkdvos closed 2 months ago
Small note to myself as well: I think there are some `tensoralloc` calls that don't have a matching `free` statement in the base implementations. I'll try to fix this in this PR.
I moved the Bumper implementation to a package extension and found a way to work around defining the macro in the base package, while the implementation is in the extension. I think the main missing ingredient now is just a couple tests, after which I think this could be ready to go.
Looks great. Maybe we can use the same macro-in-extension-package technique for `@cutensor`?
I think that's definitely a good idea, but I would move that to a separate PR or commit ;)
Ok, if tests work I think this completes this PR.
This PR attempts to add some additional allocator strategies to the toolbox, focused around dense arrays of isbits types.
I added some rudimentary support for PtrArrays.jl, which provides a manual way of implementing `malloc` and `free` for the temporaries. However, my first tests seem to indicate that this does not improve performance, and even makes it slightly worse in most cases. This probably requires more investigation, as it seems unlikely that this should be happening.

I also added support for Bumper.jl. Here, I use their buffer types as an allocator, which makes it quite easy to use the Bumper interface manually by passing a buffer as the allocator.
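A minimal sketch of what such manual use could look like, assuming the `allocator` keyword of `@tensor` as it exists in released TensorOperations.jl versions (the exact spelling at the time of this PR may have differed). The function name `contract_with_bumper` and the index labels are purely illustrative:

```julia
using TensorOperations, Bumper

# Hypothetical helper: contract three matrices while letting Bumper
# provide the memory for the intermediate (temporary) tensor.
function contract_with_bumper(A, B, C)
    buf = Bumper.default_buffer()
    # Everything allocated from `buf` inside this block is released
    # again when the block exits.
    @no_escape buf begin
        # Assumption: `allocator = buf` routes temporary allocations to the
        # Bumper buffer, while the output `D` is allocated normally and can
        # therefore safely leave the block.
        @tensor allocator = buf D[i, l] := A[i, j] * B[j, k] * C[k, l]
    end
end

A, B, C = randn(4, 5), randn(5, 6), randn(6, 3)
D = contract_with_bumper(A, B, C)
```

A three-factor contraction is used here because it needs an intermediate tensor for the pairwise product, which is the kind of temporary these allocators target.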
Nevertheless, for further automation, I also added the convenience `@butensor` macro, which does exactly that.

Some implementation notes:
In order to make this work, the current way of dispatching with `StridedView`s and choosing between GPU and CPU definitely does not work. Both of these options require a parent type of the `StridedView` which is not `Array` (but is `DenseArray`!), which would now be unsupported. I could add manual `select_backend` procedures for this, but I am a bit scared of the ambiguities, as I don't want to have to deal with the many combinations. This should probably be reconsidered.

In principle the current implementation of the Bumper methods could be part of a package extension, as it does not even require a definition of an `Allocator` type. Is this something we would like?

We should probably invest some time in a proper benchmark suite, as it is quite hard to gauge the effectiveness of these methods.
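To illustrate the dispatch concern from the first note, here is a small standalone sketch using only Base. It shows how a method keyed on `Array` misses array types that are `DenseArray` but not `Array`. The function `backend_for` and the returned symbols are hypothetical and are not the package's `select_backend`. `codeunits("abc")` merely stands in for a pointer-backed array, since it also subtypes `DenseArray` without being an `Array`:

```julia
# Hypothetical stand-in for backend selection based on the parent array type.
backend_for(::Array) = :array_path          # only matches plain `Array`s
backend_for(::DenseArray) = :dense_fallback # other dense, contiguous storage
backend_for(::AbstractArray) = :generic     # everything else

plain = zeros(2, 2)        # an `Array`
dense = codeunits("abc")   # a `DenseArray` that is not an `Array`
lazy  = 1:3                # an `AbstractArray` that is not dense

results = (backend_for(plain), backend_for(dense), backend_for(lazy))
```

If only the `Array` method existed, the second call would skip the fast path even though the memory layout would allow it. Widening the signature to `DenseArray` fixes that, but it is also where the ambiguity worries come from once several such argument combinations have to be covered.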