Jutho / TensorOperations.jl

Julia package for tensor contractions and related operations
https://jutho.github.io/TensorOperations.jl/stable/

Alternative Allocators #182

Closed lkdvos closed 2 months ago

lkdvos commented 3 months ago

This PR attempts to add some additional allocator strategies to the toolbox, focused on dense arrays of isbits types.

I added some rudimentary support for PtrArrays.jl, which provides a manual way of calling malloc and free for the temporaries. However, my first tests seem to indicate that this does not improve performance, and even makes it slightly worse in most cases. This probably requires more investigation, as it seems unlikely that this should be happening.
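
For illustration, here is a minimal sketch of what manual malloc/free management of a temporary can look like in plain Julia. It uses only `Libc.malloc`, `Libc.free` and `unsafe_wrap` from Base, not the PtrArrays.jl API, and `with_malloc_temp` is a hypothetical helper name introduced for this example.

```julia
# Run `f` on a temporary array whose memory is managed manually, outside the GC.
# Assumption: `with_malloc_temp` is a made-up helper, not part of any package.
function with_malloc_temp(f, ::Type{T}, dims::Int...) where {T}
    isbitstype(T) || throw(ArgumentError("only isbits element types are supported"))
    raw = Libc.malloc(prod(dims) * sizeof(T))
    raw == C_NULL && throw(OutOfMemoryError())
    ptr = convert(Ptr{T}, raw)
    try
        # own=false: Julia will not free this memory, we do it ourselves below.
        tmp = unsafe_wrap(Array, ptr, dims; own=false)
        return f(tmp)
    finally
        # Always release the buffer, even if `f` throws.
        Libc.free(ptr)
    end
end

# The result must not reference `tmp`, which is invalid after the call returns.
total = with_malloc_temp(Float64, 3, 3) do tmp
    fill!(tmp, 2.0)
    sum(tmp)
end
```

The `try`/`finally` block guarantees that every allocation has a matching free, at the cost of requiring that the temporary never escapes the closure.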

I also added support for Bumper.jl. Here, I use their buffer types as an allocator, which means that it is quite easy to manually make use of the bumper interface as follows:

using TensorOperations, Bumper

buf = Bumper.default_buffer()
@no_escape buf begin
    # temporaries are taken from buf and released when the block exits
    @tensor allocator=buf tensorexpr...
end

Nevertheless, for further automation, I also added the convenience @butensor macro, which does exactly that.
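
To make the underlying idea concrete, here is a toy sketch of bump allocation with a scoped reset, written in plain Julia. It is not Bumper.jl's implementation or API: `ToyBump`, `toy_alloc!` and `with_checkpoint` are hypothetical names, and the buffer only holds `Float64` values for simplicity.

```julia
# A preallocated slab plus an offset marking how much of it is in use.
mutable struct ToyBump
    storage::Vector{Float64}
    offset::Int
end
ToyBump(n::Int) = ToyBump(zeros(n), 0)

# Carve the next `prod(dims)` elements out of the slab; no GC allocation of data.
function toy_alloc!(b::ToyBump, dims::Int...)
    n = prod(dims)
    b.offset + n <= length(b.storage) || error("toy buffer exhausted")
    v = reshape(view(b.storage, (b.offset + 1):(b.offset + n)), dims)
    b.offset += n
    return v
end

# Restore the offset when the block exits, releasing everything allocated inside.
function with_checkpoint(f, b::ToyBump)
    saved = b.offset
    try
        return f()
    finally
        b.offset = saved
    end
end

buf = ToyBump(1024)
result = with_checkpoint(buf) do
    tmp = toy_alloc!(buf, 4, 4)   # temporary that must not escape the block
    tmp .= 1.0
    sum(tmp)
end
```

After the block, `buf.offset` is back to its previous value, so the same memory is reused by the next scoped block.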


Some implementation notes:

For these allocators, the current way of dispatching on StridedViews to choose between GPU and CPU definitely does not work. Both of these options produce a StridedView whose parent type is not Array (but is a DenseArray!), which the current dispatch would not support. I could add manual select_backend methods for this, but I am a bit wary of the method ambiguities, as I don't want to have to deal with the many combinations. This should probably be reconsidered.
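
The dispatch concern can be sketched with a self-contained toy example. Everything here is hypothetical: `ToyDense` stands in for a dense, non-Array buffer type, and `toy_backend` and `toy_backend2` are made-up functions, not the actual select_backend methods of TensorOperations.

```julia
# Hypothetical stand-in for a dense array type that is not an Array.
struct ToyDense{T,N} <: DenseArray{T,N}
    data::Array{T,N}
end
Base.size(a::ToyDense) = size(a.data)
Base.getindex(a::ToyDense{T,N}, I::Vararg{Int,N}) where {T,N} = a.data[I...]

# Single-argument selection: a DenseArray fallback covers the new type.
toy_backend(::Array) = :cpu_blas
toy_backend(::DenseArray) = :cpu_generic

# Per-argument specializations quickly become ambiguous when combined.
toy_backend2(::Array, ::DenseArray) = :first_is_array
toy_backend2(::DenseArray, ::Array) = :second_is_array

a = zeros(2, 2)
d = ToyDense(zeros(2, 2))

single = (toy_backend(a), toy_backend(d))   # (:cpu_blas, :cpu_generic)
mixed = toy_backend2(a, d)                  # :first_is_array

# Both methods match (Array, Array) and neither is more specific,
# so the call throws a MethodError reporting the ambiguity.
ambiguous = try
    toy_backend2(a, a)
    false
catch err
    err isa MethodError
end
```

Resolving this by hand requires an extra method for every overlapping combination, which is the growth in cases that the note above is worried about.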

In principle the current implementation of the Bumper methods could be part of a package extension, as it does not even require a definition of an Allocator type. Is this something we would like?

We should probably invest some time in a proper benchmark suite, as it is quite hard to gauge the effectiveness of these methods.

lkdvos commented 2 months ago

Small note to myself as well: I think there are some tensoralloc calls that don't have a matching free statement in the base implementations. I'll try to fix this in this PR.
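
One way to spot this kind of mismatch is an allocator that counts outstanding temporaries. The sketch below is a self-contained toy: `CountingAllocator`, `toy_alloc` and `toy_free!` are hypothetical names and do not correspond to the package's allocator interface.

```julia
# Toy allocator that tracks how many temporaries have not been freed yet.
mutable struct CountingAllocator
    live::Int
end
CountingAllocator() = CountingAllocator(0)

function toy_alloc(a::CountingAllocator, ::Type{T}, dims::Int...) where {T}
    a.live += 1
    return Array{T}(undef, dims...)
end

function toy_free!(a::CountingAllocator, ::AbstractArray)
    a.live -= 1
    return nothing
end

# Missing free: the counter stays above zero after the call.
function leaky(a::CountingAllocator)
    tmp = toy_alloc(a, Float64, 2, 2)
    fill!(tmp, 1.0)
    return sum(tmp)
end

# Matching free in a finally block: the counter returns to zero.
function balanced(a::CountingAllocator)
    tmp = toy_alloc(a, Float64, 2, 2)
    try
        fill!(tmp, 1.0)
        return sum(tmp)
    finally
        toy_free!(a, tmp)
    end
end

alloc1 = CountingAllocator()
leaky(alloc1)
alloc2 = CountingAllocator()
balanced(alloc2)
```

A nonzero `live` count after a contraction would point to an allocation without a matching free.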

lkdvos commented 2 months ago

I moved the Bumper implementation to a package extension and found a way to work around defining the macro in the base package, while the implementation is in the extension. I think the main missing ingredient now is just a couple tests, after which I think this could be ready to go.
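
A common way to achieve this split is to have the base package define the macro and a method-less function stub, while the extension adds the method that builds the expression. The sketch below simulates that with two modules in one file. All names (`ToyBase`, `ToyExt`, `@toymacro`, `_toy_impl`) are hypothetical, and this is an assumption about the general pattern, not a copy of the code in this PR.

```julia
module ToyBase
export @toymacro

# Stub without methods; an extension is expected to add one.
function _toy_impl end

# The macro only forwards to the stub at expansion time.
macro toymacro(ex)
    return esc(_toy_impl(ex))
end
end # module ToyBase

# Stands in for a package extension that loads once a weak dependency is present.
module ToyExt
import ..ToyBase

# The actual expression transformation lives here.
ToyBase._toy_impl(ex) = :((:handled_by_ext, $ex))
end # module ToyExt

using .ToyBase

x = 3
wrapped = @toymacro x + 1   # (:handled_by_ext, 4)
```

If the extension is not loaded, expanding the macro fails with a MethodError on the stub, which signals to the user that the extra package is needed.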

Jutho commented 2 months ago

Looks great. Maybe we can use the same macro-in-extension-package technique for @cutensor?

lkdvos commented 2 months ago

I think that's definitely a good idea, but I would move that to a separate PR or commit ;)

Jutho commented 2 months ago

Ok, if tests work I think this completes this PR.