ITensor / NDTensors.jl

A Julia package for n-dimensional sparse tensors.
Apache License 2.0

allow switching backends #69

Closed. GiggleLiu closed this 3 years ago.

GiggleLiu commented 3 years ago

To support different backends:

using NDTensors, BenchmarkTools, TropicalNumbers
using ITensors

i = Index(1000)
j = Index(1000)
k = Index(1000)
t1 = ITensors.ITensor(TropicalF64, i, j)
t2 = ITensors.ITensor(TropicalF64, j, k)

using TropicalGEMM, Octavian
using LinearAlgebra: BlasFloat

# Use the Octavian backend whenever all three arguments are strided arrays
# of Tropical numbers with BLAS-compatible element types.
@inline function NDTensors.auto_select_backend(
        ::Type{<:StridedVecOrMat{<:Tropical{<:BlasFloat}}},
        ::Type{<:StridedVecOrMat{<:Tropical{<:BlasFloat}}},
        ::Type{<:StridedVecOrMat{<:Tropical{<:BlasFloat}}})
    Val(:Octavian)
end

# Route the Octavian backend through Octavian.matmul!, wrapping transposed
# arguments as indicated by the tA/tB flags.
function NDTensors._gemm!(::Val{:Octavian}, tA, tB, alpha,
                A::AbstractVecOrMat,
                B::AbstractVecOrMat,
                beta, C::AbstractVecOrMat)
    if tA == 'T'
        A = transpose(A)
    end
    if tB == 'T'
        B = transpose(B)
    end
    return Octavian.matmul!(C, A, B, alpha, beta)
end

@benchmark t1 * t2

A note on the benchmark results

Before using TropicalGEMM:

julia> @benchmark t1 * t2
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  66
  --------------
  minimum time:     2.287 s (0.00% GC)
  median time:      2.288 s (0.00% GC)
  mean time:        2.288 s (0.00% GC)
  maximum time:     2.288 s (0.00% GC)
  --------------
  samples:          3
  evals/sample:     1

After using TropicalGEMM:

julia> @benchmark t1 * t2
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  60
  --------------
  minimum time:     67.562 ms (0.00% GC)
  median time:      67.964 ms (0.00% GC)
  mean time:        68.596 ms (0.25% GC)
  maximum time:     72.188 ms (1.34% GC)
  --------------
  samples:          73
  evals/sample:     1

A note on the overhead of manual backend selection

julia> backend_generic()
:Generic

julia> @benchmark NDTensors._gemm!('N', 'N', 1.0, $a, $b, 0.0, $c)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     302.398 ns (0.00% GC)
  median time:      306.681 ns (0.00% GC)
  mean time:        314.784 ns (0.92% GC)
  maximum time:     6.286 μs (94.60% GC)
  --------------
  samples:          10000
  evals/sample:     251

julia> backend_auto()
:Auto

julia> @benchmark NDTensors._gemm!('N', 'N', 1.0, $a, $b, 0.0, $c)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     112.454 ns (0.00% GC)
  median time:      113.455 ns (0.00% GC)
  mean time:        114.501 ns (0.00% GC)
  maximum time:     187.419 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     923
mtfishman commented 3 years ago

Hi Jin-Guo,

Thanks for the PR, that is an interesting interface. I like that it is externally extensible, and I like the auto_select_backend function for automatically choosing the backend with dispatch; that will be very useful for switching between backends that only support certain field types. It would be nice to have a similar interface at the level of the contract! function for switching between contraction backends that work above the gemm! level, like TBLIS (which is only performant for real floating-point contractions and tensors with large dimensions/order).

I suppose I was thinking about an interface similar to what we are using for TBLIS based on Requires.jl, where if TBLIS is loaded then you can enable that backend. Maybe we can also use a strategy like that, where we internally put that Octavian _gemm! overload in a file that only gets included through Requires when Octavian is loaded? It would be unfortunate for people to have to write that code themselves (at least for the specific case of Octavian, which hopefully becomes pretty standard). An alternative to Requires would be to have very lightweight "add-on" packages for ITensors/NDTensors which only include definitions like your _gemm!(::Val{:Octavian}, ...) overload, for example packages called ITensorOctavian or ITensorTBLIS that only contain the _gemm! or contract! overload for that backend.
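For concreteness, a minimal sketch of that Requires-based include inside the NDTensors module could look like the following; the file name octavian.jl is hypothetical, and the UUID should be checked against Octavian's Project.toml:

using Requires

function __init__()
    # Only define the Octavian backend overloads when the user loads Octavian.
    # The file name "octavian.jl" is hypothetical; verify the UUID against
    # Octavian's Project.toml before relying on it.
    @require Octavian = "6fd5a793-0b7e-452c-907f-f8bfe9c57db4" include("octavian.jl")
end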

Is that overhead related to the dynamic construction of Val in this line: Val(gemm_backend[])? I guess that is unavoidable if you want the backend to be externally extendable, but at least through the "automatic" backend selection that overhead isn't there as long as auto_select_backend is written in a type stable way.
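To illustrate where that overhead comes from, the dispatch layer presumably looks roughly like the sketch below (a reconstruction, not the exact PR code; gemm_backend is assumed to be a Ref{Symbol}). Constructing Val from a runtime Symbol is not inferable, so the inner call is dynamically dispatched:

const gemm_backend = Ref(:Auto)  # assumed global holding the manually selected backend

function _gemm!(tA, tB, alpha, A, B, beta, C)
    backend = gemm_backend[] == :Auto ?
        auto_select_backend(typeof(A), typeof(B), typeof(C)) :
        Val(gemm_backend[])  # Val of a runtime Symbol: type-unstable, dynamic dispatch
    return _gemm!(backend, tA, tB, alpha, A, B, beta, C)
end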

Note that in older versions of Julia, that kind of dynamic construction is very slow: https://github.com/JuliaLang/julia/issues/36384. We could create a specialized Val-like type called MatMulBackend:

@eval struct MatMulBackend{T}
  (f::Type{<:MatMulBackend})() = $(Expr(:new, :f))
end
MatMulBackend(s) = MatMulBackend{Symbol(s)}()
macro MatMulBackend_str(s)
  :(MatMulBackend{$(Expr(:quote, Symbol(s)))})
end

That might be a nicer interface anyway. Also, I like the interface we have for SiteType/OpName which is used in a very similar way for allowing external overloads of the op function. In that case, we define a macro constructor: https://github.com/ITensor/ITensors.jl/blob/master/src/physics/sitetype.jl#L153-L159 so you can use the string macro syntax MatMulBackend"Octavian". It's just like the MIME type in Julia: https://github.com/JuliaLang/julia/blob/v1.6.0/base/multimedia.jl#L16-L51.
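For example, usage of such a type could look like this (hypothetical, assuming the MatMulBackend definition above):

julia> MatMulBackend("Octavian")   # runtime construction from a string or symbol
MatMulBackend{:Octavian}()

julia> MatMulBackend"Octavian"     # string macro gives the type, convenient in method signatures
MatMulBackend{:Octavian}

julia> f(::MatMulBackend"Octavian") = "dispatched to the Octavian backend";

julia> f(MatMulBackend("Octavian"))
"dispatched to the Octavian backend"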

Finally, we probably want a simpler interface for the _gemm! function to make this overloading nicer. There shouldn't be a need for people to have the lines:

    if tA == 'T'
        A = transpose(A)
    end
    if tB == 'T'
        B = transpose(B)
    end

in every overload. We could also follow Octavian and call the function NDTensors.matmul! (which we could leave unexported, since it would only be used for overloading), and have it accept Transpose/Adjoint-wrapped arguments instead of the tA/tB flags.
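A sketch of what that could look like (the names here are a proposal, not existing NDTensors API): the internal dispatcher applies the tA/tB flags once, so each backend only has to implement a plain five-argument matmul!.

# Proposal sketch (not existing NDTensors API). Inside NDTensors, the dispatcher
# would apply the tA/tB flags in one place:
function _gemm!(backend, tA, tB, alpha, A, B, beta, C)
    A = tA == 'T' ? transpose(A) : A
    B = tB == 'T' ? transpose(B) : B
    return matmul!(backend, C, A, B, alpha, beta)
end

# ...so each backend overload reduces to a single, flag-free method:
matmul!(::Val{:Octavian}, C, A, B, alpha, beta) = Octavian.matmul!(C, A, B, alpha, beta)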

One more small interface thing: it would be better to rename NDTensors.auto_select_backend to something like NDTensors.auto_select_matmul_backend to allow for a future function NDTensors.auto_select_contract_backend. I also think it would be better to have a more general function like set_matmul_backend(s::String) instead of hard-coded functions like backend_auto(). I can never decide if functions like that should be set_matmul_backend(s) or set_matmul_backend!(s)... We have also generally preferred using Strings over Symbols in the external interface since they are more familiar to a wider audience, and I don't think there is any advantage to using a Symbol in this case, at least in the interface set_matmul_backend("auto") or set_matmul_backend("blas"). We can use Symbols internally for efficiency.
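A string-based setter along those lines could be as simple as the following sketch (hypothetical; gemm_backend is assumed to be the internal Ref{Symbol}, and the ! spelling is used here since the call mutates global state):

# Hypothetical sketch, not existing API.
function set_matmul_backend!(s::AbstractString)
    gemm_backend[] = Symbol(s)  # store internally as a Symbol for efficiency
    return nothing
end

set_matmul_backend!("Auto")      # would replace the hard-coded backend_auto()
set_matmul_backend!("Octavian")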

GiggleLiu commented 3 years ago

It would be nice to have a similar interface at the level of the contract! function for switching between contraction backends

Agree, let's do it later.

overload in a file that only gets included through Requires when Octavian is loaded?

Sounds good. Added.

Is that overhead related to the dynamic construction of Val in this line: Val(gemm_backend[])? I guess that is unavoidable if you want the backend to be externally extendable, but at least through the "automatic" backend selection that overhead isn't there as long as auto_select_backend is written in a type stable way.

Exactly.

specialized Val-like type called MatMulBackend

Sounds good, I used GemmBackend to make the naming more consistent.
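With the rename, backend overloads would dispatch on GemmBackend instead of Val, so the Octavian overload from the opening comment would presumably have a signature like this (sketch only, mirroring the MatMulBackend definition quoted above):

function NDTensors._gemm!(::NDTensors.GemmBackend{:Octavian}, tA, tB, alpha,
                A::AbstractVecOrMat, B::AbstractVecOrMat,
                beta, C::AbstractVecOrMat)
    # same body as the Val{:Octavian} overload in the opening comment
end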

Finally, we probably want a simpler interface for the _gemm! function to make this overloading nicer. There shouldn't be a need for people to have the lines:

We cannot avoid these lines if we call it gemm. To make it less terrifying, I made it shorter.

One more small interface thing: it would be better to rename NDTensors.auto_select_backend to something like NDTensors.auto_select_matmul_backend to allow for a future function NDTensors.auto_select_contract_backend. I also think it would be better to have a more general function like set_matmul_backend(s::String) instead of hard-coded functions like backend_auto(). I can never decide if functions like that should be set_matmul_backend(s) or set_matmul_backend!(s)... We have also generally preferred using Strings over Symbols in the external interface since they are more familiar to a wider audience, and I don't think there is any advantage to using a Symbol in this case, at least in the interface set_matmul_backend("auto") or set_matmul_backend("blas"). We can use Symbols internally for efficiency.

We can do it whenever we need it. A deprecation warning will help people migrate.

A general comment

I feel GEMM is a lower-level interface, while matmul! and contract! are higher-level interfaces. This is why I think GEMM is a better name for the backends. It is weird to specify both matmul! and contract! methods, because contract! includes matmul!. Anyway, feel free to change it.

mtfishman commented 3 years ago

I feel GEMM is a lower-level interface, while matmul! and contract! are higher-level interfaces. This is why I think GEMM is a better name for the backends. It is weird to specify both matmul! and contract! methods, because contract! includes matmul!. Anyway, feel free to change it.

My point in preferring matmul! over gemm! is that they are equally general interfaces for the same operations, but the Julia style of matmul!/mul! is much cleaner and simpler. So a goal would be to just use matmul!/mul! everywhere anyway. I don't think there was a reason to call gemm! directly in contract!; that was just historical, and we could just as easily call Julia's 5-arg mul!, as you did for the generic case in this PR.
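For reference, the generic path then reduces to Julia's 5-arg mul! (a sketch of the shape, not the exact code in the PR):

using LinearAlgebra: mul!

# Generic fallback: lazily wrap transposes and let mul! pick the right method.
function _gemm!(::Val{:Generic}, tA, tB, alpha, A, B, beta, C)
    return mul!(C, tA == 'T' ? transpose(A) : A, tB == 'T' ? transpose(B) : B, alpha, beta)
end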

Regarding overloading contract!, there are certain contraction backends like TBLIS or GETT that never call a gemm, and instead perform tensor contractions directly without any intermediate permutations. In that case, you would have to overload contract! directly, since the operation would never even reach a gemm call. Even besides that point, NDTensors.auto_select_backend is just a very general name, and it wouldn't be clear which operation's backend it refers to (for example, we also have a variety of SVD backends, and it isn't clear that this setting doesn't control those as well).

Anyway, we can merge this as is and I'm happy to clean it up, but I prefer to get the interface as good as possible the first time to avoid the annoyance of deprecations, when possible.

GiggleLiu commented 3 years ago

we can merge this as is and I'm happy to clean it up

Thanks; polishing the interface consistently definitely requires someone who understands the package very well. Sorry for not being able to help much.

mtfishman commented 3 years ago

I'll merge this and treat it as an experimental feature for now. Interface is subject to change.