ITensor / NDTensors.jl

A Julia package for n-dimensional sparse tensors.

TBLIS backend support #48

Closed mtfishman closed 3 years ago

mtfishman commented 3 years ago

This adds support for using TBLIS as a backend for (real) tensor contractions. Here is the interface:

using BenchmarkTools
using LinearAlgebra
using NDTensors
using TBLIS

d = 25

# Set the number of TBLIS and BLAS threads
nthreads = 4
TBLIS.set_num_threads(nthreads)
BLAS.set_num_threads(nthreads)

A = randomTensor(d, d, d, d);
B = randomTensor(d, d, d, d);
C = randomTensor(d, d, d, d);

# Labels shared by A and B (-1 and -2) are contracted;
# the remaining labels (1, 2, 3, 4) become the indices of C
labelsA = (1, -1, 2, -2)
labelsB = (-2, 4, 3, -1)
labelsC = (1, 2, 3, 4)

# Use TBLIS (enabled by default once `using TBLIS` has been loaded)
println("Benchmark TBLIS")
C_tblis = copy(C);
@btime NDTensors.contract!($C_tblis, $labelsC, $A, $labelsA, $B, $labelsB) samples = 100;

# Use BLAS
println("Benchmark BLAS")
disable_tblis!()
C_blas = copy(C);
@btime NDTensors.contract!($C_blas, $labelsC, $A, $labelsA, $B, $labelsB) samples = 100;

println()
@show C_blas ≈ C_tblis

# Can enable TBLIS again
enable_tblis!()

This outputs something like the following on my laptop:

julia> include("/tmp/tblis.jl")
nt: 6
Benchmark TBLIS
  2.639 ms (28 allocations: 1.89 KiB)
Benchmark BLAS
  3.589 ms (38 allocations: 8.94 MiB)

C_blas ≈ C_tblis = true

so TBLIS is about 25% faster here (2.64 ms vs. 3.59 ms) and allocates far less memory (1.89 KiB vs. 8.94 MiB), presumably because the BLAS backend has to permute the tensors into temporary matrices before calling gemm.

To use it, you first have to install TBLIS.jl (and right now, to get functions like TBLIS.set_num_threads(4), you need to use my fork: https://github.com/mtfishman/TBLIS.jl).
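
For completeness, here is a minimal sketch of installing the fork with Julia's package manager (assuming a Julia version whose Pkg supports the url keyword):

using Pkg
# Install TBLIS.jl from the fork rather than a registered version
Pkg.add(url="https://github.com/mtfishman/TBLIS.jl")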

Note that it is limited to real contractions right now, since TBLIS support for complex tensors is currently limited (https://github.com/devinamatthews/tblis/issues/18). I've also found that it doesn't help with the DMRG calculations I have tried: it doesn't give a speedup, and is sometimes even a bit slower. My guess is that many DMRG contractions map directly to BLAS calls without any permutations, and in those cases TBLIS is slower than MKL (see the sketch below). It may also have to do with the tensor dimensions involved.
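
To make that last point concrete, here is a hedged sketch of such a case, using the same interface as the example above: the shared label lines up so the contraction is exactly a matrix multiplication, which the BLAS backend can dispatch to gemm with no permutations. The dimension d = 200 is an arbitrary illustrative choice, and whether TBLIS or BLAS wins will depend on your machine:

using BenchmarkTools
using LinearAlgebra
using NDTensors
using TBLIS

d = 200

A = randomTensor(d, d);
B = randomTensor(d, d);
C = randomTensor(d, d);

# With these labels, C[1, 2] = sum over -1 of A[1, -1] * B[-1, 2],
# i.e. a plain matrix product C = A * B with no index permutations
labelsA = (1, -1)
labelsB = (-1, 2)
labelsC = (1, 2)

println("Benchmark TBLIS")
C_tblis = copy(C);
@btime NDTensors.contract!($C_tblis, $labelsC, $A, $labelsA, $B, $labelsB) samples = 100;

println("Benchmark BLAS")
disable_tblis!()
C_blas = copy(C);
@btime NDTensors.contract!($C_blas, $labelsC, $A, $labelsA, $B, $labelsB) samples = 100;
enable_tblis!()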