ITensor / ITensors.jl

A Julia library for efficient tensor computations and tensor network calculations
https://itensor.org
Apache License 2.0

[ITensors] [BUG] Some autodiff errors / missing rules #815

Closed DomCRose closed 2 years ago

DomCRose commented 2 years ago

Description of bug Hi, great work on this library! I found a few issues with the autodiff support while experimenting. I realise this is still a bit in development, but hopefully this helps!

  1. Some rules (e.g. replaceind) appear to assume index arguments are provided as pairs, rather than as separate arguments. It seems like it thinks it has a chain rule for the two-argument form, which then calls the inv_op that is defined only for the pair-argument form. This looks like it might affect a few functions, e.g. replaceind.
  2. Differentiation of ITensor construction appears not to work when indices are provided as a vector. It's not clear to me from the code why this one doesn't work. Solved by generalizing the ITensor-to-Array conversion, see: #818
  3. Defining two order-2 ITensors and then taking their trace product appears to produce the transpose of the expected gradient for the input matrices. Perhaps a misunderstanding on my end? Solved by fixing the rrule to use the correct index order, see: #818
  4. A slightly more complex problem (a trace over a transformation from a small parameterized unitary circuit; see the sketched, simplified case below) seems to cause a stack overflow error. I noticed the multi-argument contract chain rule explicitly splits the environment construction into left and right parts. While this would be fine for an MPS (assuming a correctly ordered input), perhaps for something a bit more loopy like this it's causing issues? Could this be improved for the generic TN contraction case by simply having a single environment of all tensors but one, and asking the optimal contraction order calculation to figure out how best to contract it? Solved by adding changes to enable autodiff of arbitrary TN contractions, see: #878

How to reproduce Examples in the code below.

Expected behavior Expected correct gradients.

Actual behavior Error 1:

ERROR: LoadError: Trying to differentiate `replaceind` but the inverse of the operation (`inv_op`) `replaceind` with arguments (ITensor ord=2
Dim 1: (dim=2|id=876)
Dim 2: (dim=2|id=832)
NDTensors.Diag{Float64, Float64}
 2×2
 1.0  0.0
 0.0  1.0, (dim=2|id=772), (dim=2|id=876)) and keyword arguments Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}() is not defined.
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] inv_op(::Function, ::ITensor, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ITensors.ITensorChainRules C:\Users\domin\.julia\packages\ITensors\9KZMT\src\ITensorChainRules\ITensorChainRules.jl:40
  [3] inv_op(::Function, ::ITensor, ::Index{Int64}, ::Index{Int64})
    @ ITensors.ITensorChainRules C:\Users\domin\.julia\packages\ITensors\9KZMT\src\ITensorChainRules\ITensorChainRules.jl:40
  [4] (::ITensors.ITensorChainRules.var"#f_pullback#39"{Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, typeof(replaceind), ITensor, Tuple{Index{Int64}, Index{Int64}}})(ȳ::ITensor)
    @ ITensors.ITensorChainRules C:\Users\domin\.julia\packages\ITensors\9KZMT\src\ITensorChainRules\ITensorChainRules.jl:109
  [5] (::Zygote.ZBack{ITensors.ITensorChainRules.var"#f_pullback#39"{Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, typeof(replaceind), ITensor, Tuple{Index{Int64}, Index{Int64}}}})(dy::ITensor)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\chainrules.jl:204
  [6] Pullback
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:13 [inlined]
  [7] (::typeof(∂(replace_two_args_trace)))(Δ::Float64)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
  [8] Pullback
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:23 [inlined]
  [9] (::Zygote.var"#55#56"{typeof(∂(#37))})(Δ::Float64)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:41
 [10] gradient(f::Function, args::ITensor)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:76
 [11] top-level scope
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:23

Error 2:

ERROR: LoadError: type Nothing has no field method
Stacktrace:
  [1] getproperty(x::Nothing, f::Symbol)
    @ Base .\Base.jl:33
  [2] matching_cr_sig(t::IRTools.Inner.Meta, s::Nothing)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\chainrules.jl:72
  [3] has_chain_rrule(T::Type)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\chainrules.jl:54
  [4] #s3053#1210
    @ C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:20 [inlined]
  [5] var"#s3053#1210"(::Any, ctx::Any, f::Any, args::Any)
    @ Zygote .\none:0
  [6] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any, N} where N)
    @ Core .\boot.jl:571
  [7] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:391 [inlined]
  [8] _pullback(::Zygote.Context, ::Type{Matrix{Float64}}, ::NDTensors.NeverAlias, ::Matrix{Float64})        
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
  [9] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:396 [inlined]
 [10] _pullback(::Zygote.Context, ::Type{Array{Float64, N} where N}, ::NDTensors.NeverAlias, ::Matrix{Float64})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [11] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:460 [inlined]
 [12] _pullback(::Zygote.Context, ::ITensors.var"##ITensor#125", ::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::Type{ITensor}, ::NDTensors.NeverAlias, ::Type{Float64}, ::Matrix{Float64}, ::Vector{Index{Int64}})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [13] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:455 [inlined]
 [14] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:497 [inlined]
 [15] _pullback(::Zygote.Context, ::ITensors.var"##ITensor#131", ::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::Type{ITensor}, ::NDTensors.NeverAlias, ::Matrix{Float64}, ::Vector{Index{Int64}})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [16] _apply(::Function, ::Vararg{Any, N} where N)
    @ Core .\boot.jl:804
 [17] adjoint
    @ C:\Users\domin\.julia\packages\Zygote\FPUm3\src\lib\lib.jl:200 [inlined]
 [18] _pullback
    @ C:\Users\domin\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:65 [inlined]
 [19] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:497 [inlined]
 [20] _pullback(::Zygote.Context, ::Type{ITensor}, ::NDTensors.NeverAlias, ::Matrix{Float64}, ::Vector{Index{Int64}})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [21] _apply(::Function, ::Vararg{Any, N} where N)
    @ Core .\boot.jl:804
 [22] adjoint
    @ C:\Users\domin\.julia\packages\Zygote\FPUm3\src\lib\lib.jl:200 [inlined]
 [23] _pullback
    @ C:\Users\domin\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:65 [inlined]
 [24] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:507 [inlined]
 [25] _pullback(::Zygote.Context, ::ITensors.var"##ITensor#133", ::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::Type{ITensor}, ::Matrix{Float64}, ::Vector{Index{Int64}})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [26] _apply(::Function, ::Vararg{Any, N} where N)
    @ Core .\boot.jl:804
 [27] adjoint
    @ C:\Users\domin\.julia\packages\Zygote\FPUm3\src\lib\lib.jl:200 [inlined]
 [28] _pullback
    @ C:\Users\domin\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:65 [inlined]
 [29] _pullback
    @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:507 [inlined]
 [30] _pullback(::Zygote.Context, ::Type{ITensor}, ::Matrix{Float64}, ::Vector{Index{Int64}})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [31] _pullback
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:29 [inlined]
 [32] _pullback(::Zygote.Context, ::typeof(construct_and_trace_vector), ::Matrix{Float64}, ::Index{Int64}, ::Index{Int64})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [33] _pullback
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:44 [inlined]
 [34] _pullback(ctx::Zygote.Context, f::var"#41#42", args::Matrix{Float64})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [35] _pullback(f::Function, args::Matrix{Float64})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:34
 [36] pullback(f::Function, args::Matrix{Float64})
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:40
 [37] gradient(::Function, ::Matrix{Float64}, ::Vararg{Any, N} where N)
    @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:75
 [38] top-level scope
    @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:44

Error 4:

ERROR: LoadError: StackOverflowError:
Stacktrace:
     [1] Array
       @ .\boot.jl:448 [inlined]
     [2] getindex
       @ .\array.jl:397 [inlined]
     [3] optimal_contraction_sequence
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1824 [inlined]
     [4] optimal_contraction_sequence
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1814 [inlined]
     [5] contract(As::Tuple{ITensor}; sequence::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1897
     [6] contract
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1892 [inlined]
     [7] #contract#224
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1903 [inlined]
     [8] contract(As::ITensor)
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1903
     [9] _contract(As::Tuple{ITensor}, sequence::Vector{Any}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1910
    [10] _contract
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1910 [inlined]
--- the last 6 lines are repeated 5659 more times ---
 [33965] contract(As::Tuple{ITensor}; sequence::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1897
 [33966] contract
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1892 [inlined]
 [33967] #contract#224
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1903 [inlined]
 [33968] contract(As::ITensor)
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1903
 [33969] _contract(As::Vector{ITensor}, sequence::Vector{Any}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1910
 [33970] _contract(As::Vector{ITensor}, sequence::Vector{Any})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1910
 [33971] contract(As::Vector{ITensor}; sequence::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
       @ ITensors C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1897
 [33972] contract
       @ C:\Users\domin\.julia\packages\ITensors\9KZMT\src\itensor.jl:1892 [inlined]
 [33973] (::ITensors.ITensorChainRules.var"#contract_pullback#49"{ITensor, ITensor, Tuple{ITensor}})(ȳ::ITensor)
       @ ITensors.ITensorChainRules C:\Users\domin\.julia\packages\ITensors\9KZMT\src\ITensorChainRules\ITensorChainRules.jl:175
 [33974] (::Zygote.ZBack{ITensors.ITensorChainRules.var"#contract_pullback#49"{ITensor, ITensor, Tuple{ITensor}}})(dy::ITensor)
       @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\chainrules.jl:204
 [33975] Pullback
       @ c:\Users\domin\Dropbox\code_practice\julia\itensor\chainrulesissues.jl:79 [inlined]
 [33976] (::typeof(∂(circuit_transform)))(Δ::Float64)
       @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface2.jl:0
 [33977] (::Zygote.var"#55#56"{typeof(∂(circuit_transform))})(Δ::Float64)
       @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:41
 [33978] gradient(::Function, ::Matrix{Float64}, ::Vararg{Any, N} where N)
       @ Zygote C:\Users\domin\.julia\packages\Zygote\FPUm3\src\compiler\interface.jl:76

Code demonstrating bug

using ITensors
using Zygote
using LinearAlgebra

### Code for error 1. ###

function replace_pair_trace(A, i)
    A = replaceind(A, ind(A, 2) => i)
    return (A*delta(i, ind(A, 1)))[1]
end

function replace_two_args_trace(A, i)
    A = replaceind(A, ind(A, 2), i)
    return (A*delta(i, ind(A, 1)))[1]
end

A = ITensor(randn(2, 2), Index(2), Index(2))
i = Index(2)

replace_pair_trace(A, i) == replace_two_args_trace(A, i)
gradient(a -> replace_pair_trace(a, i), A)
# Line below: no inverse of replace ind with these arguments defined
gradient(a -> replace_two_args_trace(a, i), A)

### Code for error 2. ###

function construct_and_trace_vector(A, i, j)
    AT = ITensor(A, [i, j])
    return (AT*delta(i, j))[1]
end

function construct_and_trace_vararg(A, i, j)
    AT = ITensor(A, i, j)
    return (AT*delta(i, j))[1]
end

A = randn(2, 2)
i = Index(2)
j = Index(2)

construct_and_trace_vector(A, i, j) == construct_and_trace_vararg(A, i, j)
gradient(a -> construct_and_trace_vararg(a, i, j), A)
gradient(a -> construct_and_trace_vector(a, i, j), A) # type Nothing has no field method

### Code for error 3. ###

f(A, B) = tr(A * B)

function tracemul_itensor(A, B, i, j)
    AT = ITensor(A, i, j)
    BT = ITensor(B, j, i)
    return (BT*AT)[1]
end

A = rand(2, 2)
B = rand(2, 2)
i = Index(2)
j = Index(2)

f(A, B) == tracemul_itensor(A, B, i, j)
gradient(f, A, B)
gradient((a, b) -> tracemul_itensor(a, b, i, j), A, B) # produces transposed gradient

### Code for error 4. ###

ITensors.enable_contraction_sequence_optimization()

function circuit_transform(O, U)
    i = Index(2)
    j = Index(2)
    k = Index(2)
    l = Index(2)
    OT = ITensor(O, j, j')
    U11 = ITensor(U, i, j, k, l, i'', j'', k'', l'')
    U11dagger = ITensor(U, i'', j'', k'', l'', i, j', k, l)
    out = OT * U11 * U11dagger
    return out[1]
end

U = randn(2^4, 2^4)
O = [1.0 0.0; 0.0 -1.0]
circuit_transform(O, U)
gradient(circuit_transform, O, U) # stack overflow

Version information

mtfishman commented 2 years ago

Thanks for the great report @DomCRose! Not surprised that bugs like this are popping up.

Hopefully for your use cases you can work around these for now, but they should be relatively small fixes.

The most concerning one is definitely the tracemul_itensor example. If you use itensor (the view version of ITensor), it works:

using ITensors
using Zygote
using LinearAlgebra
using Random

f(A, B) = tr(A * B)

function tracemul_itensor(A, B, i, j)
    AT = itensor(A, i, j)
    BT = itensor(B, j, i)
    return (BT*AT)[1]
end

Random.seed!(1234)
A = rand(2, 2)
B = rand(2, 2)
i = Index(2)
j = Index(2)

f(A, B) == tracemul_itensor(A, B, i, j)
gradient(f, A, B)
gradient((a, b) -> tracemul_itensor(a, b, i, j), A, B) # produces transposed gradient

outputs:

julia> gradient(f, A, B)
([0.35311164439921205 0.39425536741585077; 0.9531246272848422 0.7955469475347194], [0.32597672886359486 0.5490511363155669; 0.21858665481883066 0.8942454282009883])

julia> gradient((a, b) -> tracemul_itensor(a, b, i, j), A, B)
([0.35311164439921205 0.39425536741585077; 0.9531246272848422 0.7955469475347194], [0.32597672886359486 0.5490511363155669; 0.21858665481883066 0.8942454282009883])

We fixed a similar bug in the rrule for itensor in #773 but the fix didn't make it over to the ITensor constructor.

Would you mind making a PR adding the same fix for the ITensor rrule?

mtfishman commented 2 years ago

Also, thanks for making use of the issue template. I see that some of the template is a bit awkward; I'll rework it. We could simplify it:

No need for both How to reproduce and Code demonstrating bug.

DomCRose commented 2 years ago

Can do!

Re. the second error, I believe the issue may stem (a) from it not finding an rrule when the Index argument is a vector, and (b) from Array not having a method that receives an ITensor and a Vector of Index; it currently only has a vararg form.

I've found a couple of ways to fix this.

Option 1: new Array method and rrule edit

function Array(x, a::Vector{T}) where {T<:Index}
  return Array(x, a...)
end

function ChainRulesCore.rrule(::typeof(ITensor), x::Array{<:Number}, a...)
  y = ITensor(x, a...)
  function ITensor_pullback(ȳ)
    # TODO: define `Array(::ITensor)` directly
    x̄ = Array(unthunk(ȳ), a...)
    ā = broadcast_notangent(a)
    return (NoTangent(), x̄, ā...)
  end
  return y, ITensor_pullback
end

This also removes the ::Index annotation from the a argument of the rrule as it currently stands.

Option 2: dispatch on the type of a in the rrule

function ChainRulesCore.rrule(::typeof(ITensor), x::Array{<:Number}, a::Index...)
  y = ITensor(x, a...)
  function ITensor_pullback(ȳ)
    # TODO: define `Array(::ITensor)` directly
    x̄ = Array(unthunk(ȳ), a...)
    ā = broadcast_notangent(a)
    return (NoTangent(), x̄, ā...)
  end
  return y, ITensor_pullback
end

function ChainRulesCore.rrule(
  ::typeof(ITensor), x::Array{<:Number}, a::Vector{T}
) where {T<:Index}
  y = ITensor(x, a)
  function ITensor_pullback(ȳ)
    # TODO: define `Array(::ITensor)` directly
    x̄ = Array(unthunk(ȳ), a...)
    ā = broadcast_notangent(a)
    return (NoTangent(), x̄, ā...)
  end
  return y, ITensor_pullback
end

Shall I include a fix for error 2 in the PR? If so, which of these is preferable? Option 1 seems more elegant, unless there is a good reason to keep the type annotation on a and not add this method to Array. This is my first time messing with chain rule definitions, so apologies if there's a clear reason not to do this!

Option 2 seems to require splatting ā in the vector-argument rrule, which doesn't seem right, but without it there is an unclear error about ZeroTangents.

Note both contain the fix for the transpose issue, which seems to have only required replacing the index argument in the Array call of the pullback with a.

Template modification sounds good. While I didn't make much use of it, I can imagine cases where Expected behavior would be more important.

mtfishman commented 2 years ago

I think Option 1 is best. There is no particular reason I didn't define the Array constructor to accept more general collections of indices; I think it was just an oversight, and that code was written a while ago when I hadn't worked out the best code patterns for writing constructors for generic collections of Index. Also, it is better to have just one generic rrule.

Take a look at constructors like randomITensor:

https://github.com/ITensor/ITensors.jl/blob/921e2116beb74d21028d2dde05a26170d6d5f622/src/itensor.jl#L1449-L1457

for an example of how I've handled accepting more generic collections.

Indeed, I think you're correct that using a in the pullback should fix the issue in the simplest way. Please go ahead and make a PR fixing both of those issues, thanks!

DomCRose commented 2 years ago

Great, will do!

In regards to the new Array method(s?), I've also been looking at what is currently implemented: https://github.com/ITensor/ITensors.jl/blob/921e2116beb74d21028d2dde05a26170d6d5f622/src/itensor.jl#L766-L786 I'm still getting used to some of the type manipulations in Julia code, but am I correct in thinking this hierarchy of functions is there to improve type stability? E.g. calling just Array(::ITensor, ::Indices) is not type stable since the element type is not inferrable from the input, so Array{ElT} exists so that a user can write type-inferrable code if it's needed to compile something for speed?

Given this, an issue with a vector input is that its length is not part of the type, so the output type can't be inferred unless the vector's length is both provided by hand and known at compile time.

Currently I'm considering something like

function Array(x, a::Vector{T}) where {T<:Index}
  return Array{eltype(T),length(a)}(x, a...)
end

function Array{ElT,N}(x, a::Vector{T}) where {T<:Index} where {ElT,N}
  return Array{ElT,N}(x, a...)
end

Here a direct call of the second function is type stable, and it errors if N != length(a).

I'll have a think about how the vector and vararg versions could be combined using Indices as you did in randomITensor, and open a PR after the weekend.

mtfishman commented 2 years ago

Yes, that's correct: the main reason for those different sets of functions is to improve type stability (it allows people to specify the type if they want to, since, as you say, the types are not inferable because of the design of the ITensor type).

I would recommend something like this:

function Array{ElT,N}(T::ITensor, is::Indices) where {ElT,N}
  ndims(T) != N && throw(
    DimensionMismatch(
      "cannot convert an $(ndims(T)) dimensional ITensor to an $N-dimensional Array."
    ),
  )
  TT = tensor(permute(T, is))
  return Array{ElT,N}(TT)::Array{ElT,N}
end

function Array{ElT,N}(T::ITensor, is...) where {ElT,N}
  return Array{ElT,N}(T, indices(is...))
end

function Array{ElT}(T::ITensor, is::Indices) where {ElT}
  return Array{ElT,length(is)}(T, is)
end

function Array{ElT}(T::ITensor, is...) where {ElT}
  return Array{ElT}(T, indices(is...))
end

function Array(T::ITensor, is...)
  return Array{eltype(T)}(T, is...)
end

function Array{<:Any,N}(T::ITensor, is...) where {N}
  return Array{eltype(T),N}(T, is...)
end

The indices function helps take care of a variety of inputs like Vararg, Tuple, and Vector, and Julia should be able to infer the types when possible with length (it doesn't always work, but we can add tests for type stability).
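For illustration, a hedged sketch of the kind of type-stability test this refers to, assuming the Array methods sketched above are defined (the exact tests in the package may differ):

using ITensors, Test

i, j = Index(2), Index(3)
T = randomITensor(i, j)

# `Test.@inferred` throws if the call's return type cannot be inferred from the argument types.
@inferred Array{Float64,2}(T, i, j)    # element type and rank supplied explicitly
@inferred Array{Float64,2}(T, [i, j])  # vector of indices, rank still in the type
Array(T, [i, j])                       # works, but the element type and rank can't be inferred from the types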

DomCRose commented 2 years ago

I had a bit more of a look into bug 4 recently, and the issue seems to stem from contraction sequence optimization. When it is activated, Zygote misses the chain rule and instead tries to differentiate the contraction code itself. This results in some extremely long traces for whatever reason, which is presumably what causes the stack overflow.

In the long run I guess the aim is to replace the contract rrule with something like the Evenbly and Pfeifer paper, but perhaps a good short-term fix here would be to make Zygote see the chain rule with contraction optimization on (I haven't figured out why this is happening, and hence how to fix it, currently).

In looking into this I noticed another bug still present in the ITensor rrule that the itensor rrule had fixed in the past: it doesn't handle ITensor construction where the array shape is changed. I'll submit a pull request with a fix for this in a bit!
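For concreteness, a hedged illustration of the kind of shape-changing construction meant here (it mirrors the lowercase itensor tests further down, which build a 2-index tensor from a flat length-4 vector):

using ITensors, Zygote

i, j = Index(2), Index(2)
# The data is a flat length-4 vector, but the ITensor has two 2-dimensional
# indices, so the array gets reshaped during construction.
construct_reshaped_trace(v) = (ITensor(v, i, j) * delta(i, j))[]

v = randn(4)
construct_reshaped_trace(v)            # forward pass
gradient(construct_reshaped_trace, v)  # the case the ITensor rrule needs to handle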

mtfishman commented 2 years ago

Yes, that's right: currently the rrule is not written to account for the case of contract/* when contraction sequence optimization is enabled (i.e. any nontrivial contraction sequence). And that's exactly right, the plan was to implement the rrule based on https://arxiv.org/abs/1310.8023. Indeed, the current implementation is pretty naive; it is designed for the case where the sequence follows left associativity, i.e. the default sequence [[[1, 2], 3], 4], ....

I'm not sure why it doesn't just call the current rrule when contraction sequence optimization is turned on, that is strange. I hadn't tested it; I just figured it would call the current rrule but be slower in general, since it is not using the best contraction sequence for the gradient terms. In principle, once we get it to actually call the rrule, we could have it run contraction sequence optimization for each gradient term/tensor in the network as a stopgap measure. Feel free to implement https://arxiv.org/abs/1310.8023 if you are motivated to do so as well!

Thanks for looking into the bugs in the ITensor rrule, good to have those basic rules working and solid, since they are very important for getting more ITensor code to be differentiable.

Also, for bug 1 that you point out in the first post, it would be good to fix that bug, and hopefully it is a small fix. Additionally, I think we can entirely avoid the inv_op business and make the rule for all types of index replacements more general if we write the rules in terms of replaceinds (i.e., any tagging and priming operation can be seen in terms of a call like replaceinds(x, is1 => is2), so the rrule could just be written as something like replaceinds(ȳ, is2 => is1)).
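A minimal sketch of what such a rule could look like, assuming the pair form replaceinds(x, is1 => is2) is used (an illustration only, not the rule that was eventually implemented):

using ITensors, ChainRulesCore

function ChainRulesCore.rrule(::typeof(replaceinds), x::ITensor, is1_is2::Pair)
  is1, is2 = is1_is2
  y = replaceinds(x, is1 => is2)
  function replaceinds_pullback(ȳ)
    # Undo the replacement on the cotangent by swapping the index collections back.
    x̄ = replaceinds(unthunk(ȳ), is2 => is1)
    return (NoTangent(), x̄, NoTangent())
  end
  return y, replaceinds_pullback
end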

DomCRose commented 2 years ago

I'll have a closer look at that paper; it would be good practice to implement it. I would say though, this work is a little exploratory/side-project-y at the minute relative to my primary research, so I can only commit time to it sporadically!

I notice that paper specifies the approach is specific to closed tensor networks, although I haven't figured out why yet. Are you guys aware of any generalization to open tensor networks? I can see that being important for e.g. ML applications.

Edit: I recall you guys had some work integrating with a package called AutoHOOT at some point too? Just curious what's happening/happened with that. Are you aiming for a pure Julia implementation instead?

mtfishman commented 2 years ago

No pressure at all to work on it, let me know if you get started so we can coordinate efforts (I don't anticipate having time to work on it in the next few weeks).

I think the generalization to open networks is relatively simple, since in the pullback the backward-pass derivative gets contracted with the forward-pass network to make a closed network (up to removing the tensor that is being differentiated). I haven't seen that explicitly laid out, though; the information on this stuff is very scattered, and the papers talking about tensor network AD seem to skip over simple descriptions of basic rules like that.

In https://arxiv.org/abs/1310.8023, they limit themselves to closed networks since their application area was more narrowly focused on creating tensor network environments for optimizing cost functions like the energy (with the ultimate goal of improving the optimization of MERA networks), as opposed to the more general goal of deriving automatic differentiation rules. In fact, I don't think they ever mention gradient optimization in the paper. Additionally, my understanding is that they are re-deriving a more general fact about the overhead of reverse-mode automatic differentiation. Basically, I think it was ahead of its time, when applying more general optimization techniques like gradient optimization and automatic differentiation in the context of tensor networks wasn't really on people's minds.

Indeed, @LinjianMa, the main developer of the AutoHOOT Python library, wrote a Julia wrapper: https://github.com/LinjianMa/AutoHOOT.jl which he is using to explore more advanced network-level AD in the package https://github.com/ITensor/ITensorNetworkAD.jl. However, that project has more ambitious and exploratory plans than the current plans for ITensors.jl differentiation, where we just want to ensure that all ITensor operations are differentiable, including network contractions, without too many dependencies like AutoHOOT/Python for the time being. So I think it is still worth having a minimal rule in ITensors.jl based on https://arxiv.org/abs/1310.8023 for contracting networks, even though it could be handled by AutoHOOT.

DomCRose commented 2 years ago

Great, thanks for the info! I see what you mean about the extension to open networks: the gradient of the rest of the forward pass after the open TN gets multiplied by the Jacobian of the open TN, which is then simply the environment, in the closed network (open TN contracted with the gradient), of each input tensor.
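To make that concrete, a small hedged sketch using plain ITensor operations (not package internals): closing the open network with the incoming cotangent turns each input's pullback into an environment calculation:

using ITensors

i, j, k = Index(2), Index(3), Index(4)
A = randomITensor(i, j)
B = randomITensor(j, k)

C = A * B                # open network with external indices i and k
C̄ = randomITensor(i, k)  # cotangent arriving from the rest of the reverse pass

Ā = C̄ * B                # environment of A in the closed network {A, B, C̄} with A removed
B̄ = C̄ * A                # environment of B, likewise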

So I've spent some time reading https://arxiv.org/abs/1310.8023 and thinking about differentiating contractions. There seem to be a couple of potential approaches.

Evenbly and Pfeifer

Following that paper, the rough approach sounds like it would be: first, find the optimal contraction sequence for the environment of one input tensor. This may not be the optimal sequence found for the whole network, as the final contraction may not involve an input tensor.

Viewing this sequence as a graph, each binary contraction can be viewed as a vertex with three edges: two going in, representing the tensors being contracted, and one going out, representing the contracted tensor going on to its next contraction (i.e. a binary tree). The environment of every other input tensor can be efficiently constructed by appropriate swaps of the outgoing edges with the incoming edges. In the resulting family of related graphs for each environment, each vertex will appear with the outgoing edge in all 3 possible orientations.

Each orientation of a vertex would have a certain "depth" in the contraction trees it is present in, i.e. how many contractions have to occur earlier in the tree before the binary contraction represented by the vertex can occur. The shallowest correspond to contractions involving the input tensors, while the deepest correspond to final contractions to construct an environment.

To construct the environments, you therefore order these vertices by their depths, then contract them shallowest first, until the deepest contractions occur, leaving the set of environments.
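For reference, a hedged sketch of the quantity being targeted, computed naively here without the reuse of intermediates that the depth-ordered scheme provides:

using ITensors

i, j, k = Index(2), Index(2), Index(2)
tensors = [randomITensor(i, j), randomITensor(j, k), randomITensor(k, i)]  # a closed "triangle" network

# Naive environments: contract everything except the n-th tensor. The point of
# the depth-ordered scheme above is to obtain all of these while reusing
# intermediate contractions, at roughly 3x the cost of one full contraction.
environments = [contract([tensors[m] for m in eachindex(tensors) if m != n]) for n in eachindex(tensors)]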

Reverse diff on the optimal contraction sequence

The approach from the Evenbly and Pfeifer paper is shown to have a cost of slightly less than 3 times the cost of contracting the full network. However, I've wondered before what simply applying reverse diff to a contraction sequence would do. Having thought a bit more, I think it may produce essentially the same result?

Roughly speaking, let's say on the forward pass a given vertex in the contraction tree involves two tensors with dims (a,c) and (b,c), where c is the dim of the shared legs and a, b the dims of the remaining legs. Then the binary contraction of this index on the forward pass costs a*b*c and results in a tensor with dims (a,b).

Suppose we then apply reverse diff directly to this sequence of binary contraction calls. At any vertex the pullback will consist of contracting an (a,b) dim cotangent with each of the two input tensors, giving two contractions both also costing a*b*c.

Since the reverse diff will reuse both the tensors resulting from the forward pass and the cotangents during the backward pass, the cost of this reverse diff should be almost exactly 2 times the cost of the forward contraction, resulting in a total of 3 times the cost of the forward pass for the gradient. Thus, this seems to achieve the same bound as the paper, while being somewhat more generic, i.e. there is no need to find the optimal contraction tree for an environment; you can use the optimal contraction tree for the full network. In fact, this cost should in principle apply to any contraction sequence, optimal or not?
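As a hedged numerical illustration of this cost counting, using plain matrices instead of ITensors: one forward binary contraction and the two pullback contractions each cost a*b*c multiply-adds.

using LinearAlgebra

a, b, c = 10, 20, 30
X = randn(a, c)          # tensor with a free leg of dim a and shared leg of dim c
Y = randn(c, b)          # tensor with shared leg of dim c and free leg of dim b

Z = X * Y                # forward pass: (a,b) result, cost a*b*c

Z̄ = randn(a, b)          # cotangent arriving from the rest of the reverse pass
X̄ = Z̄ * Y'               # (a,b)*(b,c) -> (a,c), cost a*b*c
Ȳ = X' * Z̄               # (c,a)*(a,b) -> (c,b), cost a*b*c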

(Note: I haven't read the AutoHOOT paper and am pretty new to TNs in general, so perhaps something like this has already been noted elsewhere.)

I'm not 100% sure but I would go so far as to argue that in the case where the contraction tree is one in which the final contraction is between one of the input tensors and its environment, this procedure will do the exact same thing as the above approach from the paper.

Discussion / questions

Given this, I'm wondering a few things:

  1. Have I missed anything obvious / made any glaring mistakes, either in the approach from the paper or the argument motivating using reverse diff?
  2. Given this theoretical perspective of applying reverse diff, is there a technical reason why we wouldn't want to just define the binary contraction pullback (already done), tag the appropriate functions non-diff, and let the AD figure it out for the given sequence? While I know Zygote, for example, can handle most things, the recursive and tensor-network-dependent nature of the optimal contraction might make it difficult to construct/compile the derivative of the current implementation. However, this would be easy to implement, and might produce reasonably efficient higher-order derivatives quite easily? Given the current state of AD, this may also not work well with mutation, if there were some sort of cached version of the TN contraction.
  3. Assuming the reverse diff discussion is correct, but we don't want to pass the work on to AD, this reverse differentiation version could be implemented manually. I'm unsure how this would gel with higher derivatives, but that's secondary to just getting a good first-order derivative working.
  4. Even if the reverse diff perspective isn't implemented for a generic contraction sequence, it could provide a nicely structured approach to doing the required contractions for the Evenbly and Pfeifer approach.
  5. I haven't thought too much about how to actually implement this. I took a brief look at the optimal contraction code. Could any of those utilities be reused for this?

Interested to hear what you guys think!

DomCRose commented 2 years ago

A small update on this. Based on the above I decided to give relying on reverse diff a go, since it was quite simple to implement. In short, it works, but compile times are fairly long (at least for the first gradient, though not much longer than currently) and the results are some way off optimal. Details below, apologies for the second long post in a short period of time!

Approach

To do this I made three small changes.
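The exact diff isn't reproduced here, but as a rough, hypothetical sketch of the kind of change discussed above (making Zygote skip the sequence-finding code and differentiate only the recursive binary contractions), one such change could look like:

using ITensors, ChainRulesCore

# Hypothetical sketch only: mark the contraction-sequence search as
# non-differentiable so Zygote doesn't try to trace through it, and rely on the
# existing binary contraction pullback for the recursive contract calls.
ChainRulesCore.@non_differentiable ITensors.optimal_contraction_sequence(::Any)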

Results

See below for two examples, with comments showing timings / results of certain lines.

using ITensors
using Zygote
using BenchmarkTools

i, j, k, l, m, n, u, v = Index.([10, 50, 5, 2, 10, 12, 7, 50])

A = randomITensor(i, j, k)
B = randomITensor(i, l, m)
C = randomITensor(j, u, n)
D = randomITensor(u, v)
E = randomITensor(k, v)
F = randomITensor(l, m, n)

# Left associative
func(A, B, C, D, E, F) = (A*B*C*D*E*F)[]
@time func(A, B, C, D, E, F) # 6.2s, 99.9% compilation
@btime func($A, $B, $C, $D, $E, $F) # 173.3μs, 132 allocations (1.1MiB)
@time gradient(func, A, B, C, D, E, F); # 18.7s
@btime gradient($func, $A, $B, $C, $D, $E, $F); # 624μs, 766 allocations (2.45MiB)

#Test left associative contraction derivative vs finite diff
include("chainrulestestutils.jl") # A copy of the ITensor chainrules test folder file
using Zygote: ZygoteRuleConfig
args = (A, B, C, D, E, F);
test_rrule(
    ZygoteRuleConfig(),
    func,
    args...;
    rrule_f = rrule_via_ad,
    check_inferred = false
) # Passes all tests

# Optimized contraction (pre calculated)
seq = ITensors.optimal_contraction_sequence([A, B, C, D, E, F])
func2(A, B, C, D, E, F) = contract([A, B, C, D, E, F]; sequence = seq)[]
@time func2(A, B, C, D, E, F) # 1.78s, 99.9% compilation
@btime func2($A, $B, $C, $D, $E, $F) # 21.7μs, 118 allocations (92.58KiB)
@time gradient(func2, A, B, C, D, E, F); # 2.1s
@btime gradient($func2, $A, $B, $C, $D, $E, $F); # 234.9μs, 1526 allocations (357.72KiB)

# Check equivalent
prod(gradient(func, A, B, C, D, E, F) .≈ gradient(func2, A, B, C, D, E, F)) # true

# Optimized contraction (not pre calculated)
ITensors.enable_contraction_sequence_optimization()
@time func(A, B, C, D, E, F) # 0.06s, 99.6% compilation
@btime func($A, $B, $C, $D, $E, $F) # 33.4μs, 338 allocations (116KiB)
@time gradient(func, A, B, C, D, E, F); # 1.6s, 99.7% compilation
@btime gradient($func, $A, $B, $C, $D, $E, $F); # 269μs, 1692 allocations (384KiB)

# Check equivalent
prod(gradient(func, A, B, C, D, E, F) .≈ gradient(func2, A, B, C, D, E, F)) # true

#Test sequence contraction derivative vs finite diff
args = (A, B, C, D, E, F);
test_rrule(
    ZygoteRuleConfig(),
    func,
    args...;
    rrule_f = rrule_via_ad,
    check_inferred = false
) # Passes all tests

# Simpler case
A = randomITensor(i, j, k)
B = randomITensor(i, k, m)
C = randomITensor(j, m)

g(A, B, C) = (A*B*C)[]
@time g(A, B, C) # 0.12s, 99.9% compilation
@btime g($A, $B, $C) # 8.2μs, 53 allocations (28.7KiB)
@time gradient(g, A, B, C); # 1.44s, 99.9% compilation
@btime gradient($g, $A, $B, $C); # 118.4μs, 657 allocations (110KiB)

#Test derivative vs finite diff
args = (A, B, C);
test_rrule(
    ZygoteRuleConfig(),
    g,
    args...;
    rrule_f = rrule_via_ad,
    check_inferred = false
) # Passes all tests

For comparison, on the old rrules:

# Left associative
@time gradient(func, A, B, C, D, E, F); # 13.4s
@btime gradient($func, $A, $B, $C, $D, $E, $F); # 1.4ms, 910 allocations (6.54MiB)

Meanwhile, anything involving sequences (manually input or the default optimal sequence) fails with the old rrules.

Overall, there may certainly be some overhead, assuming the argument for why this should be 3 times the cost of the forward pass is correct. While it is closer to 3 times in the left-associative case (at least for a small number of tensors), it seems more like 10-15 times for the optimal contraction case, although that is still faster than left associative here, since the contraction sequence is sufficiently better.

Compilation time on the first gradient is also rather long, though not so bad on later ones. This might have more to do with Zygote spinning up and caching reusable stuff for future gradients; I don't really know how it works internally.

I was surprised to see that it appears to perform better than the current rrule even with left-associative contraction, albeit with slightly longer compilation times. I guess the default foldl rrule is quite effective? [Edit: on second thought, I suppose this might be because it makes use of the forward pass in the reverse pass, where the current rrule doesn't?]

So it could be worth going forward with writing an rrule which explicitly does either the reverse pass of a generic contraction sequence or the Evenbly and Pfeifer approach, if we can improve on the compilation time / runtimes from this approach. In particular, I'm unsure whether the 3-5x increased runtime vs. "optimal" reverse diff here could grow on larger networks with contraction sequences, since it will have to differentiate through a deeper recursion.

P.S. I've just checked and tests pass with these changes. Could do with a larger benchmark than what I tried here for a better speed comparison.

mtfishman commented 2 years ago

I'll try to find some time this week to read through this but at first glance it looks really great! Indeed it would be better to just get Zygote to work directly on contract instead of hard-coding the Evenbly-Pfeifer paper, since in principle a properly written general purpose reverse mode AD should be using something equivalent to that algorithm internally.

@LinjianMa this is probably related to some of the things you've been working on, please feel free to jump in and comment on the different strategies here.

mtfishman commented 2 years ago

@DomCRose thanks for looking into this in such detail and for the careful timing comparisons. Your summary sounds correct to me, and that is my current understanding of the relationship between the Evenbly-Pfeifer paper and reverse mode AD.

I think for now the best approach is the one you took: making the current recursive contract function differentiable and just letting Zygote take the derivatives for us, as long as it has reasonable performance, since I would prefer not to have to write rrules if it can be avoided. I just wrote the current multi-ITensor contraction rrule quickly to get things working initially; I am not surprised that it is not optimal.

The timings you give don't seem so concerning, since there is a lot of dynamic code going on in Zygote and ITensor contractions, and the overall times are short even if the relative times are far from optimal. I'm curious how the timings look if you make the tensor dimensions larger; hopefully they would get closer to optimal.

I'm also not too concerned about the compilation times right now (though we should start worrying about them soon). I've found that the Zygote compilation times with ITensor are pretty long anyway, so I think if people are using it heavily with ITensor they should use tools like PackageCompiler.jl. Some of this could be fixed on the ITensor side as well; I bet a lot of the compilation time is the NDTensors contract function, which unfortunately has long compilation times right now. I've tried fixing it a few times with precompilation but have had limited success.

A PR with the changes you made to making the recursive contract function differentiable and removing the current multi-ITensor rrule would be very much appreciated!

DomCRose commented 2 years ago

Great! I should have some time to do that in the next few days; I will only need to rebase the branch I was testing this in.

I'll do some more timing tests with larger bond dimensions; as you say, the overhead should go down in that case. I imagine much of the overhead comes from allocations caused by Zygote. Some of this will be unavoidable, since there is no mechanism for providing caches to the gradient function. However, I'm not sure when Zygote allocates, so the current code for contracting according to a contraction sequence may cause it to allocate more than that minimum amount. I suspect something like this could be the case, since the overhead for the foldl contraction seems much lower (though from limited testing). But reducing that overhead could be a large job, either involving a proper chain rule or some alternative approach to the contraction. Regardless, this is indeed not too far off the optimal without any allocations even in the worst cases (5 or 6 times optimal is essentially the worst I've seen), so it seems reasonable for the time being.

Indeed, there are short-term ways to deal with compilation, although even 20 seconds isn't dreadful if the simulation is going to be running for more than a few minutes.

DomCRose commented 2 years ago

So I'm having difficulties getting this done at the minute. Now that I've rebased onto main, I'm seeing the following error:

A = rand(2, 2)
i = Index(2)
j = Index(2)
ITensor(A, i, j) # ERROR: MethodError: no method matching NDTensors.Dense(::Matrix{Float64})

This is understandably breaking much of the test suite, including some AD tests. It also happens when I just use the main branch, rather than my branch adding the contraction AD.

Unsure how to fix this. Could it be related to #848? I notice that passed tests, so I'm not sure why it would be.

This PC is currently on Julia 1.6, but again, that doesn't seem like it should be an issue.

[Edit:] So for some reason my local (precompiled only?) version has the #848 changes to itensor.jl (i.e. removing the vec call) but not the changes to dense.jl. I've tried deleting the NDTensors folder in julia/compiled, but it's still precompiling a version with the old code, it seems.

To clarify, the NDTensors code in the dev folder is up to date, but the code that's actually being called when I run anything is the pre-#848 version. I've never seen this type of behaviour, so I'm not really sure what the solution is.

[Edit 2:] A temporary fix seems to have been to remove the compat line and Pkg.dev NDTensors to force it to use the local version in dev/ITensors. Once it's using the correct NDTensors code, all tests pass apart from two in the chainrules tests, which return errors:

f = function (x)
  b = itensor([0, 0, 1, 1], i, j)
  k = itensor([0, 1, 0, 0], i, j)
  T = itensor([0 x x^2 1; 0 0 sin(x) 0; 0 cos(x) 0 exp(x); x 0 0 0], i, j, i', j')
  return x * real((b' * T * k)[])
end
args = (0.3,)
test_rrule(ZygoteRuleConfig(), f, args...; rrule_f=rrule_via_ad, check_inferred=false)
#triggers
MethodError: no method matching zero(::var"#30#70"{Index{Int64}, Index{Int64}})
f = function f(x)
  v = itensor([exp(-3.2x), cos(2x^2)], j)
  T = itensor([x^2 sin(x); x^2 exp(-2x)], j', dag(j))
  return real((dag(v') * T * v)[])
end
args = (2.8,)
test_rrule(ZygoteRuleConfig(), f, args...; rrule_f=rrule_via_ad, check_inferred=false)
args = (2.8 + 3.1im,)
test_rrule(ZygoteRuleConfig(), f, args...; rrule_f=rrule_via_ad, check_inferred=false) # this test
#triggers
MethodError: no method matching zero(::var"#f#101"{Index{Int64}})

Both occur in ChainRulesTestUtils.test_approx. I'm unsure why this has started happening; it could be related to the other issues I've had getting things synced up. This is particularly confusing since the first function appears in exactly the same way twice in the code (here in the main repo), yet the first occurrence doesn't error.

Could the NDTensors issue be because ITensors is pinned to a particular NDTensors release (0.1.35) in the project compat, which I'm guessing may point to a version in the general registry that doesn't contain the recent NDTensors changes? I'm unsure how this sub-packaging works.

mtfishman commented 2 years ago

Hi @DomCRose, thanks for looking into this. I was going to suggest reverting any changes to compat and using ] dev NDTensors; I need to add those as tips in the Developing ITensors section of the docs.
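For reference, a hedged sketch of that workflow, assuming the ITensors.jl repository is checked out locally with the NDTensors subpackage at its root:

using Pkg
Pkg.develop(path="./NDTensors")  # equivalent to `] dev ./NDTensors` from the repository root
Pkg.develop(path=".")            # develop the local ITensors.jl checkout itself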

I would suggest making a PR with whatever you have and seeing if the tests pass. Perhaps there is still something off with your local setup and also it would help for me to see what changes you've made.

mtfishman commented 2 years ago

Are all of the initial issues addressed now? If not, could you edit the first post to show which ones are addressed vs. are still open issues?

DomCRose commented 2 years ago

As far as I can tell error 1 is still an issue, which was discussed above:

Also, for bug 1 that you point out in the first post, it would be good to fix that bug and hopefully it is a small fix. Additionally, I think we can entirely avoid the inv_op business and make the rule for all type of index replacements more general if we write the rules in terms of replaceinds (i.e., any tagging and priming operation can be seen in terms of a call like replaceinds(x, is1 => is2), so the rrule could just be written as something like replaceinds(ȳ, is2 => is1)).

This at least seems like more of a bookkeeping exercise than the contraction diff. I'll have a look into it over the next week or two so we can finally close this issue.

I've edited the first post so that it's clear the other three issues have been solved, and by which pull requests.

mtfishman commented 2 years ago

Great, thanks for the update! Indeed, that looks like a relatively easy issue to fix.