Jutho / TensorOperations.jl

Julia package for tensor contractions and related operations
https://jutho.github.io/TensorOperations.jl/stable/

Incorrect result when using input tensor to store result of contraction involving sum #192

Open leburgel opened 1 month ago

leburgel commented 1 month ago

I encountered what I think is a pretty sneaky bug when assigning the result of a tensor contraction to an existing tensor used in the contraction:

using LinearAlgebra
using TensorOperations

A = randn(ComplexF64, 8, 4, 8)
L = randn(ComplexF64, 8, 8)
R = randn(ComplexF64, 8, 8)
e = randn(ComplexF64)

# `:=` allocates a fresh output tensor B
@tensor B[-1, -2, -3] := L[-1, 1] * A[1, -2, 2] * R[2, -3] + e * A[-1, -2, -3]

# `=` writes into the existing A, which also appears on the right-hand side
@tensor A[-1, -2, -3] = L[-1, 1] * A[1, -2, 2] * R[2, -3] + e * A[-1, -2, -3]

@show norm(B - A) # nonzero: the two versions silently disagree

When the sum is removed, both versions do give the same result.
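
For reference, a safe way to express this update is to materialize the result into a fresh tensor with := and only copy it back afterwards, at the cost of one extra allocation (a minimal sketch, reusing the setup from above):

@tensor B[-1, -2, -3] := L[-1, 1] * A[1, -2, 2] * R[2, -3] + e * A[-1, -2, -3]
copyto!(A, B) # overwrite A only after the full result has been materialized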

I assume this pattern runs into the same kind of issue as something like

x = randn(ComplexF64, 10)
A = randn(ComplexF64, 10, 10)

@tensor x[-1] = A[-1, 1] * x[1] # throws `ERROR: ArgumentError: output tensor must not be aliased with input tensor`

However, in the first example this is never caught, probably because the actual contraction is computed into a distinct temporary, which circumvents the aliasing check.
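
To make the suspected mechanism concrete, here is a plain-array analogue of what I assume happens internally (the ordering is my guess, not the actual lowering): the contraction part goes through a temporary and is written into the output first, and only then is the e * A term accumulated, at which point the original data is gone:

using LinearAlgebra

x = randn(ComplexF64, 10)
A = randn(ComplexF64, 10, 10)
e = randn(ComplexF64)

ref = A * x + e * x   # correct result, computed out of place

tmp = A * x           # the "contraction" reads the old x, so this part is fine
x .= tmp              # x is overwritten here...
x .+= e .* x          # ...so e * x now reads the new x, not the original

@show norm(ref - x)   # nonzero: the same silent error as in the @tensor example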

I don't know whether this is a bug that could be fixed or just a pathological pattern that shouldn't be used, but in the latter case it should definitely result in some kind of explicit error being thrown.

Jutho commented 1 month ago

I should think about this more carefully, but I suspect it would be very hard to detect all patterns where output and input conflict. Even Base and LinearAlgebra don't manage this for simple functions:

julia> A = randn(4,4);

julia> B = randn(4,4);

julia> A * B
4×4 Matrix{Float64}:
  0.757171   2.97652   -0.309612  -0.37975
 -4.59932    2.32848   -1.15765   -1.59679
  1.06262   -0.313278   0.301452   3.1791
 -2.7318    -1.15279   -0.807899  -1.06701

julia> mul!(A, A, B)
ERROR: ArgumentError: output matrix must not be aliased with input matrix
Stacktrace:
 [1] gemm_wrapper!(C::Matrix{…}, tA::Char, tB::Char, A::Matrix{…}, B::Matrix{…}, _add::LinearAlgebra.MulAddMul{…})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.11.0+0.x64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:633
 [2] generic_matmatmul!
   @ ~/.julia/juliaup/julia-1.11.0+0.x64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:381 [inlined]
 [3] _mul!
   @ ~/.julia/juliaup/julia-1.11.0+0.x64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:287 [inlined]
 [4] mul!
   @ ~/.julia/juliaup/julia-1.11.0+0.x64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:285 [inlined]
 [5] mul!(C::Matrix{Float64}, A::Matrix{Float64}, B::Matrix{Float64})
   @ LinearAlgebra ~/.julia/juliaup/julia-1.11.0+0.x64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/matmul.jl:253
 [6] top-level scope
   @ REPL[10]:1
Some type information was truncated. Use `show(err)` to see complete types.

julia> mul!(view(A, :, :), A, B)
4×4 view(::Matrix{Float64}, :, :) with eltype Float64:
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

Wrapping the output in a view hides the aliasing from that check, and the result is silently wrong.

leburgel commented 1 month ago

I see, I hadn't considered cases like this. I assumed the issue was quite specific here, given that it's the sum that messes things up and that it closely mirrors a case that is explicitly disallowed. I can just be more careful from now on, but maybe it would be good to have a warning somewhere in the documentation that using = to assign to an object that also appears in the right-hand side expression is dangerous?

Explicitly disallowing everything that is inherently conflicting is probably not possible, but raising awareness that things like this can happen would be good. This one was an absolute pain to debug :)

Jutho commented 1 month ago

Oh yes, I am absolutely in favor of more warnings about the fact that reusing input arrays in the output with = instead of := can be problematic and requires great care and some understanding of what is happening internally. If you see a good place to add this where people would spot it, feel free to suggest it.

lkdvos commented 1 month ago

On the other hand, I would like the option to access the in-place versions with scalars, as this is currently never possible: we never generate non-zero beta arguments.

If we could detect things like A = beta * A + ..., this should be possible (and would ensure that your example can be computed with one fewer tensoradd call).
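
In matrix form, this is exactly what the five-argument mul! in LinearAlgebra provides. As a sketch of the kind of single kernel call a detected non-zero beta would map onto (the tensor analogue would presumably go through tensorcontract!/tensoradd! with beta != 0):

using LinearAlgebra

C = randn(4, 4)
A = randn(4, 4)
B = randn(4, 4)
e = 2.0

ref = A * B + e * C    # out-of-place reference

# five-argument mul! computes C = α*A*B + β*C in place
mul!(C, A, B, 1.0, e)  # one kernel call, no separate add step

@show norm(ref - C)    # ≈ 0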

Jutho commented 1 month ago

I agree it would be good if we could detect this particular pattern. That might be possible within instantiate_linearcombination.
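
As a purely hypothetical sketch of what that detection could look like, using a toy representation of the parsed right-hand-side terms rather than the actual internals (Term and find_beta_term are made up for illustration):

# toy stand-in for a parsed right-hand-side term like `e * A[...]`
struct Term
    scalar::Any     # scalar prefactor, e.g. :e
    tensor::Symbol  # tensor symbol, e.g. :A
end

# If some term's tensor aliases the output, return its index and scalar;
# that scalar could then be passed as beta to the in-place kernel and the
# term dropped from the linear combination.
function find_beta_term(output::Symbol, terms::Vector{Term})
    for (i, t) in enumerate(terms)
        t.tensor === output && return (i, t.scalar)
    end
    return nothing
end

find_beta_term(:A, [Term(1, :tmp), Term(:e, :A)])  # (2, :e)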

leburgel commented 1 month ago

For now I would think a warning block right below this paragraph would be good. At least that's where I went immediately when trying to figure out what was going on.