JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.03k stars 5.43k forks source link

Re-associating matrix-chain multiplication yields unpredictable results #52392

Open mikmoore opened 7 months ago

mikmoore commented 7 months ago

Version 1.9.4.

*(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

Obviously, this transformation slightly changes results for floating point numbers due to their mild non-associativity. However, it is also applied in contexts with mathematics that are decisively non-associative. For example:

julia> using Octonions

julia> x = Octonion(1,2,3,4,5,6,7,8); y = Octonion(1,-2,3,4,5,6,7,8); z = Octonion(1,2,-3,4,5,6,7,8);

julia> [x;;] * [y;;] * [z;;] # left-associative evaluation
1×1 Matrix{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

julia> [x;;] * [y;;] * [z;;] * [1;] # trailing vector causes right-associativity
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

We like to compose packages in Julia. Arbitrary re-assocation is incompatible with this. Since "standard" associativity is undefined, it becomes necessary for a "correct" package to explicitly force every association with parentheses. N-ary evaluation cannot be correctly applied without tight control of the context that ensures re-association is mathematically valid. The presence of re-association in any insufficiently-controlled context (like the above matrix-chain specialization) will lead to correctness issues across the ecosystem.

Spiritually related to #52333.

thofma commented 7 months ago

Why is this a correctness bug? Writing A * B * C is ambiguous if the operation is not associative, so returning either is fine. Doesn't this fall under the category of undefined behaviour?

antoine-levitt commented 7 months ago

This method could be specialized to Real and Complex. It would lose some extensibility for user-defined associative types that are not real or complex, but are there any?

thofma commented 7 months ago

Yes, quaternions for example.

adienes commented 7 months ago

this is not necessarily what Julia should nor, nor is it necessarily a reasonable assumption

but purely as a matter of practicality, I suspect most users would expect that in the absence of parens, a * b * c will evaluate left-to-right as (a * b) * c

maybe an isassociative trait is warranted?

mikmoore commented 7 months ago

Writing A * B * C is ambiguous if the operation is not associative

Is this really behavior we want to have undefined? If you go by https://docs.julialang.org/en/v1/manual/mathematical-operations/#Operator-Precedence-and-Associativity then * should be left-associative "by default" but is actually non-associative. What use is a default if it isn't even clear when it applies? If you go by

help?> *
search: *

<lines omitted>

  *(A, B::AbstractMatrix, C)
  A * B * C * D

  Chained multiplication of 3 or 4 matrices is done in the most efficient sequence, based on the sizes of the
  arrays. That is, the number of scalar multiplications needed for (A * B) * C (with 3 dense matrices) is
  compared to that for A * (B * C) to choose which of these to execute.

  If the last factor is a vector, or the first a transposed vector, then it is efficient to deal with these
  first. In particular x' * B * y means (x' * B) * y for an ordinary column-major B::Matrix. Unlike dot(x, B,
  y), this allocates an intermediate array.

  If the first or last factor is a number, this will be fused with the matrix multiplication, using 5-arg mul!.

  See also muladd, dot.

  │ Julia 1.7
  │
  │  These optimisations require at least Julia 1.7.

(note that this docstring does not appear in the online Julia docs, which should probably be resolved) then the association is explicitly undefined.

I'm here to argue that an order that is undefined and results in mathematical errors (i.e., more than just roundoff) is something we should consider not having because it makes generic code very difficult to write. You cannot write any syntactically-chainable * (on purpose or accident) unless you absolutely know that re-association is fine for any input type that can reach that code. Most users will miss this out of ignorance but even the best will sometimes miss it out of forgetfulness. Due to the rarity of non-associative types, these are likely to be missed in tests.

Personally, I always use parentheses to express my chained matrix multiplication because I didn't even know about these specializations until this week (and they've only existed since v1.7 and don't appear in the online docs). I never found it particularly onerous and there are packages that handle this anyway (and much more generally) by introducing dedicated functions for re-associative * on matrices.

Another fun consequence of our special-casing of 3-4 argument chained matmul is this:

julia> [x;;] * [y;;] * [z;;] * [1;] # right-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -1148, -888, -1420, -1376)

julia> [1;;] * [x;;] * [y;;] * [z;;] * [1;] # chain another matrix and it reverts to left-associative
1-element Vector{Octonion{Int64}}:
 Octonion{Int64}(-652, -1064, 612, -736, -860, -1128, -1036, -1712)

I find this behavior very difficult to defend.

mcabbott commented 7 months ago

Maybe the first question is whether * is intended to be the multiplication for objects with a non-associative binary operation. Has this been discussed anywhere? (I don't mean the docs telling you about a fallback vararg method.) I suppose this is another kind of interface question.

The assumption that LinearAlgebra deals with associative multiplication also exists in matrix powers, and in dot, and perhaps elsewhere:

using Octonions
begin
  a = Octonion(rand(1:99, 8)...)
  M = [Octonion(rand(1:99, 8)...) for _ in 1:2, _ in 1:2]
  v = [Octonion(rand(1:99, 8)...) for _ in 1:2]
  w = [Octonion(rand(1:99, 8)...) for _ in 1:2]
end;

@assert M^3 == M*(M*M) != (M*M)*M            # power_by_squaring, since before Julia 1.0
@assert M^4 == (M^2) * (M^2) != ((M*M)*M)*M

@assert dot(v,M,w) == (v'*M)*w == v'*M*w != dot(v, M*w)  # disagrees with docstring, Julia 1.4
@assert dot(v,M',w) == v'*(M'*w) == dot(v, M'*w)         # calls adjoint(dot(w, M.parent, v))

[Edit, I misunderstood a comment above]. Quaternions are associative (but non-commutative). The methods for fused * should all be restricted to Union{Real,Complex} when they commute arguments, e.g. @which a*M*v is the fallback not the fused one. (Arrays of matrices are the obvious way to get non-commutative elements.)

What examples are there besides Octonions? Are there some which are useful in the wild? Useful to feed into generic code? (We have a vector cross product × which is non-associative. Lie algebras are useful & non-associative, but surely nobody would write their binary operaion as a*b.)