in 1.10+, matrix decomposition components aren't always AbstractMatrix and don't behave like them

aplavin commented 1 month ago

Tried upgrading Julia version from 1.9 in one of my envs, and noticed this:

julia> using LinearAlgebra

julia> f(a::AbstractMatrix) = sum(a)  # MWE

julia> f(qr(rand(5,5)).Q)
# 1.9:
-1.4572588703587335
# 1.10+:
ERROR: MethodError: no method matching f(::LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}})

qr(A).Q not being an AbstractMatrix is very surprising even in isolation, but especially so given the previous behavior in Julia. How come it's allowed in Julia 1.x? Seems like the clear and unambiguous definition of breaking change... From Julia docs:

However, upgrading to the next Stable release will always be possible as each release of Julia v1.x will continue to run code written for earlier versions.

Introduced in https://github.com/JuliaLang/julia/pull/46196 – deliberately, it wasn't just an oversight.

giordano commented 1 month ago

https://docs.julialang.org/en/v1.9/stdlib/LinearAlgebra/#LinearAlgebra.qr

Q matrix can be converted into a regular matrix with Matrix.

aplavin commented 1 month ago

Could you elaborate a bit? Of course, I need Matrix() to convert an AbstractMatrix to a regular Matrix. But how is that related here?

giordano commented 1 month ago

Could you elaborate a bit?

The documentation of Julia v1.9 is saying that you if you want to convert Q to a regular matrix you should use Matrix(Q).

But how is that related here?

It's related because that was the documented way to go already in Julia v1.9.

aplavin commented 1 month ago

convert Q to a regular matrix you should use Matrix(Q)

Note that Q was never a Matrix, and I don't expect it. But I do expect to be able to pass Q to functions that take AbstractMatrices.

giordano commented 1 month ago

You may be a victim of Hyrum's Law, where you relied on pre-existing unintended and undocumented behaviour.

aplavin commented 1 month ago

you relied on pre-existing unintended and undocumented behaviour

What exactly is not documented? That "returns a matrix" means "AbstractMatrix"? This is very common in Julia docs, both in terms of return values and argument types. Just clicked on the very first function on the linear algebra page:

tr(M) Matrix trace. Sums the diagonal elements of M.

Naturally, tr() is defined for AbstractMatrix. The same is often for "array" in docs = "AbstractArray", both in Julia and in packages.

This is a documented and expected behavior, the linked PR even broke several existing packages. Note that code relying on qr().Q being an AbstractMatrix is most likely to appear in the enduser code, not in packages: one reasonably expects to take a function from some package that works on "matrices" (AbstractMatrix) and apply it to a QR decomposition component. Actually, LinearAlgebra.tr(qr(A).Q) doesn't work – no need to even go to packages.

If that's not a breaking change, very few things are :) If such changes are expected in the future as well, should we clarify the stability docs then?

upgrading to the next Stable release will always be possible as each release of Julia v1.x will continue to run code written for earlier versions. * : if the code is in a registered package *: if the relevant codepath is covered by tests : if you make sure your tests always pass on nightly ****: if Julia devs don't decide that breaking your code is a minor change and ask you to change it

mbauman commented 1 month ago

@aplavin What were the actual methods you needed and found to be missing? Were they in LinearAlgebra or in your package? I think the best path forward here is to define those as f(Q::AbstractQ) = f(Matrix(Q)). And it would be great to update QR's docs to ensure it doesn't talk about Q as a "matrix".

This was identified as a breaking change right from the get-go and its pros were very deliberately weighed against the estimated costs of the breakage. It's totally fair that you might weigh those pros and cons differently.

aplavin commented 1 month ago

What were the actual methods you needed and found to be missing? Were they in LinearAlgebra or in your package?

From a cursory look, I saw the immediate error happen in three cases:

passing qr().Q into a custom function specialized to AbstractMatrix
puttingqr().Q into a custom struct specialized to AbstractArray
and puttingqr().Q into a KeyedArray (from AxisKeys.jl) that is specialized to AbstractArray

Most likely, there are more downstream functions that also require AbstractMatrix, didn't check that. ::AbstractArray and ::AbstractMatrix constraints are very common in the ecosystem.

I think the best path forward here is to define those as f(Q::AbstractQ) = f(Matrix(Q))

Sounds like "Julia minor versions are non-breaking, you just need to update your code and it will continue working" :)

I believe setting the expectations correctly is important: Julia advertises itself as backwards-compatible, with package compatibility additionally checked by PkgEval. In practice, even when breakage is caught by PkgEval (ie far from always), it can easily be dismissed as a minor change. Are you open to clarifying the docs then, along the lines of "limited breaking changes often happen in minor Julia versions; users are encouraged to test their codes regularly and introduce relevant updates in order to use newer versions"?

jishnub commented 1 month ago

Perhaps it's better to distinguish between julia (Core, Base etc.) and the stdlibs. If LinearAlgebra were to be excised and released independently, this would be an issue with a package rather than with julia.

aplavin commented 1 month ago

Of course, if LinearAlgebra wasn't shipped with the base Julia install, and had a breaking change in a minor version, I would've opened the issue in their repo. IME this is quite rare for packages: many tend to lean the other way and release major versions even for almost-not-breaking changes, actually following semver.

dkarrasch commented 1 month ago

I take the blame. Yes, this was a breaking change, no doubt. Just to recap, what was done/considered to justify it:

At the time, I made a huge effort to prevent breakage of packages, and IIRC it turned out that there were very few cases where package code assumed AbstractQ <: AbstractMatrix, if any. That assumption was typically relied on in tests, like comparing two AbstractQs with each other, which fell back to elementwise comparison.
Removing the subtyping allowed for a great amount of method ambiguity reduction, hence method removal, and finally significantly improved latency.
(Almost) all methods were retained, that play nicely with AbstractQ objects. What broke were ("working", admittedly) methods, that, especially for large AbstractQs, were inevitably super-slow, because one would compute - via getindex - each component of the matrix representation at the time, by performing an algorithm, that would first compute the whole column, and then pick one element, potentially just to recompute the same column, and then pick the next component. That is, in cases where one would traverse one way or another the matrix representation, one will always be better off computing the matrix representation first, with tr being one of my favorite examples!

All of that has nothing to do with semver and doesn't justify formally breaking it, so I plead guilty. But I hope you agree that the "justifying" considerations have heavy weight. If I can help with anything, let me know.

aplavin commented 1 month ago

At the time, I made a huge effort to prevent breakage of packages, and IIRC it turned out that there were very few cases where package code assumed AbstractQ <: AbstractMatrix, if any.

This kind of code is much more likely to be found in end-user code, not in packages. A package defining f(::AbstractMatrix) isn't really expected to test it on "all" AbstractMatrix subtypes available. And the docs of qr() clearly refer to the components as matrices, so users should reasonably expect to be able to pass them to f().

Removing the subtyping allowed for a great amount of method ambiguity reduction, hence method removal, and finally significantly improved latency.

Sure, removing subtyping generally reduces the number of methods applicable, and is expected to improve latency. But isn't this kind of dispatch one of the major Julia selling points?

What broke were ("working", admittedly) methods, that, especially for large AbstractQs, were inevitably super-slow,

A lot of matrix decompositions aren't done for performance, and even slow but correct generic implementations are completely fine in those cases.

All the above points motivate qr().Q::AbstractMatrix independently of the behavior in previous versions. But the most objective point is of course the breaking change, that Julia promises to avoid. For me it feels extremely strange to make a breaking change knowingly, motivated by some design improvement (while the old behavior was totally reasonable and correct). I honestly struggle to see how one would weight "fixing suboptimal performance" higher than "keeping existing code working"...

There are so many ways to steer performance-sensitive QR decomposition users to the new implementation without breaking the existing code! Just some examples (names are bikeschedable):

deprecate qr().Q, add qr().Q_fast
make qr().Q an eager Matrix, add qr().Q_fast
deprecate qr(), add qr_fast()

aplavin commented 3 weeks ago

Noticed yet another example of perfectly reasonable calculation that should work, and worked without any special effort in 1.9-, but got broken by this qr change:

julia> using LinearAlgebra, StructArrays

julia> qr(rand(3,3)).Q * StructArray([1+2im, 3, 4])
# 1.9-:
3-element StructArray(::Vector{Float64}, ::Vector{Float64}) with eltype ComplexF64:
 -4.8444364108512055 - 0.9826988305649018im
  1.0784776480572889 - 1.0628267141640224im
  1.1697528900840188 - 1.3801095550954214im

# 1.10+:
ERROR: MethodError: no method matching lmul!(::LinearAlgebra.QRCompactWYQ{ComplexF64, Matrix{…}, Matrix{…}}, ::StructVector{ComplexF64, @NamedTuple{…}, Int64})

jishnub commented 3 weeks ago

This is clearly a bug that needs to be patched

dkarrasch commented 3 weeks ago

This is an interesting case. The reason why this fails on v1.10 is because we don't have a generic lmul! for QRCompactWYQ. The reason why it used to work is because it fell back to generic_matvecmul!, which explicitly ignores the structure of the QRCompactWYQ object and takes the performance-wise worst path possible. Perhaps we need to translate LAPACK.gemqrt!, just like someone translated LAPACK.ormqr!.

JuliaLang / LinearAlgebra.jl

in 1.10+, matrix decomposition components aren't always AbstractMatrix and don't behave like them #1097