JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.74k stars 5.48k forks source link

Structured matrix operations involving NaN may be inconsistent if a quick return path is taken #55296

Open jishnub opened 3 months ago

jishnub commented 3 months ago

In LinearAlgebra, often there are shortcuts taken depending on whether a matrix has a particular structure (e.g. if it is Diagonal), which lets us evaluate certain functions by acting only on the non-zero elements. This may be either be explicitly evaluating the structure of the matrix and acting only on filled elements, or by forwarding an operation to a parent which is structured. This, however, assumes that the action of the function on the other elements doesn't change the structure, which may not be the case.

E.g.:

julia> B = Bidiagonal(zeros(4), zeros(3), :U)
4×4 Bidiagonal{Float64, Vector{Float64}}:
 0.0  0.0   ⋅    ⋅ 
  ⋅   0.0  0.0   ⋅ 
  ⋅    ⋅   0.0  0.0
  ⋅    ⋅    ⋅   0.0

julia> U = UpperTriangular(B)
4×4 UpperTriangular{Float64, Bidiagonal{Float64, Vector{Float64}}}:
 0.0  0.0  0.0  0.0
  ⋅   0.0  0.0  0.0
  ⋅    ⋅   0.0  0.0
  ⋅    ⋅    ⋅   0.0

julia> U * NaN
4×4 UpperTriangular{Float64, Bidiagonal{Float64, Vector{Float64}}}:
 NaN    NaN      0.0    0.0
    ⋅   NaN    NaN      0.0
    ⋅      ⋅   NaN    NaN
    ⋅      ⋅      ⋅   NaN

julia> VERSION
v"1.12.0-DEV.935"

In this case, the zeros in the upper triangular part of the matrix should be all equivalent, irrespective of the parent. By forwarding the operation to the parent, we are privileging some of the zeros over the others, which seems inconsistent. This behavior is also different if the parent is fully materialized, in which case we obtain:

julia> UpperTriangular(Matrix(B)) * NaN
4×4 Matrix{Float64}:
 NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN
 NaN  NaN  NaN  NaN

In this, even the structural zeros of an UpperTriangular lose their privileged status on being multiplied by NaN.

The result should ideally only depend on the structure of the outermost wrapper, which, in this case, is an UpperTriangular.

The other class of operations is:

julia> T = Tridiagonal(zeros(3), zeros(4), zeros(3))
4×4 Tridiagonal{Float64, Vector{Float64}}:
 0.0  0.0   ⋅    ⋅ 
 0.0  0.0  0.0   ⋅ 
  ⋅   0.0  0.0  0.0
  ⋅    ⋅   0.0  0.0

julia> T ^ NaN
4×4 Tridiagonal{Float64, Vector{Float64}}:
 NaN      0.0     ⋅      ⋅ 
   0.0  NaN      0.0     ⋅ 
    ⋅     0.0  NaN      0.0
    ⋅      ⋅     0.0  NaN

In this, the method determines that T is diagonal, and only acts on the diagonal elements. However, the result seems a bit nonsensical as certain zeros are being privileged over others (although I'm unsure what result to expect here). This extends to fully materialized matrices as well:

julia> zeros(4,4) ^ NaN
4×4 Matrix{Float64}:
 NaN      0.0    0.0    0.0
   0.0  NaN      0.0    0.0
   0.0    0.0  NaN      0.0
   0.0    0.0    0.0  NaN

It seems a bit weird that only the diagonal is filled with NaN, while the other elements aren't touched.

mikmoore commented 3 months ago

Note that on v1.10, UpperTriangular(Matrix(B)) * NaN retains the triangular structure.

I don't take major issue with the incomplete filling of NaNs (like in T^NaN) because the resulting zeros are structural (although here only encoded in the value domain, rather than type domain).

I suppose there is an inconsistency in treating the value 0.0 as a structural ("strong") zero, since they can arise from mere underflow. But it's impossible to decide (at a language level) what zeros should be structural vs incidental, and the user-side use of value zeros for structural is very common so I'd be anxious to break that.

If restoring triangular structure to UpperTriangular(Matrix(B)) * NaN would resolve this issue, then I am in full support. I'm okay with the UpperTriangular{T, Bidiagonal} case keeping its bidiagonal structure. But if this proposes more than that, I can't quite understand what that would be.

jishnub commented 3 months ago

I think retaining the triangular structure for UpperTriangular(::Matrix) * NaN would be a good start. This would treat scalar multiplication as a vector-space scaling, as opposed to an element-wise broadcasting. Such a distinction already seems to exist for Diagonals and other banded matrices:

julia> Diagonal(zeros(2)) * NaN
2×2 Diagonal{Float64, Vector{Float64}}:
 NaN       ⋅ 
    ⋅   NaN

julia> Diagonal(zeros(2)) .* NaN
2×2 Matrix{Float64}:
 NaN  NaN
 NaN  NaN