JuliaLang / LinearAlgebra.jl

Julia Linear Algebra standard library
Other
13 stars 0 forks source link

Should `copy_transpose!` also transpose elements? #1033

Closed maltezfaria closed 1 year ago

maltezfaria commented 1 year ago

A while back I ran into this issue with StaticArrays:

using StaticArrays

A = @SMatrix rand(3, 3)
B = @SMatrix rand(3, 3)

blockA = fill(A, 3, 2)
blockB = fill(B, 3, 2)

val1 = blockA * transpose(blockB)
val2 = blockA * collect(transpose(blockB))

val1 ≈ val2 # false, by a lot

Recently, this came back to bite me again, and I managed to narrow down the issue to this chunk in the transpose.jl file.

for jsrc in jr_src
    jdest = first(jr_dest)
    for isrc in ir_src
        B[idest,jdest] = A[isrc,jsrc]
        jdest += step(jr_dest)
    end
    idest += step(ir_dest)
end

Replacing B[idest,jdest] = A[isrc,jsrc] by B[idest,jdest] = transpose(A[isrc,jsrc]) seems to fix the StaticArray bug, which is related to the copy_transpose! not transposing its elements. Since the transpose(x::Number) = x, this change should not break the scalar version, but I am not sure if there are further implications of such a change.

dkarrasch commented 1 year ago

Transpose and adjoint act recursively, and I'd think that this should be handled consistently. So, what you found looks like a bug to me. OTOH, without the dependency on StaticArrays, this seems to work. What exactly is the difference then?

maltezfaria commented 1 year ago

The difference is that a check is made on whether the element types are of bitstype in order to compute an appropriate tile size:

https://github.com/JuliaLang/julia/blob/b5a531aa25a0a21e2b235bbef6f0cefccfe8c638/stdlib/LinearAlgebra/src/matmul.jl#L805

For a matrix of non bits type, the code goes down a different branch, where it takes the transpose of the individual elements, e.g.:

https://github.com/JuliaLang/julia/blob/b5a531aa25a0a21e2b235bbef6f0cefccfe8c638/stdlib/LinearAlgebra/src/matmul.jl#L882

maltezfaria commented 1 year ago

Gentle bump.

I think this is indeed a bug. Would be happy to submit a PR if it may help expedite things (but the fix is a single word).

dkarrasch commented 1 year ago

The function that you're referring to is getting a big overhaul in JuliaLang/julia#52038. Could you perhaps check that one out and see if the issue is still present? Besides, as I said, I think the function should transpose elements, and a PR (including a test) would be most welcome!

maltezfaria commented 1 year ago

Thanks for the reference! I can confirm that JuliaLang/julia#52038 fixes the problem I encountered with the generic matmul as it avoids the (still buggy) LinearAlgebra.copy_transpose!:

function test_copy_transpose!(T)
    v = rand(T)
    A = fill(v,2,2)
    At = collect(transpose(A))
    B = similar(A)
    LinearAlgebra.copy_transpose!(B,axes(B,1),axes(B,2),A,axes(A,1),axes(A,2))
    B ≈ At
end

test_copy_transpose!(Float64) # true
test_copy_transpose!(ComplexF64) # true
test_copy_transpose!(SMatrix{2,2,Float64,4}) # false

As already mentioned, the issue is that copy transpose is not recursive.

maltezfaria commented 1 year ago

I noticed that copy_transpose! is not used very often in LinearAlgebra, but FWIW the "fix" is simply to transpose the elements during the copy:


function copy_transpose!(B::AbstractVecOrMat, ir_dest::AbstractRange{Int}, jr_dest::AbstractRange{Int},
                         A::AbstractVecOrMat, ir_src::AbstractRange{Int}, jr_src::AbstractRange{Int})
    if length(ir_dest) != length(jr_src)
        throw(ArgumentError(LazyString("source and destination must have same size (got ",
                                   length(jr_src)," and ",length(ir_dest),")")))
    end
    if length(jr_dest) != length(ir_src)
        throw(ArgumentError(LazyString("source and destination must have same size (got ",
                                   length(ir_src)," and ",length(jr_dest),")")))
    end
    @boundscheck checkbounds(B, ir_dest, jr_dest)
    @boundscheck checkbounds(A, ir_src, jr_src)
    idest = first(ir_dest)
    for jsrc in jr_src
        jdest = first(jr_dest)
        for isrc in ir_src
            B[idest,jdest] = transpose(A[isrc,jsrc]) # <-- recursive transposition
            jdest += step(jr_dest)
        end
        idest += step(ir_dest)
    end
    return B
end

I think the reason this is a corner case not caught earlier is that copy_transpose! does not get called in LinearAlgebra unless the entries of the array are of bitstype. So you need a bitstype whose transpose is not itself, such as an SMatrix.

In any case, thanks for the work in making generic_matmatmul faster!