Closed maltezfaria closed 1 year ago
Transpose and adjoint act recursively, and I'd think that this should be handled consistently. So, what you found looks like a bug to me. OTOH, without the dependency on StaticArrays, this seems to work. What exactly is the difference then?
The difference is that a check is made on whether the element types are of bitstype
in order to compute an appropriate tile size:
For a matrix of non bits type, the code goes down a different branch, where it takes the transpose of the individual elements, e.g.:
Gentle bump.
I think this is indeed a bug. Would be happy to submit a PR
if it may help expedite things (but the fix is a single word).
The function that you're referring to is getting a big overhaul in JuliaLang/julia#52038. Could you perhaps check that one out and see if the issue is still present? Besides, as I said, I think the function should transpose elements, and a PR (including a test) would be most welcome!
Thanks for the reference! I can confirm that JuliaLang/julia#52038 fixes the problem I encountered with the generic matmul as it avoids the (still buggy) LinearAlgebra.copy_transpose!
:
function test_copy_transpose!(T)
v = rand(T)
A = fill(v,2,2)
At = collect(transpose(A))
B = similar(A)
LinearAlgebra.copy_transpose!(B,axes(B,1),axes(B,2),A,axes(A,1),axes(A,2))
B ≈ At
end
test_copy_transpose!(Float64) # true
test_copy_transpose!(ComplexF64) # true
test_copy_transpose!(SMatrix{2,2,Float64,4}) # false
As already mentioned, the issue is that copy transpose is not recursive.
I noticed that copy_transpose!
is not used very often in LinearAlgebra
, but FWIW the "fix" is simply to transpose the elements during the copy:
function copy_transpose!(B::AbstractVecOrMat, ir_dest::AbstractRange{Int}, jr_dest::AbstractRange{Int},
A::AbstractVecOrMat, ir_src::AbstractRange{Int}, jr_src::AbstractRange{Int})
if length(ir_dest) != length(jr_src)
throw(ArgumentError(LazyString("source and destination must have same size (got ",
length(jr_src)," and ",length(ir_dest),")")))
end
if length(jr_dest) != length(ir_src)
throw(ArgumentError(LazyString("source and destination must have same size (got ",
length(ir_src)," and ",length(jr_dest),")")))
end
@boundscheck checkbounds(B, ir_dest, jr_dest)
@boundscheck checkbounds(A, ir_src, jr_src)
idest = first(ir_dest)
for jsrc in jr_src
jdest = first(jr_dest)
for isrc in ir_src
B[idest,jdest] = transpose(A[isrc,jsrc]) # <-- recursive transposition
jdest += step(jr_dest)
end
idest += step(ir_dest)
end
return B
end
I think the reason this is a corner case not caught earlier is that copy_transpose!
does not get called in LinearAlgebra
unless the entries of the array are of bitstype
. So you need a bitstype
whose transpose is not itself, such as an SMatrix
.
In any case, thanks for the work in making generic_matmatmul
faster!
A while back I ran into this issue with
StaticArrays
:Recently, this came back to bite me again, and I managed to narrow down the issue to this chunk in the
transpose.jl
file.Replacing
B[idest,jdest] = A[isrc,jsrc]
byB[idest,jdest] = transpose(A[isrc,jsrc])
seems to fix theStaticArray
bug, which is related to thecopy_transpose!
not transposing its elements. Since thetranspose(x::Number) = x
, this change should not break thescalar
version, but I am not sure if there are further implications of such a change.