Certain `SArray` * `Adjoint(SArray)` produces `Array{Float64,2}` instead of `SArray`

zygmuntszpak commented 6 years ago

A product of two StaticArrays results in an allocated array. Here is a minimal working example:

A = @SMatrix rand(9,1)
B = SArray{Tuple{9},Float64,1,9}(1,2,3,4,5,6,7,8,9)
C = adjoint(B)
D = A*C
typeof(D) # Array{Float64,2} instead of a StaticArray.

andyferris commented 5 years ago

Thanks, interesting.

I note you are doing matrix * vector'. I'm curious what you want this for? When I first worked on the vector transpose stuff I didn't really want to even support that as valid linear algebra.

But Julia does tend to be forgiving with regards to the singleton dimensions, we should support this here becase LinearAlgebra does. I guess we just forgot a method for this.

zygmuntszpak commented 5 years ago

I think the only time this occurs is when you have one entity that happens to be a column vector (single column of matrix), and the other a row vector, and you are computing the outer product of these two vectors. This arose when I was implementing a particular cost function. I've provided a small example below. I think I've also figured out which additional dispatch rules to add in order to address this. However, I've stopped short of a pull-request because I'm baffled by the fact that when I am summing the problematic term in my cost function (for which my proposed fix now returns a StaticArray) I am encountering a lot of inexplicable allocations. I'm not sure if this is a different bug, or of it is due to the fact that I haven't done enough in my proposed fix.

The proposed fix involves adding these two lines to matrix_multiply.jl

@inline *(A::StaticMatrix, B::Adjoint{<:Any, <:StaticVector}) = *(reshape(A, Size(Size(A)[1],)), B) 
@inline mul!(dest::StaticVecOrMat, A::StaticMatrix, B::Adjoint{<:Any, <:StaticVector}) = mul!(dest, reshape(A, Size(Size(A)[1],)), B)

Here is an example that appears to solve the problem, but doesn't quite work because of a lot of inexplicable allocations.

using StaticArrays, BenchmarkTools, LinearAlgebra

function hom(v::SVector)
    push(v,1)
end

function T(𝛉::AbstractArray, 𝒞::Tuple{AbstractArray, Vararg{AbstractArray}}, 𝒟::Tuple{AbstractArray, Vararg{AbstractArray}})
    ⊗ = kron
    l = length(𝛉)
    𝐈ₗ = SMatrix{l,l}(1.0I)
    𝐈ₘ =  SMatrix{1,1}(1.0I)
    𝐓 = @SMatrix zeros(l,l)
    N = length(𝒟[1])
    ℳ, ℳʹ = 𝒟
    Λ₁, Λ₂ = 𝒞
    𝚲ₙ = @MMatrix zeros(4,4)
    𝐞₁ = @SMatrix [1.0; 0.0; 0.0]
    𝐞₂ = @SMatrix [0.0; 1.0; 0.0]
    for n = 1: N
        index = SVector(1,2)
        𝚲ₙ[1:2,1:2] .=  Λ₁[n][index,index]
        𝚲ₙ[3:4,3:4] .=  Λ₂[n][index,index]
        𝐦 = hom(ℳ[n])
        𝐦ʹ= hom(ℳʹ[n])
        𝐔ₙ = (𝐦 ⊗ 𝐦ʹ)
        ∂ₓ𝐮ₙ =  [(𝐞₁ ⊗ 𝐦ʹ) (𝐞₂ ⊗ 𝐦ʹ) (𝐦 ⊗ 𝐞₁) (𝐦 ⊗ 𝐞₂)]
        𝐁ₙ =  ∂ₓ𝐮ₙ * 𝚲ₙ * ∂ₓ𝐮ₙ'
        𝚺ₙ = 𝛉' * 𝐁ₙ * 𝛉
        𝚺ₙ⁻¹ = inv(𝚺ₙ)
        𝐓₁ = @SMatrix zeros(Float64,l,l)
        for k = 1:l
            𝐞ₖ = 𝐈ₗ[:,k]
            ∂𝐞ₖ𝚺ₙ = (𝐈ₘ ⊗ 𝐞ₖ') * 𝐁ₙ * (𝐈ₘ ⊗ 𝛉) + (𝐈ₘ ⊗ 𝛉') * 𝐁ₙ * (𝐈ₘ ⊗ 𝐞ₖ)
            # Accumulating the result in 𝐓₁ allocates memory, even though
            # the two terms in the summation are both SArrays.
            𝐓₁ = 𝐓₁ + 𝐔ₙ * 𝚺ₙ⁻¹ * (∂𝐞ₖ𝚺ₙ) * 𝚺ₙ⁻¹ * 𝐔ₙ' * 𝛉 * 𝐞ₖ'
        end
        𝐓 = 𝐓 + 𝐓₁
    end
    𝐓
end

# Some sample data
N = 300
ℳ = [@SVector rand(2) for i = 1:N]
ℳʹ = [@SVector rand(2) for i = 1:N]
Λ₁ =  [SMatrix{3,3}(Matrix(Diagonal([1.0,1.0,0.0]))) for i = 1:length(ℳ)]
Λ₂ =  [SMatrix{3,3}(Matrix(Diagonal([1.0,1.0,0.0]))) for i = 1:length(ℳ)]
F = @SMatrix rand(3,3)
𝒞 = (Λ₁,Λ₂)
𝒟 = (ℳ, ℳʹ)

T(vec(F),𝒞,𝒟)
@btime T(vec($F),$𝒞,$𝒟)  # 682.152 μs (6002 allocations: 3.85 MiB

c42f commented 5 years ago

That's a beautiful example of code-as-mathematics (or is it mathematics-as-code?). The ASCII type names look out of place!

Your problem probably arises in the line

l = length(𝛉)

which very likely leads to uninferrable types for any StaticArray which uses l. If 𝛉 is statically sized that's fairly easily solved using the Length trait.

zygmuntszpak commented 5 years ago

Thank you!

I'm not entirely sure I understand what you mean by using the Length trait. Are you suggesting that I added functions such as:

T(::Length{9}) = @SMatrix zeros(9,9)
T(::Length{3}) = @SMatrix zeros(3,3)

and then define

𝐓 =  T(Length(𝛉))

instead of doing

l = length(𝛉)
𝐓 = @SMatrix zeros(l,l)  ?

I don't think the issue is soley due to the use of l = length(𝛉) and then using l in StaticArray. The reason why I don't think that is the sole issue is because I tried filling in literal values for all my StaticArrays and I still ended up with allocations.

In particular,

using StaticArrays, BenchmarkTools, LinearAlgebra

function hom(v::SVector)
    push(v,1)
end

function T(𝛉::AbstractArray, 𝒞::Tuple{AbstractArray, Vararg{AbstractArray}}, 𝒟::Tuple{AbstractArray, Vararg{AbstractArray}})
    ⊗ = kron
    l = 9
    𝐈ₗ = SMatrix{9,9}(1.0I)
    𝐈ₘ =  SMatrix{1,1}(1.0I)
    𝐓 = @SMatrix zeros(9,9)
    N = length(𝒟[1])
    ℳ, ℳʹ = 𝒟
    Λ₁, Λ₂ = 𝒞
    𝚲ₙ = @MMatrix zeros(4,4)
    𝐞₁ = @SMatrix [1.0; 0.0; 0.0]
    𝐞₂ = @SMatrix [0.0; 1.0; 0.0]
    for n = 1: N
        index = SVector(1,2)
        𝚲ₙ[1:2,1:2] .=  Λ₁[n][index,index]
        𝚲ₙ[3:4,3:4] .=  Λ₂[n][index,index]
        𝐦 = hom(ℳ[n])
        𝐦ʹ= hom(ℳʹ[n])
        𝐔ₙ = (𝐦 ⊗ 𝐦ʹ)
        ∂ₓ𝐮ₙ =  [(𝐞₁ ⊗ 𝐦ʹ) (𝐞₂ ⊗ 𝐦ʹ) (𝐦 ⊗ 𝐞₁) (𝐦 ⊗ 𝐞₂)]
        𝐁ₙ =  ∂ₓ𝐮ₙ * 𝚲ₙ * ∂ₓ𝐮ₙ'
        𝚺ₙ = 𝛉' * 𝐁ₙ * 𝛉
        𝚺ₙ⁻¹ = inv(𝚺ₙ)
        𝐓₁ = @SMatrix zeros(Float64,9,9)
        for k = 1:l
            𝐞ₖ = 𝐈ₗ[:,k]
            ∂𝐞ₖ𝚺ₙ = (𝐈ₘ ⊗ 𝐞ₖ') * 𝐁ₙ * (𝐈ₘ ⊗ 𝛉) + (𝐈ₘ ⊗ 𝛉') * 𝐁ₙ * (𝐈ₘ ⊗ 𝐞ₖ)
            # Accumulating the result in 𝐓₁ allocates memory, even though
            # the two terms in the summation are both SArrays.
            𝐓₁ = 𝐓₁ + 𝐔ₙ * 𝚺ₙ⁻¹ * (∂𝐞ₖ𝚺ₙ) * 𝚺ₙ⁻¹ * 𝐔ₙ' * 𝛉 * 𝐞ₖ'
        end
        𝐓 = 𝐓 + 𝐓₁
    end
    𝐓
end

# Some sample data
N = 300
ℳ = [@SVector rand(2) for i = 1:N]
ℳʹ = [@SVector rand(2) for i = 1:N]
Λ₁ =  [SMatrix{3,3}(Matrix(Diagonal([1.0,1.0,0.0]))) for i = 1:length(ℳ)]
Λ₂ =  [SMatrix{3,3}(Matrix(Diagonal([1.0,1.0,0.0]))) for i = 1:length(ℳ)]
F = @SMatrix rand(3,3)
𝒞 = (Λ₁,Λ₂)
𝒟 = (ℳ, ℳʹ)

T(vec(F),𝒞,𝒟)
@btime T(vec($F),$𝒞,$𝒟)  # 682.152 μs (6002 allocations: 3.85 MiB

It is still allocating even though I don't use l in a StaticArray.

I tried @code_warntype T(vec(F),𝒞,𝒟) which produced an output too long to list here. However, a particular line jumped out with Any:

  1230 ─ %5524 = invoke Base.afoldl(%5506::typeof(*), %5517::SArray{Tuple{9,1},Float64,2,9}, %5475::Adjoint{Float64,SArray{Tuple{9},Float64,1,9}}, _2::SArray{Tuple{9},Float64,1,9}, %5476::Adjoint{Float64,SArray{Tuple{9},Float64,1,9}})::Any

I don't know how to parse that. Perhaps it is indeed related to the fact that I have omitted something from my proposed "fix":

@inline *(A::StaticMatrix, B::Adjoint{<:Any, <:StaticVector}) = *(reshape(A, Size(Size(A)[1],)), B) 
@inline mul!(dest::StaticVecOrMat, A::StaticMatrix, B::Adjoint{<:Any, <:StaticVector}) = mul!(dest, reshape(A, Size(Size(A)[1],)), B)

Evizero commented 5 years ago

not really a solution, but in case this particular issue is an obstacle for some project you are working on:

You can work around this issue (in case you know you have a column vector as a matrix) by using vec.

A = @SMatrix rand(9,1)
B = SArray{Tuple{9},Float64,1,9}(1,2,3,4,5,6,7,8,9)
C = adjoint(B)
D = vec(A)*C
typeof(D) # -> SArray

Evizero commented 5 years ago

this also works: (it restricts * to matrices where the second dimension is of length 1)

@inline (Base.:*)(A::StaticArray{Tuple{N,1},<:Any,2}, B::Adjoint{<:Any,<:StaticVector}) where {N} = vec(A) * B

Evizero commented 5 years ago

to your example. the culprint is Base.afoldl(::typeof(*), ...) according to @code_warntype. so a workaround (additional to the * def above) is to use parenthesis

...
𝐓₁ = 𝐓₁ + (((𝐔ₙ * 𝚺ₙ⁻¹) * (∂𝐞ₖ𝚺ₙ)) * 𝚺ₙ⁻¹) * 𝐔ₙ' * 𝛉 * 𝐞ₖ'
...

julia> @btime T(vec($F),$𝒞,$𝒟)  # 682.152 μs (6002 allocations: 3.85 MiB
  355.293 μs (0 allocations: 0 bytes)
9×9 SArray{Tuple{9,9},Float64,2,81}:
 12.4606   12.7101   9.54674   6.48859   7.60029   3.98854   8.99992  11.6833  0.0
 11.0465   14.5946   9.6956    6.44432   9.02317   4.72583   9.0564   11.8339  0.0
 21.9599   25.7667  18.1746   12.062    15.7895    7.82211  18.386    23.9433  0.0
 10.1492   10.1904   9.10567   7.61871   9.55715   3.85434   7.88378  11.0412  0.0
  9.03448  12.208    9.16016   7.37143  10.5531    4.5322    7.87816  10.9953  0.0
 17.1791   20.2238  16.6752   13.7404   18.7066    7.20513  15.7584   21.9906  0.0
 22.8134   22.3908  20.2338   13.6448   16.1804    8.51447  17.4098   23.062   0.0
 19.7638   26.0583  20.4084   13.0173   18.3174    9.9884   17.2918   22.8682  0.0
 38.6198   44.3518  37.7591   24.2729   31.979    16.2962   34.9386   46.1641  0.0

edit: i somehow completely missed that you diagnosed the afoldl line already.

zygmuntszpak commented 5 years ago

Thank you very much for the wonderful suggestions. Do you think that I have stumbled upon some limitation of the inference engine in Base, or do you think this problem is still at the level of the StaticArrays package? I will try to simplify the code further whilst still keeping this "bug".

Evizero commented 5 years ago

I suspect the issue is the lack of proper inlining the n-ary operator * when Adjoints of static arrays are involved. While it does work for simpler code examples, it doesn't work in your case. Maybe thats because the culprint is a Base method that has no @inline annotation but still gets inlined in simple cases.

For example this hacky fix makes your original code work:

const StaticArrayLike = Union{StaticArray,Adjoint{<:Any,<:StaticArray}}
@inline (Base.:*)(A::StaticArrayLike, B, C, D...) = *(*(A,B), C, D...)
@inline (Base.:*)(A::StaticArrayLike, B::StaticArrayLike, C, D...) = *(*(A,B), C, D...)
@inline (Base.:*)(A::StaticArray{Tuple{N,1},<:Any,2}, B::Adjoint{<:Any,<:StaticVector}) where {N} = vec(A) * B

funny enough it only works if both the first and the second method are added to *.

mateuszbaran commented 4 years ago

This is already fixed in 0.12.4 and #814 fixes more similar cases.

JuliaArrays / StaticArrays.jl

Certain `SArray` * `Adjoint(SArray)` produces `Array{Float64,2}` instead of `SArray` #537