Problem with Matrix Multiplications

usefulhyun commented 7 years ago

Hi, there! First, I show you my example.

BLAS.set_num_threads(1)
block_mat1 = Matrix(1,1)
block_mat2 = Matrix(2,2)
block_mat3 = Matrix(3,3)
block_mat4 = Matrix(4,4)
block_mat1[1,1] = ones(3000,3000)
for i=1:2
  for j=1:2
    block_mat2[i,j] = ones(1500,1500)
  end
end
for i=1:3
  for j=1:3
    block_mat3[i,j] = ones(1000,1000)
   end
end
for i=1:4
    for j=1:4
        block_mat4[i,j] = ones(750, 750)
    end
end

seconds_mat1 = Vector{Float64}()
seconds_mat2 = Vector{Float64}()
seconds_mat3 = Vector{Float64}()
seconds_mat4 = Vector{Float64}()

for  #_=1:10
  push!(seconds_mat1, @elapsed block_mat1*block_mat1)
  push!(seconds_mat2, @elapsed block_mat2*block_mat2)
  push!(seconds_mat3, @elapsed block_mat3*block_mat3)
  push!(seconds_mat4, @elapsed block_mat4*block_mat4)
end
@printf("%.6f\t%.6f\t%.6f\t%.6f\n", median(seconds_mat1), median(seconds_mat2), median(seconds_mat3), median(seconds_mat4))

All block_mat# have the same number of elements and the multiplications also have the same number of computations too. But the execution time of mat1 and mat4 is especially slow and you will know if you block more than 4 in the row and column, it shows you slow execution times. Namely, no partition and more than 16 partitions happen slow execution time.

But....

function mul(a::Matrix, b::Matrix)
        @assert size(a,2) == size(b,1)
        m = size(a,1)
        n = size(b,2)
        d = size(a,2)
        res = Matrix(m, n)
        for i=1:m
                for j=1:n
                        res[i,j] = a[i,1]*b[1,j]
                        for k=2:d
                                res[i,j] += a[i,k]*b[k,j]
                        end
                end
        end
        return res
end

The above mul function is my custom matrix multiplication function between two matrix type. If you use mul function instead of * operator, you will get a proper execution time. Why does this happen? is it a bug? and I have tried type annotated for matrix, but it failed to get a proper execution time.

StefanKarpinski commented 7 years ago

Please post questions to the Julia discourse discussion forum.

JeffBezanson commented 7 years ago

I believe this is due to this line in the generic matrix multiply code:

                    z2 = zero(A[i, 1]*B[1, j] + A[i, 1]*B[1, j])

It's doing the interior matrix multiply 3 times, 2 of which are for computing a zero element.

JeffBezanson commented 7 years ago

The other sizes aren't affected because they use special-case code for 2x2 and 3x3 matrices.

andreasnoack commented 7 years ago

Yeah. We basically assume it is free to compute things like zero(A[i, 1]*B[1, j] + A[i, 1]*B[1, j]) to get the right accumulator type. 2x2 and 3x3 are completely unrolled so there the type computation isn't necessary. Generally, we don't try to be smart about the cases where the elements are mutable. Hence multiplication of arrays of arrays will create a ton of temporaries (and similarly with Matrix{BigFloat})

mschauer commented 7 years ago

We should perhaps wrap this in a function call to be able to overload this.

JuliaLang / julia

Problem with Matrix Multiplications #23268