JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Speeding Up Julia's Matrix Powers #29701

Open octonion opened 6 years ago

octonion commented 6 years ago

I've been looking at the matrix power functions in SymPy and Ruby recently - in particular, wondering why SymPy is so slow for very large powers compared to Ruby (which is surprisingly fast). Notably, Ruby is also faster than Julia for big-integer matrix powers.

Replacing Julia's matrix power with the recursive function copied from SymPy actually makes it faster. A proper iterative function should presumably be faster still.

Base:

using LinearAlgebra

Q = Array{BigInt,2}([0 0 1 0 2 0; 0 0 0 1 0 0; 1 0 0 1 0 2; 0 2 1 0 0 0; 1 0 0 0 0 0; 0 0 1 0 0 0])

r = Array{BigInt,2}([1 0 0 0 0 0]')

p = sum(Q^100000000*r)

println(length(string(p)))

Output:

time julia main.jl
35950264

real    1m49.730s
user    1m48.743s
sys     0m1.521s

Modified:

using LinearAlgebra

function pow(a, num)
  if num == 1
    return a
  end
  if isodd(num)
    return a * pow(a, num - 1)
  end
  # Integer division: in Julia `num//2` would build a Rational, so use ÷
  ret = pow(a, num ÷ 2)
  return ret * ret
end

Q = Array{BigInt,2}([0 0 1 0 2 0; 0 0 0 1 0 0; 1 0 0 1 0 2; 0 2 1 0 0 0; 1 0 0 0 0 0; 0 0 1 0 0 0])

r = Array{BigInt,2}([1 0 0 0 0 0]')

p = sum(pow(Q,100000000)*r)

println(length(string(p)))

Output:

35950264

real    1m13.124s
user    1m12.275s
sys     0m1.385s
affans commented 6 years ago

I am not sure whether a Julia script executed from the terminal runs in global scope. What happens if you also wrap Q, r, and p in a function?

octonion commented 6 years ago

Declaring Q to be a constant makes no difference in performance and elicits no error messages, so it appears to be local. But this is really outside my expertise.

oscardssmith commented 6 years ago

does this change preserve accuracy?

pkofod commented 6 years ago

does this change preserve accuracy?

The changes shown are for integers.

andreasnoack commented 6 years ago

The two implementations perform exactly the same number of multiplications, so it's interesting that one of them is faster. I think the difference has something to do with Q and the order of the multiplications, which isn't the same in the two implementations.

The implementation in Base computes (Q^(2^8))^390625 whereas your implementation computes (Q^390625)^(2^8). Now notice that

julia> Q256 = Q^(2^8);

julia> @btime Q*Q;
  39.540 μs (1255 allocations: 23.66 KiB)

julia> @btime Q256*Q256;
  51.709 μs (1333 allocations: 36.92 KiB)

So multiplication with Q256 is more expensive. In your implementation, the non-power-of-two part of the computation mainly consists of multiplications with Q, whereas the Base implementation multiplies with Q256. This cost difference is specific to your problem, and it probably isn't too hard to come up with an example that goes in the opposite direction, so I don't think the example here is sufficient evidence to change the implementation in Base.
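The exponent split behind the two orderings can be checked directly (a small sanity script; the 2x2 BigInt matrix is just a made-up example):

```julia
# Sanity check of the exponent decomposition discussed above:
# 100000000 = 2^8 * 390625, so the two implementations differ only in
# whether the squarings happen first or last.
n = 100_000_000
t = trailing_zeros(n)      # power of two in the factorization
m = n >> t                 # remaining odd part
@assert t == 8 && m == 390625
@assert 2^t * m == n

# Both orderings give the same result; only the cost of the individual
# multiplications differs.
A = BigInt[1 1; 1 0]
@assert (A^3)^4 == (A^4)^3 == A^12
```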

StefanKarpinski commented 6 years ago

If there's some way to predict which way will be faster, we could use a polyalgorithm, but I'm not sure the additional complexity is worth it. That seems more appropriate for a specialized package.

octonion commented 6 years ago

In the examples I've looked at, this approach appears to be faster when the exponent is even (with the speedup increasing with divisibility by 2) and no slower when the exponent is odd (or, in the worst case, of the form 2^n - 1). Under what circumstances could we expect it to be slower?

andreasnoack commented 5 years ago

My thought was that it would be possible to come up with cases where multiplying with Q would generally be expensive relative to multiplying with Q256. However, I haven't been able to construct such an example. I guess that, most often, the magnitude of the elements of Q256 will be larger than those of Q, so for BigInt it is more costly to multiply with Q256. However, it would be great to better understand the conditions under which changing the ordering of the multiplications is beneficial.

StefanKarpinski commented 5 years ago

However, it would be great to better understand the conditions under which changing the ordering of the multiplications is beneficial.

Given that it seems to be better to change the order in all circumstances anyone can cook up, can't we change it first and consider "understanding the conditions" as an interesting research project?

andreasnoack commented 5 years ago

Given that it seems to be better to change the order in all circumstances anyone can cook up, can't we change it first and consider "understanding the conditions" as an interesting research project?

The current version of power_by_squaring is used for all kinds of types and is pretty old. So far, we have a fairly specific example suggesting that a different version is faster for Matrix{BigInt}. As a general rule, I don't think we should change a generic algorithm based on a single example of a type for which it isn't optimal. Next week/month/year somebody might find that the current version is better for their input type, so I think it would be good to understand the problem better before changing the current version.
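For context, the square-first shape of the generic algorithm can be sketched as follows. This is a simplified sketch, not the actual Base.power_by_squaring source (which handles zero and negative exponents and more types), but it shows why the accumulator ends up multiplying with large powers:

```julia
# Simplified sketch of the generic right-to-left power-by-squaring shape.
# Trailing factors of two are squared away first, so for n = 100_000_000 the
# accumulator only ever meets x^(2^k) with k >= 8 -- the expensive
# Q256-style multiplications from the discussion above.
function pbs_sketch(x, p::Integer)
    p >= 1 || throw(DomainError(p, "positive exponents only in this sketch"))
    while iseven(p)
        x = x * x              # square away trailing zero bits first
        p >>= 1
    end
    y = x
    p >>= 1
    while p > 0
        x = x * x              # base keeps growing: x^(2^k)
        if isodd(p)
            y = y * x          # accumulator multiplied by large powers
        end
        p >>= 1
    end
    return y
end
```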

StefanKarpinski commented 5 years ago

But we could specialize for Matrix{BigInt} quite easily...

oscardssmith commented 3 years ago

Do we want to add this specialization?