Open octonion opened 6 years ago
I am not sure whether a Julia script executed from the terminal is in the "global" scope. What happens when you wrap `Q`, `r`, `p` in a function also?
Declaring `Q` to be a constant makes no difference in performance, nor does it elicit any error messages, so it appears it's local. But this is really outside my expertise.
does this change preserve accuracy?
> does this change preserve accuracy?
The changes shown are for integers.
The two implementations have exactly the same number of multiplications, so it's interesting that one of them is faster. I think the difference has something to do with `Q` and the order of the multiplications, which isn't the same in the two implementations.
The implementation in Base computes `(Q^(2^8))^390625`, whereas your implementation computes `(Q^390625)^(2^8)`. Now notice that
```julia
julia> Q256 = Q^(2^8);

julia> @btime Q*Q;
  39.540 μs (1255 allocations: 23.66 KiB)

julia> @btime Q256*Q256;
  51.709 μs (1333 allocations: 36.92 KiB)
```
So multiplication with `Q256` is more expensive. In your implementation, the non-power-of-two part of the power computation will mainly consist of `Q` multiplications, whereas the Base implementation will multiply with `Q256`. This cost difference is specific to your problem, and it probably isn't too hard to come up with an example that goes in the opposite direction, so I don't think the example here is sufficient evidence to change the implementation in Base.
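To make the ordering difference concrete, here's a small Python sketch (the function names are mine, not Julia's; `mul` is passed in so the multiplications can be counted). Both orders perform exactly the same number of multiplications; they differ only in whether the odd-part multiplications happen before or after the trailing squarings, i.e. whether they involve `x` or `x^(2^k)`:

```python
def pow_by_squaring(x, n, mul):
    """Right-to-left binary powering for n >= 1."""
    result = None
    while n > 0:
        if n & 1:
            result = x if result is None else mul(result, x)
        n >>= 1
        if n > 0:
            x = mul(x, x)
    return result

def pow_square_first(x, n, mul):
    """Base-like order: square away the trailing zero bits of n first,
    so the odd-part multiplications all involve x^(2^k)."""
    while n & 1 == 0:
        x = mul(x, x)
        n >>= 1
    return pow_by_squaring(x, n, mul)

def pow_square_last(x, n, mul):
    """Alternate order: raise x to the odd part of n first (multiplying
    by the original, typically smaller x), then square the result once
    per trailing zero bit of n."""
    k = 0
    while n & 1 == 0:
        n >>= 1
        k += 1
    result = pow_by_squaring(x, n, mul)
    for _ in range(k):
        result = mul(result, result)
    return result

# Count multiplications with plain integers; 10^4 = 2^4 * 625 keeps the
# numbers small, but the structure is the same as 10^8 = 2^8 * 390625.
count = [0, 0]

def mul0(a, b):
    count[0] += 1
    return a * b

def mul1(a, b):
    count[1] += 1
    return a * b

n = 10**4
assert pow_square_first(3, n, mul0) == pow_square_last(3, n, mul1) == 3**n
assert count[0] == count[1]  # same multiplication count, different order
```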
If there's some way to predict which way will be faster, we could use a polyalgorithm, but I'm not sure the additional complexity is worth it. That seems more appropriate for a specialized package.
In the examples I've looked at, this approach appears to be faster if the exponent is even (with the speed improvement increasing with divisibility by 2), and no slower if the exponent is odd (or, worst case, of the form 2^n - 1). Under what circumstances could we expect it to be slower?
My thought was that it would be possible to come up with cases where `Q*A` would generally be expensive relative to `Q256*A`. However, I haven't been able to construct such an example. I guess that most often, the magnitude of the elements of `Q256` will be larger than that of the elements of `Q`, so for `BigFloat` it is more costly to multiply with `Q256`. However, it would be great to better understand the conditions under which changing the ordering of the multiplications is beneficial.
> However, it would be great to better understand the conditions under which changing the ordering of the multiplications is beneficial.
Given that it seems to be better to change the order in all circumstances anyone can cook up, can't we change it first and consider "understanding the conditions" as an interesting research project?
> Given that it seems to be better to change the order in all circumstances anyone can cook up, can't we change it first and consider "understanding the conditions" as an interesting research project?
The current version of `power_by_squaring` is used for all kinds of types and is pretty old. So far, we have a fairly specific example suggesting that for `Matrix{BigFloat}` a different version is faster. As a general rule, I don't think we should change generic algorithms based on single examples of types for which the algorithm isn't optimal. Next week/month/year somebody might find that the current version is better for their input type, so I think it would be good to understand the problem better before changing the current version.
But we could specialize for `Matrix{BigFloat}` quite easily... Do we want to add this specialization?
I've been looking at the matrix power functions in SymPy and Ruby recently, in particular wondering why SymPy is so slow for very large powers compared to Ruby (which is surprisingly fast). Notably, Ruby is also faster than Julia for big-integer matrix powers.
Replacing Julia's matrix power with the recursive code copied from SymPy actually makes it faster. It would seem that a proper iterative function should be even faster.
Base:
Output:
Modified:
Output:
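For reference, here is the generic shape of top-down recursive binary powering and an iterative equivalent, sketched in Python with a naive list-of-lists multiply (this is an illustration of the pattern, not SymPy's or Ruby's actual source). Note that the top-down recursion squares partial results on the way back up and multiplies by the original `A` only at odd steps, so for an exponent like 10^8 it naturally produces the `(Q^390625)^(2^8)`-style ordering discussed earlier in this thread:

```python
def matmul(A, B):
    """Naive O(k^3) multiply for square matrices as lists of lists."""
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

def matpow_recursive(A, n):
    """Top-down recursive binary powering for n >= 1:
    A^n = (A^(n//2))^2, times A if n is odd."""
    if n == 1:
        return A
    half = matpow_recursive(A, n // 2)
    sq = matmul(half, half)
    return matmul(sq, A) if n & 1 else sq

def matpow_iterative(A, n):
    """Iterative equivalent: walk the bits of n from most significant
    to least, squaring and conditionally multiplying by A."""
    result = A
    for bit in bin(n)[3:]:  # skip '0b' and the leading 1-bit
        result = matmul(result, result)
        if bit == '1':
            result = matmul(result, A)
    return result

# Sanity check with the Fibonacci matrix: [[1,1],[1,0]]^n has F(n) in
# the off-diagonal, and Python ints grow without bound like BigInt.
F = [[1, 1], [1, 0]]
assert matpow_recursive(F, 100) == matpow_iterative(F, 100)
assert matpow_recursive(F, 100)[0][1] == 354224848179261915075  # F(100)
```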