Closed: Magalame closed this 5 years ago
From the sweet sweet benchmarks:
---Benchmarking light operations---
norm
t=4.61(64)·10⁻⁷s σ=9.0% n=391,746
Fast.norm
t=3.08(86)·10⁻⁷s σ=1.7·10% n=499,879
multiplyV
t=4.63(67)·10⁻⁵s σ=9.3% n=15,052
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Fast.multiplyV
t=1.76(47)·10⁻⁵s σ=1.7·10% n=29,158
▀▀▀▀▀▀
transpose
t=9.7(14)·10⁻⁵s σ=9.4% n=9,371
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
ident
t=2.39(41)·10⁻⁶s σ=1.0·10% n=120,027
diag
t=2.41(48)·10⁻⁶s σ=1.2·10% n=118,830
---Benchmarking heavy operations---
multiply
t=4.53(57)·10⁻³s σ=3.6% n=64
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Fast.multiply
t=2.08(40)·10⁻³s σ=5.6% n=128
▀▀▀▀▀▀▀▀▀▀▀
qr
t=5.8(12)·10⁻³s σ=6.0% n=64
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Fast.qr
t=1.64(25)·10⁻³s σ=4.4% n=128
▀▀▀▀▀▀▀▀
---Benchmarking memory consumption---
Case             Allocated  GCs
norm                    16    0
Fast.norm               16    0
multiplyV              848    0
Fast.multiplyV         848    0
transpose           80,080    0
ident               80,928    0
diag                80,080    0
multiply            80,080    0
Fast.multiply       80,144    0
qr              31,845,848   30
Fast.qr            160,248    0
Now I'm wondering: why is transpose so slow? Is it because of `U.generate`? More benchies to come!
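To make that question concrete, here is a rough sketch of the two shapes the implementation can take. This is not the actual DLA code, and the `Matrix` below is a simplified row-major stand-in for `Statistics.Matrix.Matrix`:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import Control.Monad.ST (runST)

-- simplified stand-in: rows, columns, row-major storage
data Matrix = Matrix !Int !Int !(U.Vector Double)

-- U.generate style: one quotRem per element, and the reads walk the source
-- with stride c, which is unfriendly to the cache
transposeGen :: Matrix -> Matrix
transposeGen (Matrix r c v) =
  Matrix c r $ U.generate (r * c) $ \i ->
    let (row, col) = i `quotRem` r        -- position in the transposed matrix
    in  v `U.unsafeIndex` (col * c + row)

-- imperative style: read the source sequentially, scatter the writes
transposeST :: Matrix -> Matrix
transposeST (Matrix r c v) = Matrix c r $ runST $ do
  w <- UM.unsafeNew (r * c)
  let go !i !j
        | i >= r    = return ()
        | j >= c    = go (i + 1) 0
        | otherwise = do
            UM.unsafeWrite w (j * r + i) (v `U.unsafeIndex` (i * c + j))
            go i (j + 1)
  go 0 0
  U.unsafeFreeze w
```

Benchmarking these two against each other on the same inputs should tell whether `U.generate` (or the strided reads it implies) is the culprit.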
ping @ocramz I think I'll leave the transpose question for later; it might be more interesting to get this PR merged before moving on!
Hi @Magalame, I'm a bit packed at work these days so I haven't had the time to look at this properly; give me a couple more days :)
No problem, good luck!
The current default modules in DLA are optimized for accuracy, with the use of the `Numeric.Sum` module. This is good for certain numeric applications, but others would benefit more from a focus on performance. The new modules are `Statistics.Matrix.Fast` and `Statistics.Matrix.Fast.Algorithms`. Since doing away with the `Sum` module allows for more general modifications, I used a more heavily imperative style, for finer control over allocation. However, a future PR should bring in a more functional style, with generalization added.
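As an illustration of what I mean by "imperative style" (this is a sketch rather than the code in this PR, again with a simplified row-major `Matrix`): a matrix-vector product written against a mutable unboxed vector in `ST`, where the only allocation is the result and the summation is plain left-to-right instead of the compensated sums from `Numeric.Sum`:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import Control.Monad.ST (runST)

-- simplified stand-in: rows, columns, row-major storage
data Matrix = Matrix !Int !Int !(U.Vector Double)

multiplyV :: Matrix -> U.Vector Double -> U.Vector Double
multiplyV (Matrix r c a) x = runST $ do
  y <- UM.new r
  let row !i
        | i >= r    = return ()
        | otherwise = do
            -- plain left-to-right accumulation, no Kahan-style compensation
            let dot !j !acc
                  | j >= c    = acc
                  | otherwise =
                      let aij = a `U.unsafeIndex` (i * c + j)
                          xj  = x `U.unsafeIndex` j
                      in  dot (j + 1) (acc + aij * xj)
            UM.unsafeWrite y i (dot 0 0)
            row (i + 1)
  row 0
  U.unsafeFreeze y
```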
In more detail: for example, `norm . column $ m` would not fuse and would actually allocate the column, so the solution adopted here was to fuse such operations manually. The problem is that expressions like `norm . multiply m1 . multiply m2 $ m3` would still allocate two intermediate matrices.
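Concretely, this is the kind of manual fusion I mean (a sketch with the same simplified `Matrix`, not the exact code in the PR):

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as U

-- simplified stand-in: rows, columns, row-major storage
data Matrix = Matrix !Int !Int !(U.Vector Double)

-- unfused: norm (column m j) builds an r-element vector only to fold over it
column :: Matrix -> Int -> U.Vector Double
column (Matrix r c v) j = U.generate r $ \i -> v `U.unsafeIndex` (i * c + j)

norm :: U.Vector Double -> Double
norm = sqrt . U.sum . U.map (\x -> x * x)

-- fused by hand: same result, no intermediate vector
normColumn :: Matrix -> Int -> Double
normColumn (Matrix r c v) j = sqrt (go 0 0)
  where
    go !i !acc
      | i >= r    = acc
      | otherwise = let x = v `U.unsafeIndex` (i * c + j)
                    in  go (i + 1) (acc + x * x)
```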
The goal would be to allow the whole pipeline to fuse, which would be pretty amazing. I wonder what the effect would be in the case of, say, matrix exponentiation. The whole thing probably just boils down to using `Stream` and `New` properly enough to guarantee the stream fusion we want.
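For illustration (not code from this PR): pipelines written directly over unboxed vectors already fuse with `-O`, because the vector package rewrites them through its internal stream representation, whereas our matrix-level combinators hand back fully materialised structures at every stage, so nothing fuses across them:

```haskell
import qualified Data.Vector.Unboxed as U

-- With optimisation on, vector's rewrite rules typically turn this
-- composition into a single loop over v, with no intermediate vector:
sumSq :: U.Vector Double -> Double
sumSq v = U.sum (U.map (\x -> x * x) (U.map (* 2) v))

-- A matrix-level pipeline such as
--   norm . multiply m1 . multiply m2 $ m3
-- does not get the same treatment: each multiply returns a fully built
-- Matrix, so the two intermediate matrices get allocated no matter what.
```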