MaximilianJHuber opened 8 years ago
Profiling the `solve` function suggests nothing unusual.
BUT: lines 319 and 380 in ddp.jl have:
vals = ddp.R + ddp.beta * ddp.Q * v
but really should be:
vals = ddp.R + ddp.beta * (ddp.Q * v)
Runtime drops by a factor of 6, and memory allocations drop from 500 MB to 2 MB, from this single change!
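The gap comes from evaluation order: `*` associates left to right, so `ddp.beta * ddp.Q * v` first scales the entire `Q` matrix by the scalar, allocating a full matrix-sized temporary, before the matrix-vector product; the parenthesized version collapses `Q * v` to a vector first. The same effect can be sketched in Python/NumPy (a stand-in for the Julia code; the sizes here are made up):

```python
import numpy as np

n = 2_000
beta = 0.95
rng = np.random.default_rng(0)
Q = rng.random((n, n))
Q /= Q.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
v = rng.random(n)

# Left-to-right: (beta * Q) materializes a full n-by-n temporary
# before the matrix-vector product is taken.
vals_slow = beta * Q @ v

# Parenthesized: Q @ v collapses to an n-vector first, so the scalar
# multiply only touches O(n) memory instead of O(n^2).
vals_fast = beta * (Q @ v)

assert np.allclose(vals_slow, vals_fast)
```

Both expressions are mathematically identical; only the size of the intermediate differs, which is exactly what the allocation numbers above reflect.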
Great sleuthing! This makes perfect sense!
Do you mind putting that change in a pull request?
I did not know what that meant 30 minutes ago, but I hope I did the right thing!
A new performance test with the altered matrix multiplication yielded:
constructor 0.773396 seconds (2.02 k allocations: 466.133 MB, 2.01% gc time)
solve 1.922052 seconds (616 allocations: 108.402 MB, 1.30% gc time)
While Python has:
constructor CPU times: user 103 ms, sys: 8.35 ms, total: 112 ms
solve CPU times: user 56.5 ms, sys: 2.05 ms, total: 58.5 ms
So the issue remains open.
Further profiling points to linear algebra as the most expensive type of operation, for both PFI and VFI:
@oyamad I saw you are working on a number of memory/performance issues concerning ddp.jl. May I suggest making Q sparse throughout ddp.jl?
I played around with the `evaluate_policy` function (ddp.jl:451), and making Q sparse (see the task above) leads to a ridiculously high performance gain (a factor of 22000), which I guess will outweigh the other disadvantages of sparse matrices. Making Q sparse in the `bellman_operator!` function (ddp.jl:318) yields a performance boost of a factor of 25.
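For context, policy evaluation solves the linear system v_σ = R_σ + β Q_σ v_σ, i.e. (I − β Q_σ) v_σ = R_σ, and a sparse direct solve of that system is where a sparse Q_σ pays off. A rough sketch of the dense-vs-sparse gap using SciPy as a Python stand-in (size and density are made up for illustration):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, beta = 1_000, 0.95

# Sparse row-stochastic Q_sigma: each state reaches only a few successors.
Q = sp.random(n, n, density=0.005, format="csr", random_state=0)
Q = Q + sp.eye(n, format="csr") * 1e-3              # avoid all-zero rows
Q = sp.csr_matrix(Q.multiply(1.0 / Q.sum(axis=1)))  # normalize rows
R = np.random.default_rng(0).random(n)

# Dense solve of (I - beta * Q) v = R: O(n^3) and a full n-by-n copy.
v_dense = np.linalg.solve(np.eye(n) - beta * Q.toarray(), R)

# Sparse direct solve exploits the nonzero structure.
v_sparse = spla.spsolve(sp.eye(n, format="csc") - beta * Q.tocsc(), R)

assert np.allclose(v_dense, v_sparse)
```

The sparser Q_σ is, the more lopsided the timing gets, which is consistent with the factors reported above.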
You will probably want to be careful about making Q sparse without first having some kind of check as to whether a sparse matrix is appropriate. I won't claim to know much about sparse matrices but I imagine this is easy enough to check.
There are many economic applications where Q will not be a sparse matrix. E.g., the Markov chain has two states, employment and unemployment or recession and boom, and Q is the transition matrix between these, as in e.g. Imrohoroglu (1989), or the 4x4 Q matrix on labour efficiency units used by Castañeda, Díaz-Giménez & Ríos-Rull (2003) to match the US Lorenz curves for income and wealth inequality.
Equally, in many examples Q is probably going to come from the Tauchen method of discretizing an AR(1), and those matrices will be sparse (or close enough), so there are lots of examples where the speed boost @MaximilianJHuber describes might be gained.
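One way to act on the "check whether a sparse matrix is appropriate" suggestion is to branch on the density of Q. A minimal sketch, where the helper name `maybe_sparse` and the 10% cutoff are made up for illustration (the real break-even density depends on the machine and the operations):

```python
import numpy as np
import scipy.sparse as sp

def maybe_sparse(Q, threshold=0.1):
    """Return a CSR copy of Q if it is sparse enough, else Q unchanged.

    `threshold` is an illustrative cutoff, not a tuned value.
    """
    density = np.count_nonzero(Q) / Q.size
    return sp.csr_matrix(Q) if density < threshold else Q

# A Tauchen-style AR(1) discretization concentrates mass near the
# diagonal, so most entries are (near) zero -- density ~1.5% here:
Q_banded = (np.eye(200) * 0.5
            + np.eye(200, k=1) * 0.25
            + np.eye(200, k=-1) * 0.25)
assert sp.issparse(maybe_sparse(Q_banded))

# A 2-state employment/unemployment chain is fully dense and stays so:
Q_small = np.array([[0.9, 0.1], [0.3, 0.7]])
assert not sp.issparse(maybe_sparse(Q_small))
```

This keeps the dense path for the small, fully-connected chains mentioned above while letting Tauchen-style matrices take the fast route.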
Ok, that is true. The user must decide.
I gave it a try and altered line 51 in ddp.jl from:
Q::Array{T,NQ}
to:
Q::AbstractArray{T,NQ}
(I know that is quick and dirty, because composite types should have concrete types as fields!)
and line 87 from:
new(R, full(Q), beta, _a_indices, a_indptr)
to:
new(R, Q, beta, _a_indices, a_indptr)
Then, for grid_size=500, the constructor allocates only 12 MB instead of 480 MB, with a runtime improvement by a factor of 20, and solve allocates 43 MB instead of 108 MB, with a runtime improvement by a factor of 17.
Are there expert opinions on sparse matrices? Or is the first order of business to redesign ddp.jl in order to gain performance?
I guess that once we decide on the internal data structure (visit #124 to join the discussion), it should be straightforward to implement methods that preserve the sparsity of the input matrix.
DiscreteDP performance might be bad in the Julia version.
For example, this Python code runs 220 times faster (local 2×2.66 GHz, 4 GB machine) than this Julia code. (The task is described here.)
Could someone please replicate this performance issue?