Open gdalle opened 1 year ago
Okay, it took me a while: I ran into this bug (which has the following temporary fix), but I finally managed to run the profiler. Honestly, I am not really sure how to work with it yet. After reading this, I understand that the lack of orange and red colors is overall a good thing.
I added the profile.html file. I am not sure whether I can embed it here in some form.
If I understand it correctly, I spend about 70% of the time in genBeta!, but I am not sure what to make of it.
Note that the VSCode extension is probably a nicer interface to profiling, at least in my opinion
After reading this I understand that the lack of orange and red colors is overall a good thing.
Indeed, but you do have a lot of orange/yellow, which denotes memory allocations. In particular, it seems genBeta! calls inv on a matrix, which is very expensive. Can you get away with solving a linear system instead?
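To make the suggestion concrete, here is a minimal sketch (with a made-up 2x2 system) of the difference between forming the inverse and solving directly:

```julia
using LinearAlgebra

# Hypothetical example: solve A*x = b.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]

# Forms the explicit inverse, then multiplies: allocates an extra matrix
# and is less accurate numerically.
x_inv = inv(A) * b

# Factorizes A and solves in one step: cheaper and numerically safer.
x_solve = A \ b
```

Both give the same answer here, but `A \ b` avoids materializing `inv(A)` at all.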
You mean this?
I thought I was using the VSCode extension, am I not?
I thought I was using the VSCode extension, am I not?
Apparently you are! But in that case you don't need all the funky business with ProfileView.jl, you only need the @profview macro.
You mean this?
Yes, that is an inverse computation, which is costly and not always needed. For instance, if you compute $x = A^{-1} b$, it's often better to solve $Ax=b$
Now I am utterly confused. I am getting this graph by typing @profview include("MainNewB.jl") 😄 but for that to work (otherwise I got a white screen), I first did import ProfileView.
Yes, I have inv here:
invSig = Sigma_prior^-1
V = (invSig + sig2_d_vec[1]^(-1)*(X'*X))^-1
C = cholesky(Hermitian(V))
I can think about dealing with this. Sigma_prior is a fixed matrix and never changes, so this inversion (currently a scalar) can be saved across all 7000 iterations. Regarding V, I need the precision matrix here only for its factorization later. There was some trick where I didn't need the inverse for that, but I have to check.
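The idea of hoisting the fixed inverse out of the sampler loop could be sketched like this (the function names and the accumulation inside the loop are made up purely for illustration):

```julia
using LinearAlgebra

Sigma_prior = [2.0 0.5; 0.5 1.0]   # fixed, never changes

# Before: the same fixed matrix is inverted inside every iteration.
function gibbs_naive(n)
    acc = zeros(2, 2)
    for _ in 1:n
        invSig = inv(Sigma_prior)  # recomputed n times for no reason
        acc .+= invSig
    end
    return acc
end

# After: invert once, pass the result in.
function gibbs_cached(n, invSig)
    acc = zeros(2, 2)
    for _ in 1:n
        acc .+= invSig             # reuse the precomputed inverse
    end
    return acc
end

invSig = inv(Sigma_prior)
```

The two versions return the same result; the second simply skips 6999 redundant inversions.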
In real code (this is a teaching example), in the sig2_d_vec[1]^(-1)*(X'*X) part, sig2_d_vec is in fact a huge matrix, say Sigma, so I have Sigma\(X'*X). By the way, X'*X doesn't change, so this multiplication can be done just once and fed into the function.
I am a bit confused about how to read the flame graph. Say this part:
If the lines underneath are "one level down", and I spent 70% in gibbs and 9% in gibbs (the two bars on the top row correspond to 70 and 9), then out of the 70% in gibbs I spent 25+26+9 percent in genBeta! and 9% in genSigma!. Is that reading of the profiler correct?
Now I am utterly confused. I am getting this graph by typing @profview include("MainNewB.jl") 😄 but for that to work (otherwise I got a white screen), I first did import ProfileView.
The import shouldn't be necessary. And indeed I have recently been confused by this blank screen, but if you run the line with @profview a second time, I found it usually works.
https://discourse.julialang.org/t/profiling-in-vscode-shows-blank-tab-on-first-run/100492
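For reference, outside VSCode the same samples can be collected with the standard-library Profile module; @profview is essentially a graphical view over this data. A minimal sketch with a made-up workload:

```julia
using Profile

# Made-up workload standing in for the real script.
function work()
    s = 0.0
    for i in 1:10^6
        s += sqrt(i)
    end
    return s
end

work()              # warm-up call so compilation is not profiled
Profile.clear()     # discard any previously collected samples
@profile work()     # collect stack samples while work() runs
# Profile.print()   # text flame view in the REPL
```

Running the warm-up call first is also why a second `@profview` run tends to give a cleaner picture.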
If the lines underneath are "one level down", I spent 70% in gibbs and 9 % in gibbs
No, you spent 70% in a certain line of gibbs and 9% in another line of gibbs. You can see the line numbers by hovering with your mouse over the tiles, and if you Ctrl+click, it takes you to the file in question.
I see, thanks!
I removed unnecessary repeated computations, namely X'*X, X'*Y, invSig = Sigma_prior^-1, and invSig*Beta_prior, which reduced the time by 30%!
new genBeta!
V = (invSig + sig2_d_vec[1]^(-1)*Xprim)^-1
C = cholesky(Hermitian(V))
Beta1 = V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)
beta_d .= Beta1 .+ C.L*randn(5,1)
new time:
julia> @btime include("MainNewB.jl");
27.985 ms (139970 allocations: 49.22 MiB)
old time:
julia> @btime include("MainNewB.jl")
41.813 ms (160966 allocations: 52.21 MiB)
I am unsure I can reduce this further.
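One caveat about these measurements: `@btime include("MainNewB.jl")` also times parsing and (re)compiling the whole script. Benchmarking the hot function directly, with `$`-interpolated arguments, isolates the part being optimized. A sketch with placeholder inputs (the real arguments would come from the script):

```julia
using BenchmarkTools, LinearAlgebra

# Hypothetical stand-ins for the real inputs.
A = randn(100, 100)
A = A'A + I               # make it positive definite
b = randn(100)

solve(A, b) = A \ b

# `$` interpolates the arguments so that global-variable lookup and
# setup cost are excluded from the measurement.
@btime solve($A, $b);
```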
I am unsure I can reduce this further.
Hard to say without profiling myself, but here are some things I noticed:
V = (X + Y^(-1) * Z)^-1
When you do this, you allocate a new matrix for Y^(-1), then a second one for Y^(-1) * Z, a third one for X + Y^(-1) * Z, and finally a fourth one for (X + Y^(-1) * Z)^-1.
You could for instance keep temporary storage and use mul! + inv! every time, but that would make the code much harder to read.
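A sketch of the preallocation idea (sizes and names are made up; `mul!` writes the product into an existing buffer instead of allocating a new matrix each time):

```julia
using LinearAlgebra

n = 50
X, Y, Z = randn(n, n), randn(n, n), randn(n, n)

# Buffers allocated once, outside the hot loop.
tmp = similar(X)
out = similar(X)

invY = inv(Y)        # still allocates here; could also be hoisted or factorized
mul!(tmp, invY, Z)   # tmp = Y^(-1) * Z, written in place, no new matrix
@. out = X + tmp     # fused broadcast: no intermediate for the sum
```

Inside a loop, only `mul!` and the broadcast run; the buffers are reused, so the per-iteration allocations disappear.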
sig2_d_vec[1]^(-1)
You compute this twice.
Beta1 = V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)
This is an instance of doing $x = A^{-1}b$ instead of solving $Ax = b$. Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $A^{-1}$.
C = cholesky(Hermitian(V))
Apparently there is a way to share computations between the Cholesky decomposition and the matrix inversion, but I'm not well-versed enough in linear algebra to help:
https://en.wikipedia.org/wiki/Cholesky_decomposition#Matrix_inversion
Apparently there is a way to share computations between the Cholesky decomposition and the matrix inversion, but I'm not well-versed enough in linear algebra to help
Hmm, I know of a method where, given a normal distribution with mean $\mu$ and variance $K^{-1}$, you may draw from it using the Cholesky factor of $K$, so that you don't have to invert $K$. You do have to ~invert~ solve with the Cholesky factor, which is cheaper (step 2 of the algorithm, the backward substitution). The original paper is this one.
The source of the link in the Wikipedia article you provided sounds similar. This part is just after equation 24 on the second page.
I never thought about using those algorithms for small problems, I can surely give it a try.
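A sketch of that sampling trick (variable names made up): to draw from $N(\mu, K^{-1})$ given the precision matrix $K$, factor $K = LL'$ and solve with the upper factor $U = L'$, never forming $K^{-1}$ explicitly:

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 4
A = randn(n, n)
K  = A'A + n*I              # a positive-definite precision matrix
mu = randn(n)

C = cholesky(Hermitian(K))  # K = C.L * C.U, with C.U == C.L'

# If z ~ N(0, I) and x = U \ z, then
# cov(x) = U^(-1) * U^(-T) = (U'U)^(-1) = (L*L')^(-1) = K^(-1),
# so mu + U \ z is a draw from N(mu, K^(-1)).
draw(mu, C) = mu + C.U \ randn(length(mu))

x = draw(mu, C)
```

The only costs per draw are one triangular back substitution and a vector addition; the factorization is reused across draws.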
Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $A^{-1}$
If I use the above trick to avoid computing Vinv, I could also solve $Ax = b$ instead of taking the inverse...
Well, that did improve!
julia> @btime include("MainNewB.jl")
22.101 ms (132948 allocations: 35.15 MiB)
versus
julia> @btime include("MainNewB.jl")
29.781 ms (153948 allocations: 54.81 MiB)
We started at 80ms, then went to 40ms, then to 30ms, now this.
Here is what I changed
function genBeta2!(beta_d,invSig,sig2_d_vec,Xprim,XprimY,invSBetaPr)
sig2inv = sig2_d_vec[1]^(-1)
V = invSig + sig2inv*Xprim
cholV = cholesky(Hermitian(V))
    beta_d .= V\(invSBetaPr + sig2inv*XprimY) + cholV.U\randn(5,1)  # solve with the upper factor: cov(cholV.U\z) = (U'U)^(-1) = V^(-1)
# Vinv = (V)^-1
# C = cholesky(Hermitian(Vinv))
# Beta1 = Vinv*(invSBetaPr + sig2inv*XprimY)
# beta_d .= Beta1 .+ C.L*randn(5,1)
end
I am using the aforementioned algorithm of Chan and Jeliazkov (2009) to draw $\beta_d$ using $V$ instead of first calculating the inverse $V^{-1}$. Calculating the mean Beta1 is now done with backward substitution (if the \ operator does the same thing in Julia as it does in Matlab), as in V\(invSBetaPr + sig2inv*XprimY). I am unsure whether V\ is avoidable.
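On the `\` question: as in Matlab, Julia's `\` dispatches on the argument type. With a general square matrix it performs a factorization internally on every call, while passing an existing Cholesky factorization reuses it and only does the two triangular solves. A sketch with a made-up positive-definite system:

```julia
using LinearAlgebra

V = [4.0 1.0; 1.0 3.0]      # symmetric positive definite
b = [1.0, 2.0]

F = cholesky(Hermitian(V))

x1 = V \ b                  # factorizes V internally on each call
x2 = F \ b                  # reuses the factorization: two triangular solves
```

Since `cholV` is already computed in genBeta2!, `cholV \ rhs` would avoid the second factorization hidden inside `V \ rhs`.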
See, we're getting somewhere :) Time to profile again: what's the most costly part?
While the issues I raised for memory allocations are good training, their solutions don't seem to bring large gains. To see where our efforts should concentrate, you will need to profile the code. Care to share the results?