Open gdalle opened 1 year ago
Okay, it took me a while: I ran into this bug (which has the following temporary fix), but I finally managed to run the profiler. Honestly, I am not really sure how to work with it yet. After reading this, I understand that the lack of orange and red colors is overall a good thing.
I added the profile.html file. I am not sure whether I can embed it here in some form.
If I understand it correctly, I spend about 70% of the time in genBeta!, but I am not sure what to make of it.
Note that the VSCode extension is probably a nicer interface to profiling, at least in my opinion
After reading this I understand that the lack of orange and red colors is overall a good thing.
Indeed, but you do have a lot of orange/yellow, which denotes memory allocations. In particular, it seems genBeta! calls inv on a matrix, which is very expensive. Can you get away with solving a linear system instead?
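To make the suggestion concrete, here is a minimal sketch (with a made-up 2x2 system) of the difference between forming the inverse and solving directly:

```julia
using LinearAlgebra

# Hypothetical example: solve A*x = b.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]

# Forms the explicit inverse, then multiplies: allocates an extra matrix
# and is less accurate numerically.
x_inv = inv(A) * b

# Factorizes A and solves in one step: cheaper and numerically safer.
x_solve = A \ b
```

Both give the same answer here, but `A \ b` avoids materializing `inv(A)` at all.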
You mean this?
I thought I was using the VSCode extension, am I not?
I thought I was using the VSCode extension, am I not?
Apparently you are! But in that case you don't need all the funky business with ProfileView.jl, you only need the @profview macro.
You mean this?
Yes, that is an inverse computation, which is costly and not always needed. For instance, if you compute $x = A^{-1} b$, it's often better to solve $Ax=b$
Now I am utterly confused. I am getting this graph by typing @profview include("MainNewB.jl") 😄 but for that to work (otherwise I got a white screen), I first did import ProfileView.
Yes, I have inv here:
invSig = Sigma_prior^-1
V = (invSig + sig2_d_vec[1]^(-1)*(X'*X))^-1
C = cholesky(Hermitian(V))
I can think about dealing with this. Sigma_prior is a fixed matrix and never changes, so this inversion (currently a scalar) can be saved across all 7000 iterations. Regarding V, I need the precision matrix here only for its factorization later. There was some trick where I didn't need the inverse for that, but I have to check.
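The idea of hoisting the fixed inverse out of the sampler loop could be sketched like this (the function names and the accumulation inside the loop are made up purely for illustration):

```julia
using LinearAlgebra

Sigma_prior = [2.0 0.5; 0.5 1.0]   # fixed, never changes

# Before: the same fixed matrix is inverted inside every iteration.
function gibbs_naive(n)
    acc = zeros(2, 2)
    for _ in 1:n
        invSig = inv(Sigma_prior)  # recomputed n times for no reason
        acc .+= invSig
    end
    return acc
end

# After: invert once, pass the result in.
function gibbs_cached(n, invSig)
    acc = zeros(2, 2)
    for _ in 1:n
        acc .+= invSig             # reuse the precomputed inverse
    end
    return acc
end

invSig = inv(Sigma_prior)
```

The two versions return the same result; the second simply skips 6999 redundant inversions.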
In real code (this is a teaching example), in the sig2_d_vec[1]^(-1)*(X'*X) part, sig2_d_vec is in fact a huge matrix, say Sigma, so I have Sigma\(X'*X). By the way, X'*X doesn't change, so this multiplication can be done just once and fed into the function.
I am a bit confused about how to read the flame graph. Say this part:
If the lines underneath are "one level down", and I spent 70% in gibbs and 9% in gibbs (the two bars on the top row correspond to 70 and 9), then out of the 70% in gibbs I spent 25+26+9 percent in genBeta! and 9% in genSigma!. Is that reading of the profiler correct?
Now I am utterly confused. I am getting this graph by typing @profview include("MainNewB.jl") 😄 but for that to work (otherwise I got a white screen), I first did import ProfileView.
The import shouldn't be necessary. And indeed I have recently been confused by this blank screen, but if you run the line with @profview a second time, I found it usually works.
https://discourse.julialang.org/t/profiling-in-vscode-shows-blank-tab-on-first-run/100492
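For reference, outside VSCode the same samples can be collected with the standard-library Profile module; @profview is essentially a graphical view over this data. A minimal sketch with a made-up workload:

```julia
using Profile

# Made-up workload standing in for the real script.
function work()
    s = 0.0
    for i in 1:10^6
        s += sqrt(i)
    end
    return s
end

work()              # warm-up call so compilation is not profiled
Profile.clear()     # discard any previously collected samples
@profile work()     # collect stack samples while work() runs
# Profile.print()   # text flame view in the REPL
```

Running the warm-up call first is also why a second `@profview` run tends to give a cleaner picture.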
If the lines underneath are "one level down", I spent 70% in gibbs and 9 % in gibbs
No, you spent 70% in a certain line of gibbs and 9% in another line of gibbs. You can see the line numbers by hovering with your mouse over the tiles, and if you Ctrl+click, it takes you to the file in question.
I see, thanks!
I removed unnecessary repeated computations, namely X'*X, X'*Y, invSig = Sigma_prior^-1, and invSig*Beta_prior, which reduced the time by 30%!
new genBeta!
V = (invSig + sig2_d_vec[1]^(-1)*Xprim)^-1
C = cholesky(Hermitian(V))
Beta1 = V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)
beta_d .= Beta1 .+ C.L*randn(5,1)
new time:
julia> @btime include("MainNewB.jl");
27.985 ms (139970 allocations: 49.22 MiB)
old time:
julia> @btime include("MainNewB.jl")
41.813 ms (160966 allocations: 52.21 MiB)
I am unsure I can reduce this further.
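One caveat about these measurements: `@btime include("MainNewB.jl")` also times parsing and (re)compiling the whole script. Benchmarking the hot function directly, with `$`-interpolated arguments, isolates the part being optimized. A sketch with placeholder inputs (the real arguments would come from the script):

```julia
using BenchmarkTools, LinearAlgebra

# Hypothetical stand-ins for the real inputs.
A = randn(100, 100)
A = A'A + I               # make it positive definite
b = randn(100)

solve(A, b) = A \ b

# `$` interpolates the arguments so that global-variable lookup and
# setup cost are excluded from the measurement.
@btime solve($A, $b);
```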
I am unsure I can reduce this further.
Hard to say without profiling myself, but here are some things I noticed:
V = (X + Y^(-1) * Z)^-1
When you do this, you allocate a new matrix for Y^(-1), then a second one for Y^(-1) * Z, a third one for X + Y^(-1) * Z, and finally a fourth one for (X + Y^(-1) * Z)^-1.
You could for instance keep temporary storage and use mul! + inv! every time, but that would make the code much harder to read.
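A sketch of the preallocation idea (sizes and names are made up; `mul!` writes the product into an existing buffer instead of allocating a new matrix each time):

```julia
using LinearAlgebra

n = 50
X, Y, Z = randn(n, n), randn(n, n), randn(n, n)

# Buffers allocated once, outside the hot loop.
tmp = similar(X)
out = similar(X)

invY = inv(Y)        # still allocates here; could also be hoisted or factorized
mul!(tmp, invY, Z)   # tmp = Y^(-1) * Z, written in place, no new matrix
@. out = X + tmp     # fused broadcast: no intermediate for the sum
```

Inside a loop, only `mul!` and the broadcast run; the buffers are reused, so the per-iteration allocations disappear.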
sig2_d_vec[1]^(-1)
You compute this twice.
Beta1 = V*(invSBetaPr + sig2_d_vec[1]^(-1)*XprimY)
This is an instance of doing $x = A^{-1}b$ instead of solving $Ax = b$. Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $A^{-1}$.
C = cholesky(Hermitian(V))
Apparently there is a way to share computations between the Cholesky decomposition and the matrix inversion, but I'm not well-versed enough in linear algebra to help:
https://en.wikipedia.org/wiki/Cholesky_decomposition#Matrix_inversion
Apparently there is a way to share computations between the Cholesky decomposition and the matrix inversion, but I'm not well-versed enough in linear algebra to help
Hmm, I know of a method where, given a normal distribution with mean $\mu$ and variance $K^{-1}$, you may draw from it using the Cholesky factor of $K$, so that you don't have to invert $K$. You do have to ~invert~ solve with the Cholesky factor, which is cheaper (step 2 of the algorithm, the backward substitution). The original paper is this one.
The source of the link in the Wikipedia article you provided sounds similar. This part is just after equation 24 on the second page.
I never thought about using those algorithms for small problems, I can surely give it a try.
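A sketch of that sampling trick (variable names made up): to draw from $N(\mu, K^{-1})$ given the precision matrix $K$, factor $K = LL'$ and solve with the upper factor $U = L'$, never forming $K^{-1}$ explicitly:

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 4
A = randn(n, n)
K  = A'A + n*I              # a positive-definite precision matrix
mu = randn(n)

C = cholesky(Hermitian(K))  # K = C.L * C.U, with C.U == C.L'

# If z ~ N(0, I) and x = U \ z, then
# cov(x) = U^(-1) * U^(-T) = (U'U)^(-1) = (L*L')^(-1) = K^(-1),
# so mu + U \ z is a draw from N(mu, K^(-1)).
draw(mu, C) = mu + C.U \ randn(length(mu))

x = draw(mu, C)
```

The only costs per draw are one triangular back substitution and a vector addition; the factorization is reused across draws.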
Not sure there is a way to exploit it in your case because you seem to need the Cholesky factor of $A^{-1}$
If I use the above trick to avoid computing Vinv, I could also solve $Ax = b$ instead of taking the inverse...
Well, that did improve!
julia> @btime include("MainNewB.jl")
22.101 ms (132948 allocations: 35.15 MiB)
versus
julia> @btime include("MainNewB.jl")
29.781 ms (153948 allocations: 54.81 MiB)
We started at 80ms, then went to 40ms, then to 30ms, now this.
Here is what I changed
function genBeta2!(beta_d,invSig,sig2_d_vec,Xprim,XprimY,invSBetaPr)
sig2inv = sig2_d_vec[1]^(-1)
V = invSig + sig2inv*Xprim
cholV = cholesky(Hermitian(V))
    beta_d .= V\(invSBetaPr + sig2inv*XprimY) + cholV.U\randn(5,1)  # solve with the upper factor: cov(cholV.U\z) = (U'U)^(-1) = V^(-1)
# Vinv = (V)^-1
# C = cholesky(Hermitian(Vinv))
# Beta1 = Vinv*(invSBetaPr + sig2inv*XprimY)
# beta_d .= Beta1 .+ C.L*randn(5,1)
end
I am using the aforementioned algorithm of Chan and Jeliazkov (2009) to draw $\beta_d$ using $V$ instead of first calculating the inverse $V^{-1}$. Calculating the mean Beta1 is now done with backward substitution (if the \ operator does the same thing in Julia as it does in Matlab), as in V\(invSBetaPr + sig2inv*XprimY). I am unsure whether V\ is avoidable.
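On the `\` question: as in Matlab, Julia's `\` dispatches on the argument type. With a general square matrix it performs a factorization internally on every call, while passing an existing Cholesky factorization reuses it and only does the two triangular solves. A sketch with a made-up positive-definite system:

```julia
using LinearAlgebra

V = [4.0 1.0; 1.0 3.0]      # symmetric positive definite
b = [1.0, 2.0]

F = cholesky(Hermitian(V))

x1 = V \ b                  # factorizes V internally on each call
x2 = F \ b                  # reuses the factorization: two triangular solves
```

Since `cholV` is already computed in genBeta2!, `cholV \ rhs` would avoid the second factorization hidden inside `V \ rhs`.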
See, we're getting somewhere :) Time to profile again: what's the most costly part?
While the issues I raised for memory allocations are good training, their solutions don't seem to bring large gains. To see where our efforts should concentrate, you will need to profile the code. Care to share the results?