theogf closed this issue 5 months ago.
Hi, is there any update on a complete rewrite of AdvancedVI? Or even an expected time frame for release?
Hey, there is no update and I would say that this has gone stale. I don't have the bandwidth for it anymore and neither does @torfjelde (I guess), so unless someone takes over...
Hi @theogf, that's sad news. Then, for the moment, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be closely coupled to anything done in AdvancedVI.jl. Is there any timeline on that?
I really hope @torfjelde has the time for it (we haven't talked in a while). If the package becomes easier to work with, I would definitely be happy to add a couple of algorithms like SVGD. But I generally think a revamp is badly needed: the ML ecosystem has evolved a lot, and there are now new solutions, like ParameterHandling.jl, for problems we had here.
Is there a straightforward way to deal with the covariance of a full-rank multivariate normal variational family, though? I have been using AdvancedVI.jl as the basis of one of my recent research projects, but couldn't come up with an elegant way to unpack/repack the parameters of the covariance. I think taking gradients independently for each symbolic variable, à la Flux.jl, could be a solution. Any thoughts on this?
You should have a look at ParameterHandling.jl and its positive_definite function. There is no VI-specific optimization there, though, but that's a topic of its own!
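For readers who want to see the idea behind a positive-definite parameterization, here is a minimal Python sketch (the thread is about Julia packages; Python is used only for illustration). It assumes a Cholesky-based scheme: the covariance Sigma = L L^T is stored as the log of L's diagonal plus its strictly lower-triangular entries, so any real vector maps back to a valid covariance. The helper names flatten_cov and unflatten_cov are hypothetical, and this is not claimed to be how ParameterHandling.jl implements positive_definite.

```python
import numpy as np

def flatten_cov(sigma):
    """Map a positive-definite matrix to an unconstrained vector (hypothetical helper).

    Stores log(diag(L)) followed by the strictly lower-triangular entries of the
    Cholesky factor L, where sigma = L @ L.T.
    """
    L = np.linalg.cholesky(sigma)
    d = L.shape[0]
    rows, cols = np.tril_indices(d, k=-1)
    return np.concatenate([np.log(np.diag(L)), L[rows, cols]])

def unflatten_cov(theta, d):
    """Inverse of flatten_cov: rebuild a covariance from any real vector."""
    L = np.zeros((d, d))
    L[np.diag_indices(d)] = np.exp(theta[:d])   # exp keeps the diagonal positive
    rows, cols = np.tril_indices(d, k=-1)
    L[rows, cols] = theta[d:]                   # off-diagonal entries are unconstrained
    return L @ L.T                              # positive definite by construction

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
theta = flatten_cov(sigma)          # length d + d*(d-1)/2 = 3 for a 2x2 covariance
sigma_back = unflatten_cov(theta, 2)
```

Because every entry of theta is unconstrained, an optimizer can update it freely and gradients flow through unflatten_cov; exponentiating the diagonal is one common choice, softplus is another.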
@theogf, that looks great. I would really like to know about the current/future state of Turing.jl's model API before doing anything, though.
I will be starting a PhD this fall, which might give me the bandwidth to work full-time on AdvancedVI.jl.
I personally think it has a lot of potential as a platform for cutting-edge VI research, but some things are missing and others need major work.
@theogf, could you list the changes you planned to introduce into AdvancedVI? I might be able to pick them up at some point.
Hey! I'm back now; been away for the past 4 months, so sorry for not being responsive here.
Then, for the moment, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be closely coupled to anything done in AdvancedVI.jl. Is there any timeline on that?
So it depends on what we're talking about here.
The work I'm doing on the model side of Turing.jl will be very useful for any interaction AdvancedVI.jl wants to have with Turing.jl models, e.g. performing VI on a Turing.jl model, using a Turing.jl model to define a variational approximation, etc. But for AdvancedVI.jl on its own, i.e. ignoring any relation to the rest of the Turing.jl ecosystem, we're still not happy with what we have set up so far; the general API needs to improve, as partially outlined by @theogf above. There are also some significant improvements in the ecosystem that we might want to take advantage of here in AdvancedVI.jl:
And so on.
It needs a bit more thought and an outline of what we want here, but I'm keen on getting something rolling now! :)
Hi @torfjelde, nice to have you back. In case you didn't notice, I was one of the people at the Turing.jl sales pitch at the University of Liverpool.
Some additional thoughts: people have been talking about SVGD in this repo for quite some time, but I don't think it is a good fit here. Its algorithmic structure is different enough from BBVI/MCVI that I don't see good opportunities for shared abstractions. And given that there will be no shortage of variational particle methods any time soon, I think it would be good to have a separate package, like AdvancedParticles.jl or something similar.
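To make the structural difference concrete, here is a minimal Python sketch of one standard SVGD update with an RBF kernel and the median-heuristic bandwidth. Unlike BBVI, it moves a set of particles directly instead of updating the parameters of a distribution. The function name svgd_step and the default step size are hypothetical choices for illustration, and grad_logp is assumed to return the target's score at each particle.

```python
import numpy as np

def svgd_step(x, grad_logp, step_size=0.1):
    """One SVGD update for particles x of shape (n, d) (illustrative sketch).

    Uses an RBF kernel k(a, b) = exp(-||a - b||^2 / h) with the median-heuristic
    bandwidth h = median squared pairwise distance / log(n).
    """
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]          # diffs[j, i] = x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)          # (n, n) squared distances
    if n > 1:
        h = np.median(sq_dists[np.triu_indices(n, k=1)]) / np.log(n)
    else:
        h = 1.0
    h = max(h, 1e-8)                                 # guard against collapsed particles
    k = np.exp(-sq_dists / h)                        # symmetric kernel matrix
    scores = grad_logp(x)                            # (n, d) target score per particle
    drive = k.T @ scores                             # sum_j k(x_j, x_i) grad log p(x_j)
    # sum_j grad_{x_j} k(x_j, x_i) = sum_j -2/h (x_j - x_i) k(x_j, x_i): pushes particles apart
    repulse = np.sum(-2.0 / h * diffs * k[:, :, None], axis=0)
    return x + step_size * (drive + repulse) / n

# Usage: 50 particles in 1-D, driven towards the target N(3, 1).
rng = np.random.default_rng(0)
particles = rng.normal(size=(50, 1))
for _ in range(1000):
    particles = svgd_step(particles, lambda z: -(z - 3.0))
```

The kernel couples all particles, so each step costs O(n^2) kernel evaluations and the "variational approximation" is the particle set itself; that is the kind of difference a shared API would have to accommodate.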
Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here.
I don't agree: the representation is different, but just as relevant.
Even if we move it to a different package, we would still need a common API. So it's probably preferable to design this within one package before starting to split things up.
@theogf, given that you already have #25 open, do you plan on coming back to it, or how should we approach rewriting AdvancedVI?
No, I think it's probably better to start again from scratch; you can take ideas from there if you want.
Okay, thanks @theogf and @torfjelde, the discussions were really helpful.
Hi @torfjelde, I'm thinking about how to restructure the overall project.
I'm thinking of restructuring the project as:
estimators/
diagnostics/
algorithms/
Currently, AdvancedVI.jl has a separate notion of a variational objective (implemented in objectives.jl) and an algorithm (implemented in advi.jl; I'm proposing to rename this an estimator) for estimating the objective's gradient, but I don't think this distinction is necessary. After all, most of the gradient estimators proposed in the literature target specific objectives, so I think an objective should be an attribute of an estimator rather than its own object.
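The proposal above (the objective as an attribute of the estimator) can be illustrated with a small Python design sketch. Everything here is hypothetical: GradientEstimator, ReparamELBO, ScoreFunctionELBO and the objective attribute are illustrative names, not AdvancedVI.jl's API, and the variational family is fixed to a 1-D Gaussian q = N(mu, exp(log_sigma)^2) to keep it short. Two different estimators target the same objective, so a higher-level algorithm only needs the shared estimate method; both return ascent directions for the ELBO.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable

import numpy as np

class GradientEstimator(ABC):
    """Hypothetical interface: each estimator carries the objective it targets."""
    objective: str

    @abstractmethod
    def estimate(self, params: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Return a stochastic gradient of self.objective w.r.t. params = (mu, log_sigma)."""

@dataclass
class ReparamELBO(GradientEstimator):
    """Reparameterization-trick ELBO gradient; needs the target's score grad log p."""
    grad_logp: Callable[[np.ndarray], np.ndarray]
    n_samples: int = 1000
    objective: str = "ELBO"

    def estimate(self, params, rng):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal(self.n_samples)
        g = self.grad_logp(mu + sigma * eps)        # target score at z = mu + sigma * eps
        # The Gaussian entropy is log_sigma + const, so its log_sigma-derivative is exactly 1.
        return np.array([g.mean(), (g * eps * sigma).mean() + 1.0])

@dataclass
class ScoreFunctionELBO(GradientEstimator):
    """Score-function (REINFORCE) ELBO gradient; needs log p only, not its gradient."""
    logp: Callable[[np.ndarray], np.ndarray]
    n_samples: int = 1000
    objective: str = "ELBO"

    def estimate(self, params, rng):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        z = mu + sigma * rng.standard_normal(self.n_samples)
        u = (z - mu) / sigma
        logq = -0.5 * u**2 - log_sigma - 0.5 * np.log(2 * np.pi)
        w = self.logp(z) - logq                     # log p - log q per sample
        # grad of log q w.r.t. (mu, log_sigma) is (u / sigma, u^2 - 1)
        return np.array([(w * u / sigma).mean(), (w * (u**2 - 1)).mean()])
```

A downstream algorithm (plain SGD, variance reduction, estimator selection) could then accept any GradientEstimator and read .objective whenever it needs to know what is being optimized.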
Under algorithms, I'm planning to put higher-level algorithms that use the output of the estimators. For example, stochastic variance-reduced gradient descent could be one, and methods for combining the outputs of multiple estimators, like [1,2], could also be considered.
For diagnostics, I'm thinking of the various VI-specific diagnostics that have been proposed over the years, like the ones in [3], and the R-hat-based diagnostic in [4]. That said, [4] would need an online version of R-hat. I think I heard some talk about this, but I'm not sure what happened on that front.
[1] "A Rule for Gradient Estimator Selection, with an Application to Variational Inference," https://arxiv.org/abs/1911.01894
[2] "Using Large Ensembles of Control Variates for Variational Inference," https://arxiv.org/abs/1810.12482
[3] "Validated Variational Inference via Practical Posterior Error Bounds," http://proceedings.mlr.press/v108/huggins20a.html
[4] "Robust, Accurate Stochastic Optimization for Variational Inference," https://arxiv.org/abs/2009.00666
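For the R-hat idea in [4], here is a minimal Python sketch of the classic (non-split) Gelman-Rubin R-hat applied to a window of iterates of one scalar parameter from several independent optimization runs. It is a batch computation over a fixed window; an online variant, as discussed above, would instead maintain running means and variances (e.g. with Welford's algorithm), which is not shown. Whether this matches the exact procedure in [4] is an assumption, and the function name rhat is hypothetical.

```python
import numpy as np

def rhat(chains):
    """Classic Gelman-Rubin R-hat for one scalar parameter (illustrative sketch).

    chains: array of shape (m, n) with n iterates from each of m independent
    optimization runs, e.g. the last n SGD iterates of each run.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Values close to 1 suggest the runs are fluctuating around the same stationary region; clearly larger values indicate the runs disagree.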
Hi @Red-Portal, it looks like a sensible plan. I suggest we keep things simple until there is a genuine need for generalisation. For example, estimators and algorithms can be kept the same if they are always coupled in practice.
Some diagnostics are definitely helpful, but this is likely a challenging area, as we don't have good ways of checking how close the VI approximation is to the true target. One way is to run expensive MCMC simulations and compute the divergence between the VI approximation and the MCMC samples. But we don't have guarantees that MCMC converges either.
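One simple, admittedly crude, way to put a number on the VI-versus-MCMC comparison is to moment-match a Gaussian to the MCMC draws and compute the closed-form KL divergence from a Gaussian VI approximation to it. This Gaussian moment-matching proxy is an assumption swapped in for illustration, not a method proposed in the thread; it only compares the first two moments, and it inherits the caveat that the MCMC run itself may not have converged. The function names are hypothetical.

```python
import numpy as np

def gaussian_kl(m0, S0, m1, S1):
    """Closed-form KL(N(m0, S0) || N(m1, S1))."""
    d = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d + logdet1 - logdet0)

def kl_to_mcmc(m_q, S_q, samples):
    """KL from a Gaussian VI approximation to a Gaussian fitted to MCMC draws.

    samples: array of shape (n_draws, d), one MCMC draw per row.
    """
    m_p = samples.mean(axis=0)
    S_p = np.atleast_2d(np.cov(samples, rowvar=False))   # keep 2-D even when d == 1
    return gaussian_kl(m_q, S_q, m_p, S_p)
```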
For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.
Hi @yebai ,
For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.
Absolutely! With the talk around diagnostics and algorithms, I wanted to illustrate the potential uses of the new structure. The actual content would be a long-term goal, if feasible.
I'll start with refactoring the existing functionalities.
Hi @yebai @torfjelde,
What is the current policy on LogDensityProblems.jl? It seems AdvancedHMC.jl chose to go with it. Should AdvancedVI.jl follow suit?
That sounds good.
Alright! It's time to seriously take care of AdvancedVI :D
Here are some of the things we talked about in the meeting back in October:
update_q) or a distribution from which the parameters change.
step! function
And here are some more personal points (disclaimer: I will be happy to take care of these different points):
ELBO approach is good, the ELBO can always be split into an entropy term (depending only on the distribution) and an expectation term over the log joint. Most VI methods take advantage of this by computing the entropy gradient analytically (and smartly!); see, for instance, "Doubly Stochastic Variational Bayes for non-Conjugate Inference" by Titsias and Lázaro-Gredilla. My proposal would be to split the gradient into two parts (grad_entropy + grad_expeclog), so that each can be specialized for the problem at hand.
update_q only makes sense with the current, obsolete implementation using distributions with immutable fields like TuringMvNormal. See again Titsias on using the reparametrization trick.
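The proposed split can be sketched in Python for a full-rank Gaussian q = N(m, L L^T) with lower-triangular L: the entropy gradient is exact (the entropy is sum(log L_ii) + const, so its gradient w.r.t. L is diag(1 / L_ii) and it does not depend on m), while the expected log joint is estimated with the reparameterization trick z = m + L eps. The names grad_entropy and grad_expeclog come from the proposal above; their signatures, elbo_grad, and the assumption that the target's score grad_logp is available are hypothetical.

```python
import numpy as np

def entropy(L):
    """Entropy of N(m, L L^T) for lower-triangular L with positive diagonal."""
    d = L.shape[0]
    return np.sum(np.log(np.diag(L))) + 0.5 * d * (1.0 + np.log(2 * np.pi))

def grad_entropy(m, L):
    """Closed form: zero w.r.t. m, and diag(1 / L_ii) w.r.t. L."""
    return np.zeros_like(m), np.diag(1.0 / np.diag(L))

def grad_expeclog(m, L, grad_logp, n_samples, rng):
    """Reparameterized Monte Carlo estimate of grad E_q[log p(z)], z = m + L eps."""
    d = m.shape[0]
    eps = rng.standard_normal((n_samples, d))
    z = m + eps @ L.T                           # each row is one draw from q
    g = grad_logp(z)                            # (n_samples, d) target scores
    grad_m = g.mean(axis=0)
    grad_L = np.tril(g.T @ eps / n_samples)     # E[g eps^T], kept lower-triangular
    return grad_m, grad_L

def elbo_grad(m, L, grad_logp, n_samples=100, rng=None):
    """ELBO gradient = grad_entropy + grad_expeclog; either part can be specialized."""
    rng = np.random.default_rng() if rng is None else rng
    gm_e, gL_e = grad_entropy(m, L)
    gm_x, gL_x = grad_expeclog(m, L, grad_logp, n_samples, rng)
    return gm_e + gm_x, gL_e + gL_x
```

Because the two parts are separate functions, a family with a different closed-form entropy (or a problem with a closed-form expectation) only needs to replace one of them.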