TuringLang / AdvancedVI.jl

Implementation of variational Bayes inference algorithms
http://turinglang.org/AdvancedVI.jl/
MIT License

Rethinking AdvancedVI #24

Closed theogf closed 5 months ago

theogf commented 3 years ago

Alright! It's time to seriously take care of AdvancedVI :D

Here are some of the things we talked about in the meeting back in October:

And here are some more personal points (disclaimer: I will be happy to take care of these different points)

Red-Portal commented 2 years ago

Hi, is there any update on a complete rewrite of AdvancedVI? Or even an expected time frame for release?

theogf commented 2 years ago

Hey, there is no update and I would say that this has gone stale. I don't have the bandwidth for it anymore and neither does @torfjelde (I guess), so unless someone takes over...

Red-Portal commented 2 years ago

Hi @theogf, that's sad news. So, for the time being, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

theogf commented 2 years ago

I really hope @torfjelde has the time for it (we haven't talked in a while). If the package becomes easier to work with, I would definitely be happy to add a couple of algorithms like SVGD and others. But I generally think that a revamp is very necessary. The ML ecosystem has evolved a lot and there are now new solutions like ParameterHandling.jl for problems we had here.

Red-Portal commented 2 years ago

Is there a straightforward way to deal with the covariance of a full-rank multivariate normal variational family though? I have been using AdvancedVI.jl as the basis of one of my recent research projects, but couldn't come up with a way to elegantly unpack/repack the parameters of the covariance. I think taking gradients independently for each symbolic variable a la Flux.jl could be a solution. Any thoughts on this?

theogf commented 2 years ago

You should have a look at ParameterHandling.jl and the positive_definite function. However, there is no specific optimization for VI, but that's a topic on its own!
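To make the unpack/repack question concrete, here is a minimal sketch of the usual Cholesky-based approach to parameterizing a full-rank covariance with an unconstrained vector. It is written in plain Python so it runs without any packages; it is not ParameterHandling.jl's API, and the helper names (`unpack_cholesky`, `pack_cholesky`, `covariance`) and the choice of `exp` for the diagonal are assumptions made for illustration only.

```python
import math

def unpack_cholesky(theta, d):
    """Map an unconstrained vector of length d*(d+1)/2 to a lower-triangular
    factor L whose diagonal is strictly positive."""
    if len(theta) != d * (d + 1) // 2:
        raise ValueError("theta has the wrong length for dimension d")
    L = [[0.0] * d for _ in range(d)]
    k = 0
    for i in range(d):
        for j in range(i + 1):
            # exp keeps the diagonal positive; off-diagonals stay unconstrained
            L[i][j] = math.exp(theta[k]) if i == j else theta[k]
            k += 1
    return L

def pack_cholesky(L):
    """Inverse of unpack_cholesky: flatten L back into an unconstrained vector."""
    d = len(L)
    return [math.log(L[i][j]) if i == j else L[i][j]
            for i in range(d) for j in range(i + 1)]

def covariance(L):
    """Sigma = L L^T, which is positive definite by construction."""
    d = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(d)) for j in range(d)]
            for i in range(d)]

# An optimizer only ever sees the flat vector theta; the covariance is rebuilt on demand.
theta = [0.1, -0.5, 0.3]
L = unpack_cholesky(theta, 2)
Sigma = covariance(L)
```

Because every value of `theta` maps to a valid covariance, gradient steps on the flat vector never leave the feasible set, which is the property the unpack/repack step is meant to provide.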

Red-Portal commented 2 years ago

@theogf That looks great. I would really like to know about the future/current state of Turing.jl's model API before doing anything though.

Red-Portal commented 2 years ago

I will start pursuing a PhD this Fall, and this might give me some bandwidth to work full-time on AdvancedVI.jl. I personally think there is a lot of potential for it to be a research platform for cutting-edge VI research. Some things are still missing, though, and need major work.

@theogf could you list the changes that you planned to introduce into AdvancedVI? I might be able to pick them up at some point.

torfjelde commented 2 years ago

Hey! I'm back now; been away for the past 4 months, so sorry for not being responsive here.

So, for the time being, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

So it depends on what we're talking about here.

The work I'm doing on the model side of Turing.jl will be very useful for any interaction AdvancedVI.jl wants to have with Turing.jl models, e.g. performing VI on a Turing.jl model, using a Turing.jl model to define a variational approximation, etc. But solely for AdvancedVI.jl, i.e. ignoring any relation to the rest of the Turing.jl ecosystem, we're still not happy with what we have set up thus far; the general API needs to improve, as partially outlined by @theogf above. There are also some significant improvements in the ecosystem that we might want to take advantage of here in AdvancedVI.jl:

And so on.

It requires a bit more thought and an outline of what we want here, but I'm keen on getting something rolling now! :)

Red-Portal commented 2 years ago

Hi @torfjelde, nice to have you back. In case you haven't noticed, I'm one of the guys who was at the Turing.jl sales pitch at the University of Liverpool.

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here. Its algorithmic structure is quite different from BBVI/MCVI such that I don't see good abstraction opportunities. And given that we'll not see a shortage of variational particle methods any time soon, I think it will be good to have a separate package like AdvancedParticles.jl or something.

theogf commented 2 years ago

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here.

I don't agree; the representation is different but just as relevant.

Even if we move it to a different package, we would still need a common API. So it's probably preferable to think of this in one package before starting to split things up.

Red-Portal commented 2 years ago

@theogf Given that you already have #25 open, do you plan on coming back to #25 or how should we attack rewriting AdvancedVI?

theogf commented 2 years ago

No, I think it's probably better to start again from scratch. You can still take ideas from there if you want.

Red-Portal commented 2 years ago

Okay. Thanks, @theogf and @torfjelde; the discussions were really helpful.

Red-Portal commented 1 year ago

Hi @torfjelde , I'm thinking about how to restructure the overall project.

I'm thinking of restructuring the project as:

Currently, AdvancedVI.jl has a separate notion of a variational objective (implemented in objectives.jl) and an algorithm (implemented in advi.jl; I'm proposing to change this terminology to estimator) for estimating the objective's gradient, but I don't think this distinction is necessary. After all, most of the gradient estimators proposed in the literature target specific objectives, so I think an objective should be an attribute of an estimator rather than its own object.
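As a rough illustration of what "an objective as an attribute of an estimator" could look like, here is a minimal sketch in plain Python. It is not AdvancedVI.jl code: the names `ELBO`, `ReparamGradient`, and `estimate` are hypothetical, and the example assumes a one-dimensional Gaussian variational family with parameters `mu` and `log_sigma` and a target whose log-density gradient is available in closed form.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class ELBO:
    """Hypothetical objective: only stores the gradient of the target log-density."""
    grad_logp: Callable[[float], float]

@dataclass
class ReparamGradient:
    """Hypothetical estimator that owns the objective it targets."""
    objective: ELBO
    n_samples: int = 10

    def estimate(self, mu: float, log_sigma: float,
                 rng: random.Random) -> Tuple[float, float]:
        """Reparameterization-gradient estimate of the ELBO gradient for
        q = Normal(mu, exp(log_sigma)^2), using z = mu + sigma * eps."""
        sigma = math.exp(log_sigma)
        g_mu = 0.0
        g_ls = 0.0
        for _ in range(self.n_samples):
            eps = rng.gauss(0.0, 1.0)
            z = mu + sigma * eps
            g = self.objective.grad_logp(z)
            g_mu += g                # dz/dmu = 1
            g_ls += g * sigma * eps  # dz/dlog_sigma = sigma * eps
        # The Gaussian entropy is log_sigma + const, so it adds 1 to d/dlog_sigma.
        return g_mu / self.n_samples, g_ls / self.n_samples + 1.0

# Usage: standard normal target, for which grad log p(z) = -z.
estimator = ReparamGradient(objective=ELBO(grad_logp=lambda z: -z), n_samples=100)
grad_mu, grad_log_sigma = estimator.estimate(1.0, 0.0, random.Random(0))
```

With this layout, calling code only needs to know about the estimator; swapping the objective means constructing a different estimator rather than passing two separately typed objects around.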

Under algorithms, I'm planning to put higher-level algorithms that utilize the output of the estimators. For example, stochastic variance-reduced gradient descent could be one, and methods for combining the output of multiple estimators, like [1,2], could also be considered.

For diagnostics, I'm thinking of the various VI-specific diagnostics that have been proposed over the years, like the ones in [3], and the R-hat diagnostics [4]. Though [4] would need an online version of R-hat. I think I saw some hearsay about this, but I'm not sure what happened on that front.
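For readers unfamiliar with R-hat, the following is a minimal sketch of the basic offline (non-split) Gelman-Rubin statistic in plain Python. It is only meant to show what the quantity measures; it is not the procedure from [4], it is not an online version, and the function name `r_hat` is an assumption for illustration.

```python
import math
from statistics import mean, variance

def r_hat(chains):
    """Basic Gelman-Rubin potential scale reduction factor for a scalar
    quantity, given m >= 2 chains that each hold n >= 2 draws."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equally long chains of length >= 2")
    within = mean(variance(c) for c in chains)           # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])    # B: n times variance of chain means
    var_plus = (n - 1) / n * within + between / n        # pooled variance estimate
    return math.sqrt(var_plus / within)

# Chains that agree give a value near or below 1; chains centred at
# different locations give a value well above 1.
agreeing = r_hat([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
disagreeing = r_hat([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

This version needs all draws at once and recomputes every variance from scratch; an online variant would have to maintain running means and variances per chain instead.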

[1] "A Rule for Gradient Estimator Selection, with an Application to Variational Inference," https://arxiv.org/abs/1911.01894

[2] "Using Large Ensembles of Control Variates for Variational Inference," https://arxiv.org/abs/1810.12482

[3] "Validated Variational Inference via Practical Posterior Error Bounds," http://proceedings.mlr.press/v108/huggins20a.html

[4] "Robust, Accurate Stochastic Optimization for Variational Inference," https://arxiv.org/abs/2009.00666

yebai commented 1 year ago

Hi @Red-Portal, it looks like a sensible plan. I suggest we keep things simple until there is a genuine need for generalisation. For example, estimators and algorithms can be kept the same if they are always coupled in practice.

Some diagnostics are definitely helpful, but this is likely a challenging area as we don't have good ways of checking convergence from the VI approximation to the true target. One way is to run expensive MCMC simulations and compute the divergence between VI approximation and MCMC samples. But we don't have guarantees that MCMC converges either.

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

Red-Portal commented 1 year ago

Hi @yebai ,

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

Absolutely! With the talk around diagnostics and algorithms, I wanted to illustrate the potential uses of the new structure. The actual content would be a long-term goal, if feasible.

I'll start with refactoring the existing functionalities.

Red-Portal commented 1 year ago

Hi @yebai @torfjelde ,

What is the current policy on LogDensityProblems.jl? It seems AdvancedHMC.jl chose to go with it. Should AdvancedVI.jl also follow suit?

yebai commented 1 year ago

That sounds good.