@mariru is working on MAP, which is another case where we don't necessarily need this score vs reparam dichotomy. We also need to think about how the class should later incorporate sampling methods (e.g., do we just treat it as an "optimization"?).
how about having a hierarchical method structure, like in Stan?
You mean for specifying the inference method? E.g., Inference(method="MFVI")?
hmm. now that i think about it, i'm not sure.
perhaps we have some sort of added hierarchy within Inference.
i don't know how to communicate this so bear with me:
+-----------+
| Inference +------------+------------------+
+-----+-----+            |                  |
      |                  |                  |
      |                  |                  |
+-----+-------+    +-----+--------+   +-----+------+
| Variational |    | Optimization |   |  Sampling  |
+-----+-------+    +--------------+   +------------+
      |
      |
      +
 MFVI/KLpq/etc.
so the reparam/score loss stuff happens at the Variational level (in its implementation of run()). perhaps Inference doesn't even need to implement run() anymore.
does that make sense?
I like the ASCII! This makes sense. I would also put optimization inside variational.
optimization with the score function estimator? is that useful?
For example, MAP (and by extension, MLE) is variational inference with a point mass variational family. This is how Maja is currently implementing it.
what does sampling from a point mass mean?
the way i view it: variational inference in this library is basically (by choice) based on stochastic optimization techniques.
MAP and MLE do not need to be based on stochastic optimization. so doesn't it make more sense to separate them?
(i could be missing something here.)
"sampling" for the point mass means simply returning its value.
If you check out the feature/map branch, you'll see I implemented a variational family PMGaussian for modeling unconstrained parameters using a point estimate. It should probably get a better name, but I wanted to make the distinction that, like MFGaussian, the transform for the mean parameter is the identity.
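In other words, something like this toy sketch (hypothetical names and NumPy just for illustration, not the actual PMGaussian code in feature/map):

import numpy as np

class PointMass:
    """Toy point-mass 'variational family': all mass sits on one value."""
    def __init__(self, num_vars):
        # the point estimate itself is the only variational parameter,
        # and (as with MFGaussian's mean) its transform is the identity
        self.params = np.zeros(num_vars)

    def sample(self, size=1):
        # "sampling" from a point mass simply returns its value
        return np.tile(self.params, (size, 1))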
So I think it can be useful to have run() in the variational/optimization parent class, but then have methods within run() that get overwritten by the child classes: e.g., call build_loss() within run() in the parent class and then overwrite build_loss() in the child class to call one of build_score_loss(), build_reparam_loss(), or build_<other>_loss(). These method-specific loss functions can be implemented in the parent class, or, if a modification is needed, they can also be overwritten for a specific inference method.
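A rough sketch of that pattern (placeholder names, not code from any branch):

class Optimizer:
    """stand-in for whatever gradient-descent routine run() wraps"""
    def step(self, loss):
        pass

class VariationalInference:
    """run() lives here once; build_loss() is the hook children override"""
    def run(self, n_iter=1000):
        self.initialize()
        loss = self.build_loss()        # overridden by MFVI, KLpq, MAP, ...
        for t in range(n_iter):
            self.update(loss)           # shared gradient-descent update
            self.print_progress(t, loss)

    def initialize(self):
        self.optimizer = Optimizer()

    def update(self, loss):
        self.optimizer.step(loss)

    def print_progress(self, t, loss):
        if t % 100 == 0:
            print("iteration", t)

    def build_loss(self):
        raise NotImplementedError()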
Yup, that's a great idea. So right now, Inference would have build_loss(), which raises NotImplementedError(). Then MFVI would write build_loss() as an if-else chain and return the score or reparam loss. For KLpq, it would just be a single loss because there is no reparameterization gradient. For MAP, it can just return log p(x, z).
so what's the full spec here? and what would be the best way of making this change? (we should be considerate of stuff happening in other branches.)
class Inference:
    def __init__(self, model, data):
        self.model = model
        self.data = data

class MonteCarlo(Inference):
    def __init__(self, *args, **kwargs):
        Inference.__init__(self, *args, **kwargs)
        # not sure what will go here

class VariationalInference(Inference):
    def __init__(self, model, variational, data):
        Inference.__init__(self, model, data)
        self.variational = variational

    def run(self):
        pass

    def initialize(self):
        pass

    def update(self):
        pass

    def build_loss(self):
        raise NotImplementedError()

    def print_progress(self):
        pass

class MFVI(VariationalInference):
    def __init__(self, *args, **kwargs):
        VariationalInference.__init__(self, *args, **kwargs)

    def build_loss(self):
        if ...:  # e.g., depending on whether self.variational implements reparam
            return self.build_score_loss()
        else:
            return self.build_reparam_loss()

    def build_score_loss(self):
        pass

    def build_reparam_loss(self):
        pass

class KLpq(VariationalInference):
    def __init__(self, *args, **kwargs):
        VariationalInference.__init__(self, *args, **kwargs)

    def build_loss(self):
        pass

class MAP(VariationalInference):
    def __init__(self, model, data):
        variational = PointMass(...)
        VariationalInference.__init__(self, model, variational, data)

    def build_loss(self):
        pass
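Presumably usage would then look something like this (model, variational, and data stand in for whatever the user has already built):

# MFVI and KLpq take an explicit variational family
inference = MFVI(model, variational, data)
inference.run()

# MAP constructs its PointMass internally; the user never sees it
inference = MAP(model, data)
inference.run()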
As for how to implement this, I suggest we do this broad refactor at as early a stage as possible to avoid incurring debt. So we write this in a branch and then individually deal with any merge conflicts in each branch once the pull request is made.
very nice.
wouldn't it be more flexible to have
class MAP(Inference):
again, i'm not entirely following why we want to go with this PointMass approach. is it to reduce some reimplementation of some code somehow?
By doing variational inference with a point mass, you are reusing the gradient descent routine from run() in (variational) inference. Plus, you can use the PointMass objects to encode constraints in the parameter space but still do the same optimization as defined in run() in the unconstrained space.
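For instance, a hypothetical sketch of that constraint idea (softplus is just one example of a transform; this isn't the code in the branch):

import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

class ConstrainedPointMass:
    """Toy point mass that stores an unconstrained parameter and maps it
    through a transform into the constrained space."""
    def __init__(self, num_vars, transform=softplus):
        # run()'s gradient descent would operate on this unconstrained value
        self.unconstrained = np.zeros(num_vars)
        self.transform = transform

    def value(self):
        # the point estimate handed to the model lives in the constrained
        # space (e.g., positive-only parameters under softplus)
        return self.transform(self.unconstrained)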
Broadly, I see inference derived from two paradigms: optimization (variational inference) and sampling (Monte Carlo methods). There are two reasons to include techniques such as MLE, MAP, MML, and MPO as part of the variational inference class:
- Conceptually. I personally view variational inference as an umbrella term for any posterior inference method that is formulated as an optimization problem. All these estimation techniques are crude approximate methods based on the mode. Viewing them as approximations justifies and makes clear the use case for other approximations, such as KL(p||q). (E.g., I don't think it's reasonable to distinguish between inference via approximate posterior means and inference via exact or approximate posterior modes.)
- Practically. All optimization-based methods share many defaults: the same optimization routine (e.g., learning rate, gradient descent method) using update(), print_progress() of the iteration and loss function's value, initialize(), and a general wrapper of all these objects in run(). Any of these methods can overwrite one of the defaults or add onto it.
hmm. not to be pedantic here, but i don't think i agree with either point. (also, I don't know what MPO is.)
a broader point of 1 is i guess this: did we decide to frame blackbox as a Bayesian toolbox?
i also didn't follow some of maja's comments. perhaps this is easier to figure out over coffee :)
Well, let's agree to disagree then. :)
MPO: marginal posterior optimization
All optimization methods default to gradient descent (data subsampling is optional). Latent variable sampling is currently used, e.g., in MFVI and KLpq, but it's not a necessary distinction. For example, we would ideally have coordinate ascent MFVI if someone wrote down an exponential family graphical model with VIBES-like metadata. (@heywhoah and I are interested in this.)
agree to disagree? what kind of strange proposal is that? :)
let's chat in person. i think i'm missing some things here. (e.g., preferring coordinate ascent? much strangeness abounds :) )
I wrote it in the MAP branch. Here's what it looks like: https://github.com/Blei-Lab/blackbox/blob/af3f0528fd116be3dbcfc6d3871ac9119648abce/blackbox/inferences.py
nice work! (i'm not saying that what you and maja propose won't work btw.)
okay, let's discuss today if you both ( @dustinvtran @mariru ) are around!
we currently default to the reparameterization gradient if the Variational class implements reparam. however, if the Inference class does not support reparameterization gradients (e.g., KLpq), then it doesn't matter whether the Variational class implements it or not.
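Concretely, the dispatch might look something like this sketch (hypothetical attribute check; the losses are placeholders):

class VariationalInference:
    def __init__(self, variational):
        self.variational = variational

    def build_score_loss(self):
        return "score loss"       # placeholder

    def build_reparam_loss(self):
        return "reparam loss"     # placeholder

class MFVI(VariationalInference):
    def build_loss(self):
        # default to the reparameterization gradient when the
        # variational family provides reparam ...
        if hasattr(self.variational, "reparam"):
            return self.build_reparam_loss()
        # ... otherwise fall back to the score function estimator
        return self.build_score_loss()

class KLpq(VariationalInference):
    def build_loss(self):
        # KLpq has no reparameterization gradient, so it never checks
        # whether the variational family implements reparam
        return self.build_score_loss()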