blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

design base class for variational models #18

Closed dustinvtran closed 8 years ago

dustinvtran commented 8 years ago

What's the right abstraction, and which base class methods and members should all variational model classes share? Further, how do we mix and match them, so it's not as blocky as "MFGaussian" but can, e.g., allow a choice of variational family for each dimension, or the specification of a joint distribution?

This choice will be particularly relevant for designing classes for hierarchical variational models, in which you will have some arbitrary stacking of these guys.
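As one concrete (and purely hypothetical) starting point for that discussion, the shared base class might look something like this, with each variational family subclassing it and filling in `sample` and `log_prob`:

```python
import tensorflow as tf

class Variational(object):
    """Hypothetical base class that all variational families would share."""
    def __init__(self, num_params):
        # One flat vector of unconstrained variational parameters, stored
        # as a tf.Variable so TensorFlow can differentiate through it.
        self.params = tf.Variable(tf.zeros([num_params]))

    def sample(self, size):
        """Draw `size` samples from q(z; params)."""
        raise NotImplementedError()

    def log_prob(self, zs):
        """Evaluate log q(zs; params)."""
        raise NotImplementedError()
```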

dustinvtran commented 8 years ago

Ultimately to specify the variational model, we require a language that enables arbitrary stacking of distributions on (parameters of) other distributions. In other words, we need a language for specifying hierarchical models.
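To make "stacking distributions on (parameters of) other distributions" concrete, this is the kind of two-level hierarchy such a language should be able to express (illustrative only, not a proposed syntax):

```python
import numpy as np

# mu     ~ Normal(0, 1)     -- a latent variable...
# z | mu ~ Normal(mu, 1)    -- ...that parameterizes another distribution
mu = np.random.randn()
z = mu + np.random.randn()
```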

There were several suggestions from the meeting today:

The former would be preferable, but in the end we want the variational model (specifically, its parameters) to be written in TensorFlow, so that we can do autodiff and also benefit from TensorFlow's speed. So I think the latter is the right direction.

dustinvtran commented 8 years ago

@mariru mentioned storing just one really large tf.Variable(). When variational families use it for various things (sampling, log-prob evaluation), the variational family will (1) extract the necessary parameters and (2) call the method with those parameters. For example, a Gaussian variational family will store the mean parameters in the first half and the standard deviation parameters in the second half.

This is useful for factorizations that combine several families (e.g., the mean-field family for a Gaussian mixture model has mean-field Gaussian, Dirichlet, and inverse-Gamma factors). It's also useful for variational auto-encoders: the output of the neural network is a vector from which we extract the variational parameters.
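A rough sketch of how that extraction could look for a mean-field Gaussian (the class and method names here are illustrative, not an agreed interface):

```python
import numpy as np
import tensorflow as tf

class MFGaussian(object):
    """Illustrative mean-field Gaussian that reads its parameters out of
    a slice of one shared flat parameter vector."""
    def __init__(self, num_vars):
        self.num_vars = num_vars

    def extract_params(self, params):
        # First half: means. Second half: log standard deviations,
        # exponentiated so the standard deviations stay positive.
        mean = params[:self.num_vars]
        std = tf.exp(params[self.num_vars:])
        return mean, std

    def sample(self, params, size=1):
        mean, std = self.extract_params(params)
        eps = np.random.randn(size, self.num_vars).astype(np.float32)
        return mean + std * eps

# One big flat variable holding all variational parameters.
params = tf.Variable(tf.zeros([2 * 10]))
q = MFGaussian(num_vars=10)
z = q.sample(params, size=5)  # shape (5, 10)
```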

mariru commented 8 years ago

One idea would be to have the user specify their variational family using a list of triplets: `[("name1", size1, type1), ("name2", size2, type2), ...]`

The variational class then creates the TensorFlow object for the variational parameters, `z = tf.Variable()`, with length `flatten(size1) + flatten(size2) + ...`, and uses the types to choose which transformations to apply to each component of these parameters. The number of variational parameters will also depend on the types because, for example, a Gaussian latent variable has 2 variational parameters (a mean and a standard deviation).

Sampling from the variational object should then return a dictionary of the latent variables in the correct shape:

```python
def sample(self, size, sess):
    z1 = ...
    z2 = ...
    return {"name1": z1.reshape(size1), "name2": z2.reshape(size2), ...}
```

Note that we want this returned dictionary to be created automatically depending on the list the user provided to instantiate the variational class.

This abstraction can help with the following two things:

  1. A user can easily distinguish between the different roles the latent variables play in model.log_prob() and does not have to slice zs.
  2. If you want to do different inference procedures for different latent variables, you can provide a list of names to each optimizer for the latent variables you want to have optimized.
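Putting the pieces above together, one possible sketch of such a class (Gaussian-only and with illustrative names, meant to show the shape of the interface rather than a final design):

```python
import numpy as np
import tensorflow as tf

class Variational(object):
    """Illustrative variational model built from (name, size, type) triplets."""
    def __init__(self, spec):
        self.spec = spec
        # Gaussian-only for brevity: each latent dimension gets 2
        # variational parameters (a mean and a log standard deviation).
        self.num_params = sum(2 * int(np.prod(size)) for _, size, _ in spec)
        self.params = tf.Variable(tf.zeros([self.num_params]))

    def sample(self, size=1):
        """Return a dict mapping each name to samples with the right shape."""
        samples = {}
        offset = 0
        for name, shape, _ in self.spec:
            dim = int(np.prod(shape))
            mean = self.params[offset:offset + dim]
            std = tf.exp(self.params[offset + dim:offset + 2 * dim])
            eps = np.random.randn(size, dim).astype(np.float32)
            samples[name] = tf.reshape(mean + std * eps, [size] + list(shape))
            offset += 2 * dim
        return samples

# Usage: two named latent variables with different shapes.
qz = Variational([("z", (3,), "gaussian"), ("w", (2, 2), "gaussian")])
samples = qz.sample(size=5)  # {"z": shape (5, 3), "w": shape (5, 2, 2)}
```

This keeps every variational parameter inside one flat tensor for autodiff, while the per-name slicing and reshaping stay hidden from the user.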
