blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

Minibatch MCMC inference with latent state per sample? #670

Open bfredl opened 7 years ago

bfredl commented 7 years ago

https://github.com/blei-lab/edward/blob/master/examples/probabilistic_pca_subsampling.py shows how to do minibatch inference with latent state per sample for variational inference methods. This is possible because one manages the variational parameters oneself and can use tf.gather to select the parameters belonging to the current minibatch.
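For reference, the pattern in that example looks roughly like the following (a minimal sketch with illustrative names and shapes, not a verbatim excerpt of the file): the variational parameters live in one large variable indexed by data point, and a placeholder of minibatch indices gathers the relevant rows.

import tensorflow as tf
from edward.models import Normal

N = 5000   # total number of data points (illustrative)
M = 100    # minibatch size (illustrative)
K = 2      # latent dimensionality (illustrative)

idx_ph = tf.placeholder(tf.int32, [M])

# Variational parameters for all N per-sample latents, managed by hand.
qz_loc = tf.Variable(tf.random_normal([N, K]))
qz_scale_raw = tf.Variable(tf.random_normal([N, K]))

# Only the rows belonging to the current minibatch enter the inference graph.
qz = Normal(loc=tf.gather(qz_loc, idx_ph),
            scale=tf.nn.softplus(tf.gather(qz_scale_raw, idx_ph)))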

However, it doesn't seem as simple with Monte Carlo methods like SGHMC, since the inference manages the velocity state and empirical samples itself and cannot account for minibatch indices. It looks like one would need to extend the MonteCarlo classes to optionally take a minibatch index variable that is used for updates to the managed state. I might look into this myself, unless you think some other approach would be better?
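For concreteness, a per-index update of the managed state might look roughly like this. This is only a schematic sketch: the variable names, the simplified update rule, and the omission of the SGHMC noise term are assumptions for illustration, not Edward's actual MonteCarlo internals.

import tensorflow as tf

N = 1000000    # number of data points (illustrative)
M = 128        # minibatch size (illustrative)
step_size = 1e-3
friction = 0.1

minibatch_index = tf.placeholder(tf.int32, [M])
# Gradient of the log joint w.r.t. the minibatch's latents, computed elsewhere.
grad_log_joint = tf.placeholder(tf.float32, [M])

z = tf.Variable(tf.zeros([N]))          # one latent value per data point
velocity = tf.Variable(tf.zeros([N]))   # per-data-point velocity state

# Update only the entries touched by the current minibatch.
v_batch = tf.gather(velocity, minibatch_index)
v_new = (1.0 - friction) * v_batch + step_size * grad_log_joint
assign_v = tf.scatter_update(velocity, minibatch_index, v_new)
with tf.control_dependencies([assign_v]):
  assign_z = tf.scatter_add(z, minibatch_index, v_new)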

dustinvtran commented 7 years ago

Great point. The ideal solution would be automatic. That is, if you specify

import tensorflow as tf
from edward.models import Empirical

T = 10000      # number of empirical samples
N = 1000000    # number of data points
M = 128        # minibatch size

# One row of empirical samples per iteration, one column per data point.
qz_variables = tf.Variable(tf.random_normal([T, N]))
minibatch_index = tf.placeholder(tf.int32, [M])

# Gather only the columns for the current minibatch.
qz = Empirical(
  params=tf.transpose(tf.gather(tf.transpose(qz_variables), minibatch_index)))

then

inference.update(feed_dict={minibatch_index: index_batch})

would only update the M gathered dimensions of the t-th parameter row. Other dimensions would not be updated.
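A training loop over minibatches would then look roughly like the following (a sketch; x_ph, x_train, and the random index sampling are assumptions about the surrounding model and data, not part of the snippet above).

import numpy as np

for _ in range(inference.n_iter):
  index_batch = np.random.choice(N, M, replace=False)
  info_dict = inference.update(feed_dict={x_ph: x_train[index_batch],
                                          minibatch_index: index_batch})
  inference.print_progress(info_dict)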

Currently we don't do this in MonteCarlo, but I hope there's a more intelligent implementation that can.