bfredl opened this issue 7 years ago
Great point. The ideal solution would be automatic. That is, if you specify:
```python
import tensorflow as tf
from edward.models import Empirical

T = 10000    # number of empirical samples
N = 1000000  # number of data points
M = 128      # minibatch size

qz_variables = tf.Variable(tf.random_normal([T, N]))
minibatch_index = tf.placeholder(tf.int32, [M])
qz = Empirical(
    params=tf.transpose(tf.gather(tf.transpose(qz_variables), minibatch_index)))
```
then

```python
inference.update(feed_dict={minibatch_index: index_batch})
```

would only update M dimensions of the t-th parameter. The other dimensions would not be updated.
Currently we don't do this in MonteCarlo, but I hope there's a more intelligent implementation that can.
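For concreteness, here is a rough sketch of the kind of sparse write such an automatic update would need. This is only an illustration with assumed shapes, not Edward's implementation; it stores the variable transposed, as [N, T], so that `tf.scatter_update` can address per-data-point rows directly:

```python
import tensorflow as tf

T = 10000    # number of empirical samples
N = 1000000  # number of data points
M = 128      # minibatch size

# Hypothetical transposed layout: row n holds all T samples for data point n.
qz_variables = tf.Variable(tf.random_normal([N, T]))
minibatch_index = tf.placeholder(tf.int32, [M])
# Stand-in for whatever values the sampler proposes for the minibatch.
proposed_values = tf.placeholder(tf.float32, [M, T])

# Writes only the M selected rows; the remaining N - M rows are untouched.
sparse_assign = tf.scatter_update(qz_variables, minibatch_index, proposed_values)
```

Note that gradients flowing through `tf.gather` arrive as `tf.IndexedSlices`, so optimizer-based variational inference already updates only the gathered rows; the open question is getting the same sparsity into the assigns that the Monte Carlo methods perform on their own state.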
https://github.com/blei-lab/edward/blob/master/examples/probabilistic_pca_subsampling.py shows how to do minibatch inference with per-sample latent state for variational inference methods. This is possible because one manages the variational parameters oneself and can use `tf.gather` to select the parameters for the minibatch.

However, it doesn't seem as simple with Monte Carlo methods like SGHMC, as the inference manages the velocity state and empirical results itself and cannot account for minibatch indices. It looks like one would need to extend the MonteCarlo classes to optionally take a minibatch index variable that would be used for updates to the managed state; a sketch of what that might look like follows below. I might look into this myself, if you don't think some other approach would be better?
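If it helps, here is a rough sketch of what such an extension might compute. Everything here is hypothetical (the names, shapes, and hyperparameters are assumptions, and this is not Edward's MonteCarlo API): an SGHMC-style update applied only to the rows selected by a minibatch index, via `tf.scatter_update` and `tf.scatter_add`:

```python
import tensorflow as tf

N, T = 1000000, 10000   # data points, latent dimension per point (assumed shapes)
M = 128                 # minibatch size
step_size = 1e-4        # assumed sampler hyperparameters
friction = 0.1

# State the extended MonteCarlo class would manage, stored as [N, T]
# so rows correspond to data points.
position = tf.Variable(tf.random_normal([N, T]))
velocity = tf.Variable(tf.zeros([N, T]))

minibatch_index = tf.placeholder(tf.int32, [M])
# Stand-in for the stochastic gradient of the log joint w.r.t. the
# minibatch's latent variables.
grad_log_joint = tf.placeholder(tf.float32, [M, T])

# SGHMC-style update, applied only to the M selected rows.
v_old = tf.gather(velocity, minibatch_index)
noise = tf.sqrt(2.0 * friction * step_size) * tf.random_normal([M, T])
v_new = (1.0 - friction) * v_old + step_size * grad_log_joint + noise

update_velocity = tf.scatter_update(velocity, minibatch_index, v_new)
with tf.control_dependencies([update_velocity]):
    update_position = tf.scatter_add(position, minibatch_index, v_new)
```

An extended class would presumably build ops like these in place of its dense assigns (in Edward, something like the `build_update` method), leaving the state of unselected data points untouched between updates.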