blei-lab / edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
http://edwardlib.org

random variables and tf.gather (or tensor slice/join operations). #615

Open davidlibland opened 7 years ago

davidlibland commented 7 years ago

When you apply tf.gather to a random variable, Edward seems to have trouble treating the result as a random variable. For example, if W is a random variable of shape (100,), then tf.gather(W, np.arange(100)) should be effectively the same as W, yet substituting one for the other yields different results. Here's a simple example in code:

setup:

import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Bernoulli, Beta

# DATA
N = 30
p0 = 0.2
x_data = np.random.choice([0, 1], N, p=[1 - p0, p0])

# MODEL
p = Beta(a=1.0, b=1.0)
x = Bernoulli(p=tf.ones(N) * p)
x_equiv = tf.gather(x, np.arange(N))

# INFERENCE
qp_a = tf.nn.softplus(tf.Variable(tf.random_normal([])))
qp_b = tf.nn.softplus(tf.Variable(tf.random_normal([])))
qp = Beta(a=qp_a, b=qp_b)

Now if we run the following inference:

inference = ed.KLqp({p: qp}, data={x: x_data})
inference.run(n_iter=500)

we get the following distribution for qp: [image: histogram of qp samples]

however, if we run

inference = ed.KLqp({p: qp}, data={x_equiv: x_data})
inference.run(n_iter=500)

we get the following instead: [image: histogram of qp samples]
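
For reference, the fitted qp can be inspected by sampling from it after inference; a rough sketch, assuming Edward 1.x and matplotlib, and that the RandomVariable forwards sample() to its underlying distribution:

import matplotlib.pyplot as plt

# Draw samples from the fitted variational Beta and histogram them
# (assumes qp.sample() is forwarded to the wrapped distribution).
sess = ed.get_session()
samples = sess.run(qp.sample(5000))
plt.hist(samples, bins=50)
plt.show()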

dustinvtran commented 7 years ago

tf.gather returns a tf.Tensor and not an ed.RandomVariable. This means x and x_equiv are different:

>>> x
<ed.RandomVariable 'Bernoulli/' shape=(30,) dtype=int32>
>>> x_equiv
<tf.Tensor 'Gather:0' shape=(30,) dtype=int32>

As with all TensorFlow ops, tf.gather is performed on the random variable's associated tensor. This implies the second inference is conditioning on data that doesn't affect the latent variables at all; this is why qp is shown converging to the prior.
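
One workaround at the model level, a minimal sketch assuming Edward 1.x: gather the parameters rather than x's realized tensor, and build a new random variable from the gathered parameters, which Edward can then condition on.

# Sketch of a workaround (Edward 1.x): gather the Bernoulli parameters,
# not x's sample tensor, so the result is itself an ed.RandomVariable.
idx = np.arange(N)
x_gathered = Bernoulli(p=tf.gather(tf.ones(N) * p, idx))

# x_gathered can now be observed directly; its likelihood depends on p.
inference = ed.KLqp({p: qp}, data={x_gathered: x_data})
inference.run(n_iter=500)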

Ideally, we might determine the distribution of the outputs of certain TensorFlow ops such as tf.gather. For tf.gather, an open problem is how to determine the individual parameters that make up a batch of random variables, say, the parameter of the first Bernoulli random variable within a vector of them.

davidlibland commented 7 years ago

@dustinvtran I'm confused by:

Ideally, we might determine the distribution of the outputs of certain TensorFlow ops such as tf.gather. For tf.gather, an open problem is how to determine the individual parameters that make up a batch of random variables, say, the parameter of the first Bernoulli random variable within a vector of them.

In essence, tf.gather(A, B) seems to be a version of a tensor product between A and some tensor determined by B (for example, when B = np.arange(N), the second tensor is just the identity matrix). So this doesn't seem any more complicated than the implementation of ed.dot, at least for the case where B is not stochastic.
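
A minimal sketch of that equivalence with plain TF 1.x tensors (no Edward random variables involved; tf.one_hot builds the selection matrix):

import numpy as np
import tensorflow as tf

N = 5
A = tf.constant(np.random.randn(N), dtype=tf.float32)
B = np.arange(N)

gathered = tf.gather(A, B)

# Equivalent linear map: multiply A by a one-hot selection matrix
# (the identity matrix when B = np.arange(N)).
selection = tf.one_hot(B, depth=N)
as_matmul = tf.reshape(tf.matmul(selection, tf.expand_dims(A, 1)), [-1])

with tf.Session() as sess:
    g, m = sess.run([gathered, as_matmul])
    print(np.allclose(g, m))  # True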

dustinvtran commented 7 years ago

I'm not sure if that solves the problem though. ed.dot returns a tf.Tensor and not an ed.RandomVariable. To put it another way, tf.gather demonstrates the fundamental problem; as you point out, ed.dot is even more challenging.

olegsinavski commented 7 years ago

@dustinvtran @davidlibland Is there any workaround for tf.gather? For example, this tutorial http://edwardlib.org/api/inference-compositionality seems to apply tf.gather to a vector of rvs:

beta = Normal(loc=tf.zeros([K, D]), scale=tf.ones([K, D]))
z1 = Categorical(logits=tf.zeros([N1, K]))
z2 = Categorical(logits=tf.zeros([N2, K]))
x1 = Normal(loc=tf.gather(beta, z1), scale=tf.ones([N1, D]))
x2 = Normal(loc=tf.gather(beta, z2), scale=tf.ones([N2, D]))

Is this a valid model? In fact, it is even 'worse', since both arguments to gather are rvs (as opposed to just values).

olegsinavski commented 7 years ago

Related question: does tf.gather behave well if I use MAP inference? In that case I would not expect a difference between an rv and its tensor.

dustinvtran commented 7 years ago

In this setting, tf.gather is fine because these are just ways of parameterizing a distribution. During inference we're not asking about the distribution of the tf.gather output, but about z1, z2, and beta.
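
A hedged sketch of how inference on that model might be set up in Edward 1.x, continuing the snippet above; qbeta, qz1, qz2, x1_train, and x2_train are illustrative names, not part of the tutorial excerpt:

# Sketch (Edward 1.x): the inference targets are beta, z1, z2; x1 and x2
# are observed, so the tf.gather output only parameterizes their
# likelihoods and is never itself treated as a random variable.
qbeta = Normal(loc=tf.Variable(tf.zeros([K, D])),
               scale=tf.nn.softplus(tf.Variable(tf.zeros([K, D]))))
qz1 = Categorical(logits=tf.Variable(tf.zeros([N1, K])))
qz2 = Categorical(logits=tf.Variable(tf.zeros([N2, K])))

# x1_train, x2_train: hypothetical observed arrays of shape (N1, D), (N2, D).
inference = ed.KLqp({beta: qbeta, z1: qz1, z2: qz2},
                    data={x1: x1_train, x2: x2_train})
inference.run(n_iter=500)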