PGM-Lab / InferPy

InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
https://inferpy-docs.readthedocs.io/en/stable/index.html
Apache License 2.0

Two issues using VAE with multiple inputs #195

Closed. doctorwes closed this issue 5 years ago.

doctorwes commented 5 years ago

I'm trying to implement a VAE with two input vectors rather than one. The purpose is to be able to sample the first vector conditional on known values of the second. My first attempt was the following:

@inf.probmodel
def vae(k, d0, dx, dy, decoderx, decodery):
    with inf.datamodel():
        z = inf.Normal(tf.ones(k), 1, name="z")
        x = inf.Normal(decoderx(z, d0, dx), 1, name="x")
        y = inf.Deterministic(decodery(z, d0, dy), name="y")

# Neural networks for decoding and encoding
def decoderx(z, d0, dx):
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(dx)
    return h1(h0(z))

def decodery(z, d0, dy):
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(dy)
    return h1(h0(z))

def encoder(x, y, d0, k):
    hm = tf.keras.layers.Concatenate(axis=1)
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(2*k)
    return h1(h0(hm([x, y])))

# Q model for making inference
@inf.probmodel
def qmodel(k, d0, dx, dy, encoder):
    with inf.datamodel():
        x = inf.Normal(tf.ones(dx), 1, name="x")
        y = inf.Normal(tf.ones(dy), 1, name="y")
        print((x.shape, y.shape))
        output = encoder(x, y, d0, k)
        qz_loc = output[:, :k]
        qz_scale = tf.nn.softplus(output[:, k:]) + scale_epsilon
        qz = inf.Normal(qz_loc, qz_scale, name="z")

k = 3
d0 = 50
dx = 4
dy = 2
scale_epsilon = 1e-6

m = vae(k, d0, dx, dy, decoderx, decodery)
sample = m.prior().sample(1000)

q = qmodel(k, d0, dx, dy, encoder)
m.fit(sample, inf.inference.VI(q, epochs=10000))

This generates the following error:

(TensorShape([Dimension(1), Dimension(4)]), TensorShape([Dimension(1), Dimension(2)]))
(TensorShape([Dimension(1), Dimension(4)]), TensorShape([Dimension(2)]))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-15-48f880cfb1cd> in <module>()
----> 1 q = qmodel(k, d0, dx, dy, encoder)
      2 
      3 m.fit(sample, inf.inference.VI(q, epochs=10000))

...

/home/wkp/.conda/envs/WKPenv/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/merge.py in build(self, input_shape)
    383     shape_set = set()
    384     for i in range(len(reduced_inputs_shapes)):
--> 385       del reduced_inputs_shapes[i][self.axis]
    386       shape_set.add(tuple(reduced_inputs_shapes[i]))
    387     if len(shape_set) > 1:

IndexError: list assignment index out of range

I can eliminate the error by reshaping y in the encoder function:

output = encoder(x, tf.reshape(y, (x.shape[0], y.shape[-1])), d0, k)

However, I then get the following error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/home/wkp/.conda/envs/WKPenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs)
   1606   try:
-> 1607     c_op = c_api.TF_FinishOperation(op_desc)
   1608   except errors.InvalidArgumentError as e:

...

ValueError: Cannot reshape a tensor with 2 elements to shape [1000,2] (2000 elements) for 'Reshape_1' (op: 'Reshape') with input shapes: [2], [2] and with input tensors computed as partial shapes: input[1] = [1000,2].

I get the same error when I try SVI, unless I specify batch_size=1, in which case it works.

jcozar87 commented 5 years ago

Thank you @doctorwes for your report.

We have detected a bug: in the code you are using, a false dependency between x and y is introduced in the qmodel.

We are debugging this issue and will keep you updated on any solution and bugfix release that solves it.

jcozar87 commented 5 years ago

We fixed the problem in 0adb9dfc86f7bc72271e453a59b7522ded1b96d6, and the fix is included in release 1.2.3, which is already available (pip install inferpy==1.2.3).

Regarding your model, the following changes could be made:

  1. The prior of y cannot be defined as Deterministic: when calculating the ELBO, the log_prob of that distribution is -inf for any value other than the one at which it was defined.
  2. decoderx and decodery were essentially the same function, so you can merge them into one. Note that every time you invoke decoder(), a new NN is created.
  3. As we fixed the bug in the code, there is no need to do the reshaping in the q-model.
  4. When sampling from the prior, a dictionary with samples for all the variables is returned. In this example, we only need the samples for “x” and “y”; see the sketch below.
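
For instance, a minimal sketch of point 4 (plain Python; the variable names are taken from the code above):

# prior() returns a dict with samples for every variable in the model
sample = m.prior().sample(1000)
# keep only the observed variables "x" and "y" before fitting
data = {name: sample[name] for name in ("x", "y")}
m.fit(data, inf.inference.VI(q, epochs=10000))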

Please let us know if this solves your problem so that we can close this issue.

doctorwes commented 5 years ago

Thank you! Inference now seems to work; here is the actual model I'm using, modified according to your advice. However, conditional sampling is not giving me the results I expect. In the training data, there are correlations between the columns of x_train and y_train.

@inf.probmodel
def vae(k, d0, dx, dy, decoder):
    with inf.datamodel():
        z = inf.Normal(tf.ones(k), 1, name="z")
        output = decoder(z, d0, dx+dy)
        x_loc = output[:, :dx]
        x_scale = tf.nn.softmax(output[:, dx:2*dx])
        y_loc = output[:, 2*dx:2*dx+dy]
        y_scale = tf.nn.softmax(output[:, 2*dx+dy:])
        x = inf.Normal(x_loc, x_scale, name="x")
        y = inf.Normal(y_loc, y_scale, name="y")

# Neural networks for decoding and encoding
def decoder(z, d0, d):
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(2*d)
    return h1(h0(z))

def encoder(x, y, d0, k):
    hm = tf.keras.layers.Concatenate(axis=1)
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(2*k)
    return h1(h0(hm([x, y])))

# Q model for making inference
@inf.probmodel
def qmodel(k, d0, dx, dy, encoder):
    with inf.datamodel():
        x = inf.Normal(tf.ones(dx), 1, name="x")
        y = inf.Normal(tf.ones(dy), 1, name="y")
        output = encoder(x, y, d0, k)
        qz_loc = output[:, :k]
        qz_scale = tf.nn.softplus(output[:, k:]) + scale_epsilon
        qz = inf.Normal(qz_loc, qz_scale, name="z")

# number of components
k = 12
# size of the hidden layer in the NN
d0 = 100
# dimensionality of the data
dx = 4
dy = 2
# number of observations (dataset size)
N = 240
# batch size
M = 12

# minimum scale
scale_epsilon = 0.01
# inference parameters
learning_rate = 0.01

m = vae(k, d0, dx, dy, decoder)
q = qmodel(k, d0, dx, dy, encoder)

# set the inference algorithm
VI = inf.inference.VI(q, epochs=10000)
m.fit({"x": x_train, "y": y_train}, VI)

Unconditional sampling recovers the expected correlations between the columns of sampled x and y.

uncond_draws = m.posterior_predictive().sample(1000)

However, when I attempt conditional sampling in the following way (i.e. when I try to sample conditional on predetermined values of y), the columns of the sampled x and of y_obs are pretty much uncorrelated.

cond_draws = m.posterior_predictive(data={'y':y_obs}).sample(1000)

I'm not sure whether this is because my model is mis-specified, or because I'm defining/invoking the Query object in the wrong way.

Anyway, many thanks for your assistance - your response was remarkably rapid!

andresmasegosa commented 5 years ago

I would recommend simplifying your model by not modelling the variance/scale of the Normal distributions of 'x' and 'y'. Simply learn the means and keep the scales constant (e.g. scale=1.0).

This is a known issue with VAEs that use Gaussian observation distributions: the model does not know whether to increase the variance or move the mean to capture the data, and it can take a while to converge. This is why many people use a Binomial observation distribution when building VAEs.
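
A minimal sketch of that simplification, adapted from the model above (it assumes the decoder is changed to output d units, i.e. only the means, instead of 2*d):

# decoder variant that outputs only the means (hypothetical)
def decoder(z, d0, d):
    h0 = tf.keras.layers.Dense(d0, activation=tf.nn.relu)
    h1 = tf.keras.layers.Dense(d)
    return h1(h0(z))

@inf.probmodel
def vae(k, d0, dx, dy, decoder):
    with inf.datamodel():
        z = inf.Normal(tf.ones(k), 1, name="z")
        output = decoder(z, d0, dx + dy)
        # learn only the means; keep the scales fixed to a constant
        x = inf.Normal(output[:, :dx], 1.0, name="x")
        y = inf.Normal(output[:, dx:], 1.0, name="y")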

rcabanasdepaz commented 5 years ago

Regarding the issue with the uncorrelated samples of x and y: the problem is that, in your P model, x and y are conditionally independent given z. This means that fixing the values of y will not influence the samples of x (and vice versa).

Moreover, consider the way posterior_predictive works: samples are generated from a P model in which the global hidden variables and NN parameters are fixed to the inferred values. Variables passed in the input parameter data are fixed as well; in your model, the decoder parameters and the value of y are fixed. Samples are generated from parents to children, that is: first we sample z, then those samples are passed through the NN, and finally samples of x are generated. As you can see, the value set for y does not affect z, and hence does not affect x.

In conclusion, you might need to consider an alternative model.
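
For illustration, one possible alternative is a conditional model in which x depends on both z and y, so that fixing y does influence the samples of x. This is only a hypothetical sketch (cvae is not code from this thread, and it assumes the means-only decoder variant above):

@inf.probmodel
def cvae(k, d0, dx, dy, decoder):
    with inf.datamodel():
        z = inf.Normal(tf.ones(k), 1, name="z")
        y = inf.Normal(tf.zeros(dy), 1, name="y")
        # x now depends on both z and y through the decoder
        x_loc = decoder(tf.concat([z, y], axis=-1), d0, dx)
        x = inf.Normal(x_loc, 1.0, name="x")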

doctorwes commented 5 years ago

Thank you for your advice! Using an alternative model with a dependency of x on y seems to solve the problem.