Era-Dorta / tf_mvg

Multivariate Gaussian distributions for Tensorflow.
MIT License

Covariance Matrix #8

Closed jhss closed 4 years ago

jhss commented 4 years ago

Dear author

Thanks for publishing the paper and sharing the code. I find some parts of the paper and the code difficult to understand, so I would be grateful if you could answer the questions below.

Question 1)

In tf_mvg/examples/autoencoder_mvg_chol_filters.py, 'z', the output of the encoder, is passed through two dense layers:

def decoder_covar(z, n_h=256):

    h0 = keras.layers.Dense(n_h, activation=tf.nn.relu)(z)
    chol_half_weights = keras.layers.Dense((w * h * ((nb // 2) + 1)), activation=None)(h0)
    chol_half_weights = keras.layers.Reshape((w, h, (nb // 2) + 1))(chol_half_weights)

    # The first channel contains the log_diagonal of the cholesky matrix
    log_diag_chol_precision = keras.layers.Lambda(lambda x: x[..., 0])(chol_half_weights)
    ...

and you said that the first channel contains the log-diagonal of the Cholesky matrix.

I don't fully understand the purpose of this channel. I suspect that it is related to the sparsity pattern, but I am not sure.

Question 2)

Suppose that the input image has shape (28, 28).
Then the covariance matrix of the image is 784 x 784.

Since this is too large to predict directly, your model instead outputs a covariance matrix defined by a sparsity pattern, and the outputs are 'chol_precision_weights' and 'log_diag_chol_precision'.
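To make the size argument concrete, here is a quick back-of-the-envelope count. It assumes a 3 x 3 filter (so nb = 9), which is an illustrative choice and not necessarily the setting used in the example script; the output size formula w * h * ((nb // 2) + 1) is taken from the decoder_covar snippet above.

```python
# Rough parameter count: dense covariance vs. sparse Cholesky filters.
# Assumption: filter_size = 3 is illustrative, not read from the repo.
w, h = 28, 28
filter_size = 3
nb = filter_size ** 2            # entries in one full filter

n = w * h                        # number of pixels (784)
dense_entries = n * n            # every entry of a 784 x 784 matrix
dense_unique = n * (n + 1) // 2  # unique entries of a symmetric matrix

# Size of the last Dense layer in decoder_covar
sparse_outputs = w * h * ((nb // 2) + 1)

print(dense_entries, dense_unique, sparse_outputs)
```

Under these assumptions the network predicts a few thousand values per image instead of several hundred thousand.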

In your paper, you said that the covariance network outputs a sparse Cholesky matrix. I assumed that 'log_diag_chol_precision' contains the log values of the diagonal of the covariance matrix, but I don't know what 'chol_precision_weights' contains.

Your code generates a multivariate Gaussian distribution for the input by passing 'chol_precision_weights' and 'log_diag_chol_precision' to 'mvg_dist.MultivariateNormalPrecCholFilters':

mvg = mvg_dist.MultivariateNormalPrecCholFilters(loc=y_mean, weights_precision=chol_precision_weights,
                                                 filters_precision=None,
                                                 log_diag_chol_precision=log_diag_chol_precision,
                                                 sample_shape=img_shape)

but I couldn't understand how the covariance matrix is built from 'chol_precision_weights' and 'log_diag_chol_precision'.

My questions may seem a little silly, but I look forward to your answer :) Thanks.

jhss commented 4 years ago

Question 1)

The channels hold the entries of the Cholesky factor (L) of the precision matrix.

For example, let the input image shape be (5, 5) and the filter size be 3. Then, according to the code, a full filter has 9 channels. The first four channels (0, 1, 2, 3) correspond to the upper triangular part of L. Since L is a lower triangular matrix, all entries in the upper triangular part are zero. The subsequent channels (4, 5, 6, 7, 8) correspond to the diagonal and lower triangular entries of L. This is why, with nb = 9, the decoder outputs only (nb // 2) + 1 = 5 channels, with the first one holding the log of the diagonal.
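The layout described above can be sketched in plain NumPy. This is only my reading of the idea, not the library's implementation: the function name chol_from_half_weights is hypothetical, and I assume row-major pixel ordering, that channel c maps to the neighbour offset (c // 3 - 1, c % 3 - 1), and that each pixel's filter fills one column of L.

```python
import numpy as np

def chol_from_half_weights(log_diag, off_diag, filter_size=3):
    """Assemble a dense lower-triangular L from per-pixel filter weights.

    log_diag: (h, w) log of the diagonal of L.
    off_diag: (h, w, nb // 2) weights for channels 5..8 of a 3 x 3 filter.
    Hypothetical helper; the channel-to-offset mapping is an assumption.
    """
    h, w = log_diag.shape
    half = filter_size ** 2 // 2   # 4 for a 3 x 3 filter
    r = filter_size // 2           # filter radius
    L = np.zeros((h * w, h * w))
    for y in range(h):
        for x in range(w):
            col = y * w + x
            # exp() keeps the diagonal strictly positive
            L[col, col] = np.exp(log_diag[y, x])
            for k in range(half):
                c = half + 1 + k   # channel index in the full filter
                dy, dx = c // filter_size - r, c % filter_size - r
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:   # drop out-of-image taps
                    L[ny * w + nx, col] = off_diag[y, x, k]
    return L

rng = np.random.default_rng(0)
log_diag = rng.normal(scale=0.1, size=(5, 5))
off_diag = rng.normal(scale=0.1, size=(5, 5, 4))
L = chol_from_half_weights(log_diag, off_diag)
print(L.shape, np.count_nonzero(L, axis=0).max())
```

Channels 0 to 3 would point at neighbours with a smaller pixel index, which land above the diagonal, so they are never written. Each column of the 25 x 25 matrix therefore has at most 5 non-zero entries.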

Question 2)

The precision matrix is generated as follows.

mvg.cov_obj.chol_precision -> _build_chol_precision (PrecisionConvCholFilters class) -> _build_matrix_from_basis (PrecisionConvFilters class)
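Once L is available, the covariance never has to be predicted directly. A minimal NumPy sketch of the relationship, assuming the convention precision = L L^T (I have not checked which convention the library uses internally) and using a small random L as a stand-in for the sparse one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # tiny stand-in for the w * h pixels

# Lower-triangular factor with a positive diagonal (exp of a log-diagonal)
L = np.tril(rng.normal(scale=0.1, size=(n, n)), k=-1)
L[np.diag_indices(n)] = np.exp(rng.normal(scale=0.1, size=n))

# Assumed convention: precision = L L^T, covariance = inverse precision
precision = L @ L.T
covariance = np.linalg.inv(precision)

# The log-determinant of the covariance follows from the diagonal of L
log_det_cov = -2.0 * np.sum(np.log(np.diag(L)))

# Sampling without forming the covariance: solve L^T x = eps
eps = rng.normal(size=n)
sample = np.linalg.solve(L.T, eps)
print(log_det_cov, sample.shape)
```

Because the diagonal of L is positive, L L^T is positive definite, and both the log-determinant and the samples can be computed from L alone. The explicit inverse here is only for illustration.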