Blankuca / DeepLearning-VAE


Investigate distribution for output model p_theta(x|z) #4

Open Blankuca opened 3 years ago

Blankuca commented 3 years ago

Right now we are using a Normal (Gaussian) distribution for modelling the reconstructions, since these are continuous.

The way this distribution is computed is as follows:

Investigate: is this correct?
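For reference, a minimal sketch (all names and sizes here are illustrative, not the repo's actual code) of how a Gaussian p_theta(x|z) is typically evaluated: the decoder produces a mean and a positive scale, and the reconstruction term of the ELBO is the log-density of x under that Normal.

```python
import torch

# Illustrative shapes: 16 samples of a 784-dimensional observation.
x = torch.randn(16, 784)            # observations
mu = torch.randn(16, 784)           # decoder mean (stand-in values)
sigma = torch.rand(16, 784) + 0.1   # decoder scale, must be positive

# p_theta(x|z) = N(x | mu, sigma^2); sum log-densities over pixels
p = torch.distributions.Normal(mu, sigma)
log_px = p.log_prob(x).sum(dim=-1)  # one log-likelihood per sample
print(log_px.shape)                 # torch.Size([16])
```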

jespermk commented 3 years ago

Almost correct: it is the output of the encoder, and it looks like `h_x` contains both mu and sigma:

    h_x = self.encode(x)
    mu, log_sigma = h_x.chunk(2, dim=-1)

I am not sure exactly what `h_x.chunk(2, dim=-1)` does, but it looks like `h_x[0] = mu`, `h_x[1] = sigma`.

Blankuca commented 3 years ago

I've already tried that, but what that function does is split the output in two, so the sizes won't match.

I've tried using `mu = h_x` and a fixed sigma, as they describe in the text below. Then the ELBO works, but of course it is not a great choice:
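To make the trade-off concrete, here is a sketch (values are illustrative) of what the fixed-sigma variant amounts to: with sigma held constant, the Gaussian log-likelihood reduces to a scaled MSE plus a constant, so the reconstruction term behaves like a plain MSE loss.

```python
import math
import torch

x = torch.randn(4, 10)    # observations
mu = torch.randn(4, 10)   # decoder means (stand-in values)
sigma = 0.5               # fixed scalar sigma, treated as a hyperparameter

dist = torch.distributions.Normal(mu, torch.full_like(mu, sigma))
log_px = dist.log_prob(x).sum(dim=-1)

# The same quantity written out by hand: a scaled MSE plus a constant.
manual = (-((x - mu) ** 2 / (2 * sigma**2)).sum(-1)
          - x.shape[-1] * math.log(sigma * math.sqrt(2 * math.pi)))
print(torch.allclose(log_px, manual))  # True
```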

[image: paper excerpt describing the fixed-sigma choice]

jespermk commented 3 years ago

hmmm, that's not the worst idea. It would also make testing the z space a lot easier, since the Mahalanobis distance basically just becomes a normal L2 distance. So if it works, then I think we should do it.
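A quick illustration of that point (toy values, not project code): with an isotropic covariance sigma^2 * I, the Mahalanobis distance is just the Euclidean distance divided by sigma.

```python
import torch

a = torch.randn(5)
b = torch.randn(5)
sigma = 2.0
cov = sigma**2 * torch.eye(5)  # isotropic covariance sigma^2 * I

# Mahalanobis distance: sqrt((a-b)^T cov^{-1} (a-b))
mahalanobis = torch.sqrt((a - b) @ torch.inverse(cov) @ (a - b))
euclidean = torch.norm(a - b)
print(torch.allclose(mahalanobis, euclidean / sigma))  # True
```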

jespermk commented 3 years ago

what do you mean, the size doesn't match?

Blankuca commented 3 years ago

Well, in the same paper they explicitly say it's not a good idea at all :(

[image: paper excerpt arguing against a fixed sigma]

I haven't read the whole paper through since I'm working on other stuff, but you can check whether there's some useful info: https://hal.inria.fr/hal-02497248/document

Otherwise, there should be more papers online; I haven't done in-depth research.

Blankuca commented 3 years ago

> what do you mean, the size doesn't match?

Because then mu and sigma have half the size of the original observation x, so the outputs x' will have half the size too, which naturally leads to errors.
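The shape problem can be seen directly (illustrative sizes): chunking a decoder output that has the same size as x halves it, so the reconstruction no longer matches x.

```python
import torch

x = torch.randn(16, 784)  # a batch of observations
h = torch.randn(16, 784)  # decoder output with the SAME size as x

# chunk splits the last dimension in two, halving the feature size
mu, log_sigma = h.chunk(2, dim=-1)
print(mu.shape)                      # torch.Size([16, 392])
print(mu.shape[-1] == x.shape[-1])   # False: sizes won't match
```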

jespermk commented 3 years ago

dim(z) is not the same as dim(x), but if you look at the code below it should be taken care of: the encoder outputs a vector with 2*dim(z), so there is a (mu, sigma) pair for each dimension of z.

Inference Network

    # Encode the observation `x` into the parameters of the posterior distribution
    # `q_\phi(z|x) = N(z | \mu(x), \sigma(x)), \mu(x),\log\sigma(x) = h_\phi(x)`
    self.encoder = nn.Sequential(
        nn.Linear(in_features=self.observation_features, out_features=256),
        nn.ReLU(),
        nn.Linear(in_features=256, out_features=128),
        nn.ReLU(),
        # A Gaussian is fully characterised by its mean \mu and variance \sigma**2
        nn.Linear(in_features=128, out_features=2*latent_features) # <- note the 2*latent_features
    )
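A minimal usage sketch for an encoder of this shape (the 784/32 sizes here are illustrative): the 2*latent_features output is split into mu and log_sigma, which parameterise q_phi(z|x), and a reparameterised sample gives z.

```python
import torch
from torch import nn

observation_features, latent_features = 784, 32
encoder = nn.Sequential(
    nn.Linear(observation_features, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2 * latent_features),  # <- 2*latent_features
)

x = torch.randn(8, observation_features)
mu, log_sigma = encoder(x).chunk(2, dim=-1)  # each has latent_features dims
q = torch.distributions.Normal(mu, log_sigma.exp())
z = q.rsample()  # reparameterised sample, differentiable w.r.t. mu, sigma
print(z.shape)   # torch.Size([8, 32])
```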
jespermk commented 3 years ago

but I think it's just an arbitrary choice to make dim(z) = dim(x)/2; we can try other ratios as long as dim(x) > dim(z)

jespermk commented 3 years ago

have you tried:

    h_z = self.decoder(z)
    mu, log_sigma = h_z.chunk(2, dim=-1)
    # exponentiate the log-scale so Normal receives a positive sigma
    return torch.distributions.normal.Normal(mu, log_sigma.exp())

Blankuca commented 3 years ago

Actually, that gave me an idea, you're right! Instead of making the decoder output a dimension equal to the original data, I just make it output double that dimension. Then, when `h_z.chunk(2, dim=-1)` is applied, both halves have the desired size, and we can derive mu and sigma from them.

[image: screenshot of the modified decoder code]

Will try this out and let you know, but the ELBO doesn't seem to be giving problems.
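The fix described above can be sketched like this (layer sizes are illustrative, not the repo's actual architecture): the decoder's last layer outputs 2*observation_features, so chunking yields a full-size (mu, sigma) pair for the reconstruction.

```python
import torch
from torch import nn

latent_features, observation_features = 32, 784
decoder = nn.Sequential(
    nn.Linear(latent_features, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    # double-size output: one half for mu, one half for log_sigma
    nn.Linear(256, 2 * observation_features),
)

z = torch.randn(8, latent_features)
mu, log_sigma = decoder(z).chunk(2, dim=-1)
p_x_given_z = torch.distributions.Normal(mu, log_sigma.exp())
print(mu.shape[-1] == observation_features)  # True: sizes now match
```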