Blankuca opened this issue 3 years ago
Almost correct: it is the output of the encoder, and it looks like h_x contains both mu and sigma:
h_x = self.encode(x)
mu, log_sigma = h_x.chunk(2, dim=-1)
I am not sure exactly what h_x.chunk(2, dim=-1) does, but it looks like h_x[0] = mu, h_x[1] = sigma.
I've already tried that, but what that function does is split the output in two. Therefore, sizes won't match.
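For reference, a minimal standalone sketch of what chunk does here (the shapes are made up, just to show the halving):
import torch

h_x = torch.randn(16, 10)             # pretend encoder output: batch of 16, feature dim 10
mu, log_sigma = h_x.chunk(2, dim=-1)  # split the last dimension into two equal halves
print(mu.shape, log_sigma.shape)      # torch.Size([16, 5]) torch.Size([16, 5])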
I've tried using mu = h_x and a fixed sigma, as they describe in the text below. Then the ELBO works. But of course, it is not a great choice:
Hmmm, that's not the worst idea. It would also make testing the z space a lot easier, as the Mahalanobis distance basically just becomes a normal L2 distance, so if it works then I think we should do it.
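As a sketch, the fixed-sigma variant would look something like this (the constant 0.1 and the shapes are just placeholder assumptions):
import torch
from torch.distributions import Normal

h_x = torch.randn(16, 784)                    # pretend decoder output, same shape as a flattened x
fixed_sigma = 0.1                             # hand-picked constant scale (an assumption)
p_x = Normal(loc=h_x, scale=fixed_sigma * torch.ones_like(h_x))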
what do you mean the size doesn't match?
Well, in the same paper they explicitly say it's not a good idea at all :(
I haven't read the whole paper through since I'm working on other stuff, but you can check if there's some useful info: https://hal.inria.fr/hal-02497248/document
Otherwise, there should be more papers on the internet; I haven't done any in-depth research.
what do you mean the size doesn't match?
Because then mu and sigma each have half the size of the original observation x, so the outputs x' will also have half the size, and this naturally leads to errors.
dim(z) is not the same as dim(x). But if you look at the code below it should be taken care of, as the encoder outputs a vector with 2*dim(z) so that there is a (mu, sigma) for each dimension of z:
# Encode the observation `x` into the parameters of the posterior distribution
# `q_\phi(z|x) = N(z | \mu(x), \sigma(x)), \mu(x),\log\sigma(x) = h_\phi(x)`
self.encoder = nn.Sequential(
    nn.Linear(in_features=self.observation_features, out_features=256),
    nn.ReLU(),
    nn.Linear(in_features=256, out_features=128),
    nn.ReLU(),
    # A Gaussian is fully characterised by its mean \mu and variance \sigma**2
    nn.Linear(in_features=128, out_features=2*latent_features)  # <- note the 2*latent_features
)
But I think it's just an arbitrary choice to make dim(z) = dim(x)/2; we can try other ratios as long as dim(x) > dim(z).
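For context, that 2*latent_features output would typically be turned into the posterior q(z|x) roughly like this (the method name and the exp() on log_sigma are my assumptions, not necessarily what the repo does):
def posterior(self, x):
    # hypothetical helper, not necessarily the repo's actual method
    h_x = self.encoder(x)                 # shape: (batch, 2*latent_features)
    mu, log_sigma = h_x.chunk(2, dim=-1)  # each half: (batch, latent_features)
    # exp() so the scale passed to Normal is strictly positive
    return torch.distributions.Normal(mu, log_sigma.exp())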
have you tried:
h_x = self.decoder(z)
mu, log_sigma = h_x.chunk(2, dim=-1)  # presumably split here as well
# dist = torch.distributions.normal.Normal(h_z, h_z)
return torch.distributions.normal.Normal(mu, log_sigma.exp())  # exp so the scale is positive
Actually that gave me an idea, you're right! Instead of making the decoder output a dimension equal to the original data, I just make it output double that dimension. So when h_z.chunk(2, dim=-1) is applied, the halves will have the desired size. Then we can derive mu and sigma from this.
Will try this out and let you know, but the ELBO doesn't seem to be causing problems.
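Roughly what I have in mind, as a sketch (the hidden layer sizes are placeholders, and the exp() on log_sigma is just to keep the scale positive):
self.decoder = nn.Sequential(
    nn.Linear(in_features=latent_features, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=256),
    nn.ReLU(),
    # output both mu and log_sigma for every observed dimension
    nn.Linear(in_features=256, out_features=2*self.observation_features)
)

# later, when building the observation model p(x|z):
h_z = self.decoder(z)                  # shape: (batch, 2*observation_features)
mu, log_sigma = h_z.chunk(2, dim=-1)   # each half: (batch, observation_features)
p_x = torch.distributions.Normal(mu, log_sigma.exp())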
Right now we are using a Normal (Gaussian) distribution for modelling the reconstructions, since these are continuous.
The way this distribution is computed is as follows:
Let h_x be the output of the decoder, which maps z --> x'. h_x has the same dimensions as the original x.
The normal distribution is inferred from mu = h_x, sigma = h_x.
Investigate: is this correct?
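For reference, the construction described above amounts to something like this (a sketch; the shapes are placeholders, and h_x is kept positive here only so that Normal accepts it as a scale):
import torch

h_x = torch.rand(16, 784) + 1e-3              # stand-in decoder output with the same shape as x
p_x = torch.distributions.Normal(h_x, h_x)    # mu = h_x, sigma = h_x, as described above
# note that Normal requires a strictly positive scale, which a raw decoder output does not guarantee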