Closed: lucasondel closed this issue 6 years ago.
I agree with you that the Normal model (and its variants) is currently not very intuitive to manipulate, and we probably need to review the interface and provide some documentation so people can use it without knowing much about its internals.
Now, regarding the NormalDiagonalCovariance: of course there is always the possibility of a bug, but I'm pretty confident that it does the right thing (you can have a look at the tests and add some if you wish: https://github.com/beer-asr/beer/blob/master/tests/test_expfamilyprior.py, https://github.com/beer-asr/beer/blob/master/tests/test_normal.py). Also, I cannot use _normalwishart_split_nparams for the NormalDiagonalCovariance model, as it has a normal-gamma prior/posterior and therefore its natural parameters are not organized in the same way as those of the NormalFullCovariance. Therefore evalue.view(4, -1) is actually the correct thing to do.
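For intuition, here is a toy illustration (numpy, with made-up values; this is not beer's actual code) of why a `(4, -1)` reshape is a valid split for a normal-gamma parameterization: each of the four natural parameters is a length-D vector, so the flat parameter vector has length 4·D and splits cleanly for any dimension D, not just D = 2:

```python
import numpy as np

# Hypothetical illustration (not beer's code): for a normal-gamma
# posterior over a D-dimensional diagonal Gaussian, each of the 4
# natural parameters is a length-D vector, so the flat vector has
# length 4 * D and splits cleanly with a (4, -1) reshape.
D = 3
np1, np2, np3, np4 = (np.arange(D), np.arange(D) + 10,
                      np.arange(D) + 20, np.arange(D) + 30)
flat = np.concatenate([np1, np2, np3, np4])

blocks = flat.reshape(4, -1)   # works for any D, not just D == 2
assert blocks.shape == (4, D)
assert np.array_equal(blocks[2], np3)
```

The normal-Wishart case, by contrast, mixes vector- and matrix-valued natural parameters of different sizes, which is why it needs a dedicated split function instead of a uniform reshape.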
That is right, and it actually only accepts models whose likelihood is a Gaussian with a diagonal covariance matrix. There are several reasons for that:
I really think that's a nice way to go. The user can do whatever they want with the nnet architecture, and it's less code for us to maintain.
Thank you very much for the help with NormalDiagonalCovariance! I have successfully applied it to the MNIST example, thus FixedGaussian is dead as deserved :-)
I've also eliminated the MLP model, and the complete encoder/decoder is now built by the user. Therefore, your residual connection in the VAE-GLC example is back in full power!
I think that mlpmodel is now as clean as it can be. After some polishing (any nice names for the {NormalDiagonalCovariance,Bernoulli}_MLP classes?), I think we will be ready to merge it into master.
I will try to persuade you about the VAE thing, but I'll start a separate issue and branch for that.
Great! Thanks a lot.
Let me know when you think the branch is "polished" so we can review it and merge it. I think {NormalDiagonalCovariance,Bernoulli}_MLP is fine, but I would remove the underscore as it goes against PEP 8.
If you think you have a better architecture/concept for the VAE, I'd be happy to see it, but as you said, it's probably better to make a new branch for it so we don't introduce new features in the middle of the code cleanup.
I'm moving the discussion to GitHub so we can keep track of it.
ibenes wrote:
1) Reparametrization. I was aware that the for loop was going to be slower, but my primary concern was that the VAE should not know how to sample; in fact, it should not know what the encoder produces at all.
Using "encoder output".sample() was the first step. Next, I'm planning to give NormalDiagonalCovariance_MLP.sample() a parameter equivalent to nsamples in your example, restoring the speed of your solution.
2) FixedNormal. Actually, I first tried to construct a NormalDiagonalCovariance and use it as the latent model. Apart from being complicated, I think there is something wrong with it, and until I can tell you "here is an error" -- e.g. with a test -- I wanted a model I can understand :-)
3) MLPModel in recipes Whoa! This didn't come to my mind, it may actually be cool.
4) Some unorganized thoughts. One thing I do not like about the VAE (as it is in master) is how tightly it is coupled with the Gaussian distribution coming from the encoder.
The models.normal.NormalDiagonalCovariance is IMHO wrong; here are the clues: cov() and mean() directly do evalue.view(4, -1) instead of using _normalwishart_split_nparams(). Therefore it fails for a latent dimension different from 2, and I think it does some nonsense even in the 2-dim case.
We currently have two kinds of distributions: Bayesian ones as latent models and ordinary ones coming out of the encoder/decoder. Is there any chance to unify them? Also, PyTorch now has a pretty nice set of distributions. Unfortunately, I think we cannot really use it, can we?
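For what it's worth, torch.distributions does cover both the reparametrization idea from point 1 (including drawing several samples at once) and a uniform interface for the "ordinary" distributions. A small sketch of the API only (illustrative, not beer's design; the `mean`/`std` shapes are assumptions standing in for an encoder output):

```python
import torch

# Illustrative sketch, not beer's actual API: torch.distributions as a
# common interface for non-Bayesian distributions. rsample() implements
# the reparametrization trick, and its sample_shape argument plays the
# role of `nsamples` from point 1.
mean = torch.zeros(8, 2)      # pretend encoder output: batch 8, dim 2
std = torch.ones(8, 2)
q = torch.distributions.Normal(mean, std)

nsamples = 5
z = q.rsample((nsamples,))    # differentiable; shape (5, 8, 2)
assert z.shape == (nsamples, 8, 2)

logp = q.log_prob(z).sum(-1)  # per-sample log-density; shape (5, 8)
assert logp.shape == (nsamples, 8)

# Closed-form KL to a standard-normal prior comes for free:
prior = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
kl = torch.distributions.kl_divergence(q, prior).sum(-1)
assert kl.shape == (8,)
```

Whether this can serve as the bridge to the Bayesian models is exactly the open question above; the sketch only shows what the PyTorch side provides.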