beer-asr / beer

Bayesian spEEch Recognizer
MIT License

Do we need to keep the "mlpmodel" module? #19

lucasondel closed this issue 6 years ago

lucasondel commented 6 years ago

I'm moving the conversation to GitHub so we can keep track of the discussion.

ibenes wrote:

1) Reparametrization. I was aware that the for loop was going to be slower, but my primary concern was that the VAE should not know how to sample; in fact, it should not even know what the encoder produces.

Using "encoder output".sample() was the first step. Next, I'm planning to give NormalDiagonalCovariance_MLP.sample() a parameter equivalent to nsamples in your example, restoring the speed of your solution.
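
For concreteness, here is a minimal sketch of the batched sampling I have in mind (the class and attribute names are placeholders, not beer's actual API):

```python
import torch

class NormalDiagonalCovarianceOutput:
    '''Hypothetical encoder output: a diagonal-covariance Gaussian
    parameterized by per-dimension means and log-variances.'''

    def __init__(self, mean, logvar):
        self.mean = mean      # shape: (batch, dim)
        self.logvar = logvar  # shape: (batch, dim)

    def sample(self, nsamples=1):
        '''Reparameterized sampling: draw all the samples in one batched
        operation instead of a Python for loop.'''
        std = (0.5 * self.logvar).exp()                 # (batch, dim)
        eps = torch.randn(nsamples, *self.mean.shape)   # (nsamples, batch, dim)
        return self.mean + eps * std                    # broadcast over nsamples
```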

2) FixedNormal. Actually, I first tried to construct a NormalDiagonalCovariance and use it as the latent model. Apart from being complicated, I think there is something wrong with it, and until I can tell you "here is an error" -- e.g. with a test -- I wanted a model I can understand :-)

3) MLPModel in recipes. Whoa! This hadn't occurred to me; it may actually be cool.

4) Some unorganized thoughts. One thing I do not like about the VAE (as it is in master) is how tightly it is coupled to the Gaussian distribution coming from the encoder.

The models.normal.NormalDiagonalCovariance is IMHO wrong. Here are the clues: cov() and mean() directly do evalue.view(4,-1) instead of using _normalwishart_split_nparams(). Therefore, it fails for latent dimensions different from 2, and I think it does something nonsensical even in the 2-dimensional case.

We currently have two kinds of distributions: Bayesian ones as latent models, and ordinary ones coming out of the encoder/decoder. Is there any chance to unite them? Also, PyTorch now has a pretty nice set of distributions. Unfortunately, I think we cannot really use it, can we?
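
To make the last question concrete, this is roughly what torch.distributions gives us out of the box (just an illustration, not a proposal for the interface):

```python
import torch
from torch.distributions import Normal

# Reparameterized sampling and log-densities come for free...
q = Normal(loc=torch.zeros(2), scale=torch.ones(2))
z = q.rsample((10,))     # differentiable samples, shape (10, 2)
logp = q.log_prob(z)     # shape (10, 2)

# ...but, as far as I can tell, there is no notion of conjugate priors or
# natural parameters, which is what the Bayesian models in beer are built
# on, so we would still need our own exponential-family machinery.
```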

lucasondel commented 6 years ago

Fixed normal / NormalDiagonalCovariance

I agree with you that the Normal model (and its variants) is currently not very intuitive to manipulate, and we probably need to review the interface and provide some documentation so people can use it without knowing much about its internals.

Now, regarding the NormalDiagonalCovariance: of course there is always the possibility of a bug, but I'm pretty confident that it does the right thing (you can have a look at the tests and add some if you wish: https://github.com/beer-asr/beer/blob/master/tests/test_expfamilyprior.py, https://github.com/beer-asr/beer/blob/master/tests/test_normal.py). Also, I cannot use _normalwishart_split_nparams for the NormalDiagonalCovariance model: it has a normal-gamma prior/posterior, so its natural parameters are not organized in the same way as those of NormalFullCovariance. evalue.view(4, -1) is therefore actually the correct thing to do.
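
To illustrate the difference in the layouts (the block contents below are simplified, not the actual parameterization used in beer), a normal-gamma posterior has four blocks of equal size D, whereas the normal-Wishart blocks have different sizes:

```python
import torch

D = 5  # latent dimension

# Normal-gamma posterior (diagonal covariance): four blocks, each of size D,
# so a flat vector of natural parameters reshapes cleanly with .view(4, -1).
nparams_diag = torch.randn(4 * D)
np1, np2, np3, np4 = nparams_diag.view(4, -1)        # four tensors of shape (D,)

# Normal-Wishart posterior (full covariance): the blocks have different sizes
# (roughly D, D*D, 1 and 1), so a single .view(4, -1) cannot split them and a
# dedicated helper like _normalwishart_split_nparams is needed instead.
nparams_full = torch.randn(D + D * D + 1 + 1)
np1 = nparams_full[:D]
np2 = nparams_full[D:D + D * D].view(D, D)
np3 = nparams_full[D + D * D:D + D * D + 1]
np4 = nparams_full[D + D * D + 1:]
```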

VAE encoder coupled with Gaussian distribution

That is right, and it actually only accepts latent models that are Gaussian with a diagonal covariance matrix. There are several reasons for this:

  1. Having a Gaussian with a full covariance matrix in the subspace is of limited interest since it would be extremely slow (for N data points and a latent subspace of dimension D, computing the log-likelihood would require inverting N matrices of size DxD; see the sketch after this list), and it is preferable that the auto-encoder creates a latent space with uncorrelated dimensions (what would be the interest of probabilistic PCA with a correlated latent space?).
  2. Regarding speech, I'm not aware of any use case where one would need anything other than a Gaussian in the latent space.
  3. If we ever feel the need for a model with a non-Gaussian latent space, it would probably be easier to create a new model, say VAEDirichlet, that fits our needs, rather than having a single general model that is overly complicated for 90% of the use cases.
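
To give a rough idea of the cost argument in point 1 (using torch.distributions only for the comparison; this is not how beer computes it):

```python
import torch
from torch.distributions import Normal, MultivariateNormal

N, D = 1000, 32                     # data points, latent dimension
x = torch.randn(N, D)
mean = torch.zeros(D)

# Diagonal covariance: only element-wise operations, O(N * D) overall.
std = torch.ones(D)
llh_diag = Normal(mean, std).log_prob(x).sum(dim=-1)                     # shape (N,)

# Full covariance: each evaluation goes through a D x D factorization, and in
# the VAE setting every data point has its own posterior covariance, i.e. N
# separate D x D matrices to factorize/invert.
covs = torch.eye(D).repeat(N, 1, 1)                                      # one D x D matrix per point
llh_full = MultivariateNormal(mean, covariance_matrix=covs).log_prob(x)  # shape (N,)
```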

MLPModel in recipes

I really think that is a nice way to go. The user can do whatever they want with the nnet architecture, and it's less code for us to maintain.

ibenes commented 6 years ago

Thank you very much for the help with NormalDiagonalCovariance! I have successfully applied it to the MNIST example, so FixedGaussian is dead, as it deserved :-)

I've also eliminated the MLP model, and the complete encoder/decoder is now built by the user. Therefore, your residual connection in the VAE-GLC example is back in full force!
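
For example, the recipe can now define something like this and hand the finished modules over to the VAE (the exact constructor signature is whatever the branch ends up with; this is only the idea):

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    '''Toy residual MLP block defined directly in the recipe.'''

    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, x):
        return x + self.layer(x)     # residual connection

# The recipe owns the whole architecture; the VAE only sees the modules.
encoder = nn.Sequential(nn.Linear(784, 128), nn.Tanh(), ResidualBlock(128))
decoder = nn.Sequential(nn.Linear(2, 128), nn.Tanh(), ResidualBlock(128))
```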

I think that mlpmodel is now cleaned up as much as possible. After some polishing (any nice names for the {NormalDiagonalCovariance,Bernoulli}_MLP classes?), we should be ready to merge it into master.

I will try to persuade you about the VAE thing, but I'll start a separate issue and branch for that.

lucasondel commented 6 years ago

Great! Thanks a lot.

Let me know when you think the branch is "polished" so we can review it and merge it. I think {NormalDiagonalCovariance,Bernoulli}_MLP is fine, but I would remove the underscore as it goes against PEP 8.

If you think you have a better architecture/concept for the VAE, I'd be happy to see it, but as you said, it's probably better to make a new branch for that so we don't introduce new features in the middle of a code cleanup.