NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Difference between flowtron and hierarchical generative GM-VAE by google #59

Closed artificertxj1 closed 4 years ago

artificertxj1 commented 4 years ago

Hi guys, first of all, thanks for the great work. My background is in computer vision, and I am not really familiar with deep learning on sequential data or the details of Tacotron. My main question, after reading both papers, is: what is the major difference between the two models?
Can I have some hints on that? Thanks. Regards, Justin Tian

rafaelvalle commented 4 years ago

Hey Justin,

Take a look at this document (https://arxiv.org/pdf/1908.09257.pdf) for a general comparison between normalizing flows and vaes.

For specific comparisons, take a look at our Flowtron paper (https://arxiv.org/pdf/2005.05957.pdf).

Generally speaking, normalizing flows have an exact maximum-likelihood objective that makes training simpler and more stable. In addition, by having a latent space with the same dimensionality as the data, normalizing flows can store more information than VAEs. This allows us to perform manipulations over time that are not possible in the conventional VAE setup.
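
As a rough illustration (a minimal sketch with assumed tensor shapes, not the actual Flowtron API), the full-dimensional latent is what makes per-frame manipulation possible, whereas a conventional VAE typically compresses the utterance into a single fixed-size vector:

```python
import torch

n_mel_channels, n_frames = 80, 400  # assumed mel-spectrogram shape

# Conventional VAE: one global latent vector for the whole utterance,
# so there is no per-frame structure left to manipulate.
vae_latent = torch.randn(16)

# Normalizing flow: the latent z has the same dimensionality as the data,
# i.e. one latent frame per mel frame.
flow_latent = torch.randn(n_mel_channels, n_frames)
style_latent = torch.randn(n_mel_channels, n_frames)  # e.g. z inferred from a reference utterance

# Manipulation over time: interpolate toward the style latent only for
# frames 100-200, leaving the rest of the utterance untouched.
alpha = 0.5
flow_latent[:, 100:200] = (
    (1 - alpha) * flow_latent[:, 100:200] + alpha * style_latent[:, 100:200]
)

# The manipulated z would then be passed through the inverse flow (z -> mel)
# to synthesize speech whose style changes only in that time region.
```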