Open adrianastan opened 4 years ago
Train a model with 1 step of flow first. Then use this model to warm-start a model with 2 steps of flow.
Hi,
Thanks for your reply. I did start by training a 1-flow model on the LibriSpeech train-clean-100 data, using a modified, unconditioned version of Flowtron. I then used the trained flow to warm-start a 2-flow architecture. However, at inference there is nothing but noise: https://drive.google.com/file/d/1V7sX3Ma3RFBo6lNSCUxSsNjP3Y_HmAZo/view?usp=sharing.
I was expecting at least some babble noise.
Any hints on when is a good point to start the second-flow training? Should I train more? Should I lower the learning rate? Below are the loss curves for the 1st flow:
Thanks!
The validation loss for your model with 1 step of flow is starting to plateau. Use this model to warm-start a model with 2 steps of flow. I assume the validation loss will go down. Alternatively, you can try the same experiment on LJS.
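The warm-start step can be sketched roughly as follows: copy every parameter from the trained 1-flow model whose name and shape also exist in the 2-flow model, and leave the new flow's parameters at their fresh initialization. This is a framework-agnostic illustration, not Flowtron's actual code. The `FakeTensor` stand-in, the `warmstart_state` helper, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor; only the shape matters here."""
    shape: tuple


def warmstart_state(pretrained, target):
    """Merge pretrained parameters into a copy of the target parameters.

    An entry is reused only if the target has a parameter with the same
    name and the same shape; everything else keeps the target's value.
    Returns the merged mapping and the list of reused names.
    """
    merged = dict(target)
    reused = []
    for name, value in pretrained.items():
        if name in target and target[name].shape == value.shape:
            merged[name] = value
            reused.append(name)
    return merged, reused


# Hypothetical parameter names, for illustration only.
one_flow = {
    "flows.0.weight": FakeTensor((80, 1024)),
    "flows.0.bias": FakeTensor((80,)),
}
two_flow = {
    "flows.0.weight": FakeTensor((80, 1024)),
    "flows.0.bias": FakeTensor((80,)),
    "flows.1.weight": FakeTensor((80, 1024)),
    "flows.1.bias": FakeTensor((80,)),
}

merged, reused = warmstart_state(one_flow, two_flow)
print(sorted(reused))
```

With PyTorch, the two mappings would typically come from a loaded checkpoint and from `model.state_dict()`, and the merged result would be passed to `model.load_state_dict(...)`. How the checkpoint is laid out is an assumption here. Printing the reused names is a quick way to confirm that the first flow's weights were actually picked up.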
I warm-started a 2-flow model from the 1-flow weights and continued training. Training and validation losses are as below:
Still no speech-like output at inference. https://drive.google.com/file/d/19OC2cSfPgfvrS0mrRx73bkLLKp0yt0v8/view?usp=sharing
I also started training a subsequent 3-flow model:
The output is as follows:
https://drive.google.com/file/d/1F7lXcEqx5_gqMDog4KgyahfKDGx7-175/view?usp=sharing
So I assume that this architecture might not be complex enough to estimate a multispeaker latent space. I will try to do the same thing on LJSpeech -- perhaps the conditions are simpler.
Thanks!
@adrianastan if you trained a model with speaker embeddings, what happens if you do this:
flowtron.infer(flowtron.forward(audio, speaker), other_speaker)
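The idea behind that call is that a conditioned flow is invertible: encoding with one speaker and decoding with the same speaker reproduces the input, while decoding with a different speaker should change the output. A toy sketch with a single affine transform shows the mechanics. It is not Flowtron's API, and the `SPEAKERS` table and the `forward`/`infer` functions are hypothetical.

```python
import math

# Hypothetical per-speaker (log_scale, shift) parameters for a toy affine flow.
SPEAKERS = {"a": (0.5, 1.0), "b": (-0.3, -2.0)}


def forward(x, speaker):
    """Map data to latents: z = (x - shift) * exp(-log_scale)."""
    log_scale, shift = SPEAKERS[speaker]
    return [(v - shift) * math.exp(-log_scale) for v in x]


def infer(z, speaker):
    """Map latents back to data: x = z * exp(log_scale) + shift."""
    log_scale, shift = SPEAKERS[speaker]
    return [v * math.exp(log_scale) + shift for v in z]


x = [0.2, 1.7, -0.4]
z = forward(x, "a")

same = infer(z, "a")   # same speaker: recovers x up to rounding
other = infer(z, "b")  # other speaker: same latents, different output

print(all(abs(p - q) < 1e-9 for p, q in zip(same, x)))
print(other != x)
```

If the real model's output barely changes when the speaker is swapped, the flow is probably ignoring the conditioning.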
I did not use speaker embeddings, just a multispeaker dataset. I removed all conditionings of the flow.
Hi,
Did anybody try to train the Flowtron flow architecture in an unconditioned manner, for density estimation for example? If so, any hints and tips you could share?
Thanks!