NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Training on multiple languages for multiple speakers #84

Open trueProgrammer opened 3 years ago

trueProgrammer commented 3 years ago

First of all, thank you for releasing the code and for the fantastic paper!

I have read the instructions and all the issues, and after that I started training the model. I used the LJSpeech (LJS) dataset plus a single male Russian speaker with about 40 hours of good, clean speech. My steps were as follows (a sketch of the corresponding config changes follows the list):

  1. Added a Russian CMU dictionary and updated the symbols, acronyms, and cleaners; set n_text to 279 and n_speakers to 2.
  2. Trained from scratch with n_flows=1 for about 500k steps (training plot screenshots attached).
  3. Warm-started with n_flows=2 and include_layers set to None for the next 300k steps (training plot screenshots attached).
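
For reference, here is a minimal sketch of the config edits these steps imply, assuming the stock config.json layout. The key names (model_config.n_text, model_config.n_speakers, model_config.n_flows, train_config.checkpoint_path, train_config.include_layers) are taken from the default config and may differ in your fork:

```python
# Sketch of the config changes described in the steps above.
# Key names are assumptions based on the repo's default config.json; verify locally.
import json

with open("config.json") as f:
    config = json.load(f)

# Step 1: extended symbol set (English + Russian) and two speakers
config["model_config"]["n_text"] = 279
config["model_config"]["n_speakers"] = 2

# Step 2: train from scratch with a single flow
config["model_config"]["n_flows"] = 1

# Step 3 (warm start): switch to two flows and load all layers from the 1-flow model
# config["model_config"]["n_flows"] = 2
# config["train_config"]["checkpoint_path"] = "outdir/model_500000"  # hypothetical path
# config["train_config"]["include_layers"] = None

with open("config_multilang.json", "w") as f:
    json.dump(config, f, indent=4)
```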

Even though the plots look promising, the resulting audio is awful: it sounds like another language and you can't understand a single word. I have a few questions:

  1. Is it possible to train Flowtron on two languages simultaneously?
  2. If so, do I need to use the Russian CMU dictionary, or is it better to go without ARPAbet?
  3. Any thoughts on why I'm getting completely unintelligible speech, even when I use sentences from the training set and vary the frames, sigma, and gate parameters? (See the sketch below for how these enter inference.)

Thank you in advance!
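
For context on question 3, a minimal sketch of how those parameters enter Flowtron-style inference: the model samples a Gaussian latent whose scale is sigma and whose length is the number of frames, then inverts the flow conditioned on text and speaker. The model.infer call, speaker_vecs, and text names below are assumptions modeled on the repo's inference.py; verify against the actual script:

```python
import torch

sigma = 0.5          # scale of the Gaussian prior; lower values mean less variation
n_frames = 400       # number of mel-spectrogram frames to generate
n_mel_channels = 80  # must match data_config

# Sample the latent z ~ N(0, sigma^2 I) with shape [batch, n_mel_channels, n_frames];
# the flow is inverted on this tensor, conditioned on text and speaker embeddings.
residual = torch.randn(1, n_mel_channels, n_frames) * sigma
print(residual.shape, residual.std().item())

# The synthesis call is roughly as follows (names assumed, check inference.py);
# the gate threshold controls when the model is allowed to stop:
# with torch.no_grad():
#     mels, attentions = model.infer(residual, speaker_vecs, text)
```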

rafaelvalle commented 3 years ago

It should be fine to train on multiple languages simultaneously, and it's better to use the dictionary of your target language. Make sure the model has the correct set of token embeddings and that the data loader is using the right token IDs for each language.

The issue you're observing might be related to incorrect inputs.
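
One quick way to act on that suggestion is to dump the token IDs the data pipeline produces for a sentence in each language and confirm they land inside the embedding table. A minimal sketch, assuming the repo's Tacotron 2-style text module exposes text_to_sequence(text, cleaner_names); the cleaner names below are placeholders for whatever your data_config actually uses:

```python
from text import text_to_sequence  # Tacotron 2-style text module; adjust to your fork

n_text = 279  # size of the token embedding table (model_config.n_text)

# (sentence, cleaner names) per language; cleaner choices here are hypothetical.
samples = {
    "english": ("Hello world.", ["english_cleaners"]),
    "russian": ("Привет, мир.", ["basic_cleaners"]),
}

for lang, (sentence, cleaners) in samples.items():
    ids = text_to_sequence(sentence, cleaners)
    assert all(0 <= i < n_text for i in ids), f"{lang}: token id outside embedding table"
    print(lang, ids[:20])
```

If the two languages silently map to overlapping or out-of-range IDs, the model sees garbage inputs, which would explain unintelligible output despite healthy-looking training curves.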