NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
855 stars, 183 forks

Speaker order is randomised while loading - why? #68

karkirowle closed this issue 4 years ago

karkirowle commented 4 years ago

I was wondering whether this is intentional: it seems the utterances from the file_list are shuffled before loading. I'm talking about these lines. This makes it difficult to select a specific audio file for style transfer, so is there a reason for it?
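If the goal is only to pick a particular reference clip, one workaround that leaves the loader untouched is to look the file up by its path after loading, instead of relying on its position in the file list. The sketch below assumes a pipe-separated `path|text|speaker` filelist. `load_filelist` and `find_index` are hypothetical helpers, not Mellotron functions, and the seeded `random.Random(1234).shuffle` only imitates whatever shuffling the real loader performs.

```python
import random
from typing import List, Sequence


def load_filelist(lines: Sequence[str], sep: str = "|") -> List[List[str]]:
    """Split each non-empty 'path|text|speaker' line into its fields (assumed format)."""
    return [line.strip().split(sep) for line in lines if line.strip()]


def find_index(entries: List[List[str]], audiopath: str) -> int:
    """Return the position of `audiopath` in a (possibly shuffled) entry list."""
    for i, fields in enumerate(entries):
        if fields[0] == audiopath:
            return i
    raise KeyError(audiopath)


# Usage: imitate a loader that shuffles its entries with a fixed seed,
# then find the reference file by path rather than by file-list position.
lines = ["a.wav|hello|0", "b.wav|world|1", "c.wav|again|2"]
entries = load_filelist(lines)
random.Random(1234).shuffle(entries)
idx = find_index(entries, "b.wav")
print(idx, entries[idx])
```

Looking entries up by path works regardless of how (or whether) the list was shuffled, so it does not depend on knowing the loader's seed.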

karkirowle commented 4 years ago

Ok, that was actually a stupid question, because the same loader is also used for training. My bad. But maybe it would make sense to make the shuffling optional, with a flag that can be turned off during style transfer?
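A minimal sketch of the suggested flag, assuming a filelist-backed dataset that exposes `__len__`/`__getitem__` (the protocol a `torch.utils.data.Dataset` subclass implements). `FileListDataset`, its `shuffle` parameter and the default seed of 1234 are hypothetical names chosen for illustration, not Mellotron's actual loader or hparams. Shuffling stays on by default for training, and passing `shuffle=False` keeps entries in file-list order so an index can be chosen for style transfer.

```python
import random
from typing import List


class FileListDataset:
    """Hypothetical filelist-backed dataset with optional, seeded shuffling."""

    def __init__(self, entries: List[List[str]], shuffle: bool = True, seed: int = 1234):
        # Copy so the caller's list is never reordered in place.
        self.entries = list(entries)
        if shuffle:
            # A dedicated Random instance gives a reproducible order
            # without reseeding Python's global random module.
            random.Random(seed).shuffle(self.entries)

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, index: int) -> List[str]:
        return self.entries[index]


# Usage: shuffled for training, original order for style transfer.
entries = [["a.wav", "hello", "0"], ["b.wav", "world", "1"], ["c.wav", "again", "2"]]
train_set = FileListDataset(entries)
inference_set = FileListDataset(entries, shuffle=False)
print(inference_set[1][0])
```

Using a local `random.Random(seed)` rather than calling `random.seed()` keeps the shuffle reproducible without affecting any other code that relies on the global random state.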