NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
854 stars, 187 forks

Fixed CMUDict and ARPAbet conversion (p_arpabet is currently not used) #27

Closed: xDuck closed this 4 years ago

xDuck commented 4 years ago

p_arpabet was never used when generating sequences, so this PR wires it into the ARPAbet conversion. It also makes the CMUDict handling a bit more robust by making it spacing-agnostic.
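For readers unfamiliar with the two ideas, here is a minimal sketch of what they could look like. It is not the code from this PR: the function names (`parse_cmudict_line`, `load_dict`, `to_arpabet`), the curly-brace phoneme notation, and the reading of `p_arpabet` as a per-word probability are all assumptions made for illustration.

```python
import random
import re


def parse_cmudict_line(line):
    """Split one dictionary line into (word, phonemes), ignoring extra whitespace."""
    line = line.strip()
    if not line or line.startswith(";;;"):  # skip blank lines and comments
        return None
    parts = line.split()  # splits on any run of spaces or tabs
    if len(parts) < 2:
        return None
    # Drop alternate-pronunciation markers such as WORD(1).
    word = re.sub(r"\(\d+\)$", "", parts[0]).upper()
    return word, " ".join(parts[1:])


def load_dict(lines):
    """Build a word -> pronunciation map, keeping the first pronunciation seen."""
    entries = {}
    for line in lines:
        parsed = parse_cmudict_line(line)
        if parsed:
            word, pron = parsed
            entries.setdefault(word, pron)
    return entries


def to_arpabet(text, entries, p_arpabet=1.0, rng=random):
    """Replace each known word with its phonemes with probability p_arpabet."""
    out = []
    for word in text.split():
        pron = entries.get(word.upper())
        if pron is not None and rng.random() < p_arpabet:
            out.append("{" + pron + "}")  # assumed curly-brace phoneme notation
        else:
            out.append(word)
    return " ".join(out)
```

With `p_arpabet=1.0` every word found in the dictionary is converted, and with `p_arpabet=0.0` the text passes through unchanged, which is the behavior that is lost if the parameter is never consulted.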

I would also recommend a consistent EOS symbol at the end of every generated sequence, but I did not include that in this PR. I can make another PR for that if needed; it's a super easy change, but it would break model compatibility.
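As a rough illustration of the EOS suggestion, the sketch below appends an end-of-sequence ID to every encoded sequence. The toy symbol table, the `~` EOS character, and the `text_to_sequence` body are hypothetical and do not reflect the repository's actual symbol set. One plausible reason for the compatibility break is that the symbol table grows by one entry, so an embedding layer sized to the old table would no longer match.

```python
# Toy symbol table (hypothetical, much smaller than a real one).
symbols = ["_", " ", "a", "b", "c"]
EOS = "~"  # hypothetical end-of-sequence symbol

# Adding EOS grows the table by one entry.
symbols_with_eos = symbols + [EOS]
symbol_to_id = {s: i for i, s in enumerate(symbols_with_eos)}


def text_to_sequence(text):
    """Encode known characters as IDs and always terminate with the EOS ID."""
    ids = [symbol_to_id[ch] for ch in text if ch in symbol_to_id]
    ids.append(symbol_to_id[EOS])
    return ids
```

Because every sequence now ends with the same ID, the model gets a consistent stop cue, at the cost of retraining or remapping checkpoints trained without it.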

xDuck commented 4 years ago

https://github.com/NVIDIA/mellotron/commit/60488fd8b2229bf8e14b964a525d8d86d13d0954 Includes the required changes