How to upsample phoneme embeddings (i.e., duration prediction) for semantic tokens?

PolyAI-LDN / pheme

Creative Commons Attribution 4.0 International

How to upsample phoneme embeddings (i.e., duration prediction) for semantic tokens? #23

Closed. Jiaxin-Ye closed this issue 1 month ago

Jiaxin-Ye commented 1 month ago

Hi! Thank you for your awesome work! I am new to TTS, and I don't see any text-speech alignment method in this project. Does the T5 model automatically upsample the phoneme sequence to the length of the semantic tokens, without an explicit duration predictor?