auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License
636 stars, 92 forks

Could you please describe the details of rhythm-only conversion? #56

Open dbkest opened 2 years ago

dbkest commented 2 years ago

I don't understand how the alignment is obtained when the utterance fed to the rhythm encoder is different from the utterance fed to the pitch and content encoders. P.S. I also don't understand the implementation details of the variant in Appendix B.3. Thank you, sincerely.

auspicious3000 commented 2 years ago

You can understand it by reading the code for pitch conversion.