auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License

How to align multiple sequences when they come from different sources? #43

Open inconnu11 opened 3 years ago

inconnu11 commented 3 years ago

If the content code, rhythm code, and pitch code have different lengths, how are they aligned, given that the decoder has no attention mechanism?

auspicious3000 commented 3 years ago

The rhythm code provides the alignment information. The decoder simply uses this information automatically to align the content code and/or the pitch code.
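Because the decoder has no attention, the codes must already share one time axis when they reach it. Below is a minimal sketch of one way to arrange that. It assumes each encoder emits codes at a fixed downsampling factor of a common padded frame length T, and that codes are restored to frame rate by simple repetition. The helper names, factors, and code dimensions are hypothetical and are not taken from the repository's model.py. The sketch only shows shape compatibility. Which stream supplies the timing (per the answer above, the rhythm code) is something the decoder has to learn.

```python
import numpy as np


def upsample_codes(code: np.ndarray, factor: int, target_len: int) -> np.ndarray:
    """Hypothetical helper: repeat each code frame `factor` times along time,
    then trim or pad (by repeating the last frame) to exactly target_len frames.

    code has shape (T_down, D); the result has shape (target_len, D).
    """
    up = np.repeat(code, factor, axis=0)  # (T_down * factor, D)
    if up.shape[0] >= target_len:
        return up[:target_len]
    # Pad with copies of the last frame when T_down * factor falls short of target_len.
    pad = np.repeat(up[-1:], target_len - up.shape[0], axis=0)
    return np.concatenate([up, pad], axis=0)


def build_decoder_input(content, rhythm, pitch, factors, T):
    """Hypothetical helper: bring every code to T frames, then concatenate
    along the feature axis so the decoder sees one frame-synchronous sequence."""
    parts = [upsample_codes(c, f, T) for c, f in zip((content, rhythm, pitch), factors)]
    return np.concatenate(parts, axis=1)  # (T, D_content + D_rhythm + D_pitch)


# Usage with made-up sizes: three codes of different lengths (12, 24, 6 frames)
# that all derive from the same padded length T = 96.
rng = np.random.default_rng(0)
T = 96
content = rng.standard_normal((T // 8, 8))   # factor 8, 8 dims (assumed)
rhythm = rng.standard_normal((T // 4, 2))    # factor 4, 2 dims (assumed)
pitch = rng.standard_normal((T // 16, 32))   # factor 16, 32 dims (assumed)
x = build_decoder_input(content, rhythm, pitch, factors=(8, 4, 16), T=T)
print(x.shape)  # (96, 42)
```

Repetition keeps each code piecewise constant over its downsampling window, so equal-length concatenation works without attention. This relies on all codes deriving from the same T. When the codes come from utterances of different lengths, that shared T no longer exists, and this is the issue the follow-up question raises.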

dbkest commented 2 years ago


Is the code in model.py (lines 308-309) correct when the content code, rhythm code, and pitch code come from different utterances, given that the three would then be misaligned? I could not find the details of the variant for obtaining alignment information that your paper proposes in Appendix B.3. Could you explain them? Thank you.