bshall / urhythmic

Unsupervised Rhythm Modeling for Voice Conversion
https://bshall.github.io/urhythmic/
MIT License

Training data of the segmenter #8

Open · unilight opened this issue 8 months ago

unilight commented 8 months ago

Hi @bshall, thanks for the great work!

I am wondering what the training data of the pre-trained segmenter is, since it is not described in the paper or in this repo.

Thanks!

bshall commented 7 months ago

Hi @unilight, thanks for the feedback! Sorry about the delay in getting back to you. The pre-trained segmenter was trained on p225 (a single VCTK speaker). But I've tested it out on the other speakers, and the learned mapping from clusters to sonorants, obstruents, and silences seems to be consistent. Let me know if you've found something different.

unilight commented 7 months ago

@bshall Thank you for the reply! I do find that the segmenter works well on other speakers. I am just wondering how you found the mapping from cluster indices to sonorants, obstruents, and silences. Do I need to find the correspondence manually by comparing phoneme labels with cluster indices (see the sketch below for what I mean)? I'm asking because I want to apply the same method to some atypical speech (say, accented speech). I found that the pre-trained segmenter works well on typical speech but has some problems with such atypical speech, so I wonder whether it is possible to train my own segmenter.
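
For concreteness, here is a minimal sketch of the kind of manual correspondence check I have in mind: a majority-vote assignment of each cluster to a broad phone class, assuming I have frame-aligned class labels from a forced aligner. The file names and arrays are hypothetical placeholders, not part of this repo:

```python
import numpy as np

# Hypothetical inputs, one entry per frame, aligned in time:
# clusters[i] is the segmenter's cluster index for frame i, and
# classes[i] is the broad phone class of the same frame taken
# from a forced alignment ("sonorant", "obstruent", or "silence").
clusters = np.load("clusters.npy")  # shape (num_frames,), int
classes = np.load("classes.npy")    # shape (num_frames,), str

labels = ["sonorant", "obstruent", "silence"]
num_clusters = int(clusters.max()) + 1

# Count how often each cluster co-occurs with each broad class.
counts = np.zeros((num_clusters, len(labels)), dtype=np.int64)
for k, label in enumerate(labels):
    counts[:, k] = np.bincount(clusters[classes == label], minlength=num_clusters)

# Map each cluster to its majority class, and report the purity of
# the assignment so weakly associated clusters stand out.
for c in range(num_clusters):
    total = counts[c].sum()
    purity = counts[c].max() / total if total else 0.0
    print(f"cluster {c}: {labels[counts[c].argmax()]} (purity {purity:.2f})")
```

If the per-cluster purity drops noticeably on the atypical speech, I'd take that as a signal that retraining the clustering on in-domain data is worth trying.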