Hi!
I'm trying to train RVC models for a range of dialects in my country. I live in South Africa.
I'm struggling with click-consonant phonemes.
The model does not seem to treat these African-language sounds as phonemes, so voice-to-voice conversion of recordings of African-language speakers results in dropped or incorrect phonemes.
Click consonants are not reliably preserved in the output and are even occasionally replaced by other consonants ("c", "ck", "k").
Is there any way for me to overcome this roadblock? Do I need to include more click consonants in my training data?
Click Consonant Transformation.zip