Closed Vicopem01 closed 3 months ago
There should be no dependence between the language and the speaker; you can mix them freely. I made a new release this morning with new models. The new version is better at this, but the old version should also have been able to do that.
Can the pretrained model do this? Or do I need to fine-tune or train a model instead?
Also, I cannot get the utterance cloner to run after the update:
```
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    302             weight, bias, self.stride,
    303             _single(0), self.dilation, self.groups)
--> 304         return F.conv1d(input, weight, bias, self.stride,
    305                         self.padding, self.dilation, self.groups)
    306

RuntimeError: Given groups=1, weight of size [384, 1, 1], expected input[1, 34, 1] to have 1 channels, but got 34 channels instead
```
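For context, this error is what `conv1d` raises when a tensor arrives in `(batch, time, channels)` order instead of the `(batch, channels, time)` order PyTorch expects: the 34 time steps are misread as 34 channels. A minimal sketch of the shape problem, using NumPy in place of a real prosody-feature tensor (the shapes are taken from the traceback above; the variable names are illustrative, not from the library):

```python
import numpy as np

# Hypothetical prosody-feature tensor in the order that triggers the error:
# (batch, time, channels) = (1, 34, 1)
features = np.zeros((1, 34, 1))

# torch.nn.functional.conv1d expects (batch, channels, time), so a layer
# with in_channels=1 sees 34 "channels" here and raises the RuntimeError.
# Swapping the last two axes restores the expected layout:
fixed = np.swapaxes(features, 1, 2)

print(features.shape)  # (1, 34, 1)
print(fixed.shape)     # (1, 1, 34)
```

In PyTorch the equivalent fix would be `tensor.transpose(1, 2)` before the convolution, which matches the maintainer's later description of the bug as "the dimensions of the tensor were not in the right order."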
Yes, the pretrained model is trained specifically to be good at switching between languages. Finetuning can however help a lot for some languages.
I will take a look at the prosody cloning when I get back from vacation. It's possible the last release broke something; I didn't have time to test the prosody cloning properly.
There was indeed a bug in the prosody cloning: the dimensions of the tensor were not in the right order. I fixed it; it should work now.
The voice cloning and prosody cloning are amazing, but I want to clone the prosody while synthesizing speech in another language. I'm not having any luck so far. Any help?
I noticed the models only accept the reference audio and the text in the same language. Is it possible to use English audio as the reference and Spanish as the text, while specifying "spa" as the language, just to clone the English prosody and transfer the style?