Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
MIT License

Using input audio of any language. #115

Open Lenos500 opened 11 months ago

Lenos500 commented 11 months ago

Hello, I'm planning to make VALL-E X accept input audio in any language and clone that voice into English, for example. However, I'm running into the restriction that the input audio must itself be in English if I want to clone a voice in English.

Any ideas on where I should start if I want input audio in any language to be accepted?

Plachtaa commented 11 months ago

This is not supported in the VALL-E architecture. Perhaps you need to train a new model.

Lenos500 commented 11 months ago

Why not just use Whisper to transcribe and then translate in the middle?

Can you explain how to do it in detail, step by step, please?
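
For reference, the "Whisper in the middle" step could look roughly like the sketch below. It only covers the transcription and translation part, and assumes the `openai-whisper` package, whose `transcribe` call accepts `task="transcribe"` or `task="translate"` and returns a dict with `text` and `language` keys. The names `PromptInfo`, `whisper_runner`, and `describe_prompt` are made up for illustration and are not part of VALL-E-X or Whisper.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PromptInfo:
    """Hypothetical container for what we learn about the reference audio."""
    audio_path: str
    language: str      # language code detected for the reference audio
    transcript: str    # text in the speaker's original language
    english_text: str  # English translation of the same audio


def whisper_runner(model_name: str = "base") -> Callable[[str, str], Dict]:
    """Return a function (audio_path, task) -> Whisper result dict.

    Assumption: `pip install openai-whisper` has been run. The import is
    done lazily so the rest of the sketch can be tried without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    return lambda audio_path, task: model.transcribe(audio_path, task=task)


def describe_prompt(audio_path: str, run: Callable[[str, str], Dict]) -> PromptInfo:
    """Transcribe the reference audio, then translate it to English."""
    original = run(audio_path, "transcribe")   # text in the source language
    translated = run(audio_path, "translate")  # Whisper translates to English only
    return PromptInfo(
        audio_path=audio_path,
        language=original["language"],
        transcript=original["text"].strip(),
        english_text=translated["text"].strip(),
    )
```

Usage would be something like `info = describe_prompt("reference.wav", whisper_runner())`. The transcript and audio file could then be handed to VALL-E X as the voice prompt, and `info.english_text` (or any other English text) used as the text to synthesize. This does not get around the limitation mentioned above: if the detected language is not one the model was trained on, the prompt may still not work without training a new model.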