Open Lenos500 opened 11 months ago
This is not supported by the VALL-E architecture. You would probably need to train a new model.
Why not just use Whisper to transcribe and then translate in the middle?
Can you tell me how to do it in detail? Step by step, please?
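A rough sketch of the Whisper suggestion, in case it helps: transcribe the source audio, have Whisper translate it into English, then hand the English text and the original audio to whatever synthesis call you use. `whisper.load_model` and `model.transcribe(..., task="translate")` come from the `openai-whisper` package. Everything else here (`Transcript`, `clone_into_english`, the `synthesize` callable) is a hypothetical name made up for illustration, not part of VALL-E X. Note that this only produces the English text; whether the model can speak it in a voice prompted by non-English audio still depends on the model itself.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Transcript:
    language: str      # language code Whisper detected in the source audio
    source_text: str   # transcript in the original language
    english_text: str  # English translation of that transcript


def whisper_transcribe_and_translate(audio_path: str, model_name: str = "base") -> Transcript:
    """Run Whisper twice: once to transcribe, once to translate into English."""
    import whisper  # imported lazily so the rest of the sketch works without it

    model = whisper.load_model(model_name)
    original = model.transcribe(audio_path)                       # default task is "transcribe"
    translated = model.transcribe(audio_path, task="translate")  # Whisper translates into English only
    return Transcript(
        language=original["language"],
        source_text=original["text"].strip(),
        english_text=translated["text"].strip(),
    )


def clone_into_english(
    audio_path: str,
    transcribe: Callable[[str], Transcript],
    synthesize: Callable[[str, str, Transcript], Any],
) -> Any:
    """Glue step: transcribe/translate the audio, then call the synthesis function.

    `synthesize` is a hypothetical placeholder for your own voice-cloning call.
    It receives the English text, the path of the audio to use as the voice
    prompt, and the full transcript (useful if the model wants the prompt's
    original-language text as well).
    """
    transcript = transcribe(audio_path)
    return synthesize(transcript.english_text, audio_path, transcript)
```

You would call it as `clone_into_english("speaker.wav", whisper_transcribe_and_translate, my_synthesize)`, where `my_synthesize` wraps your VALL-E X generation code. Keeping the two steps as plain callables makes it easy to swap Whisper for another transcriber or translator later.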
Hello, I'm planning to make VALL-E X accept input audio in any language and clone that voice into English, for example. However, I'm running into a restriction: if I want to clone a voice in English, the input audio also has to be in English.
Any ideas on where I should start if I want input audio in any language to be accepted?