Feel free to join my Discord Server to discuss this model!
An independent VALL-E 2 reproduction for voice synthesis with voice cloning.
https://github.com/user-attachments/assets/484362c5-7397-48f3-88df-b881ee491571
The reproduction tries to follow the papers as closely as possible, with some minor changes.
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model
model = torch.hub.load(repo_or_dir='ex3ndr/supervoice-vall-e-2', model='supervoice')
model = model.to(device)
# Synthesize
in_voice_1 = model.synthesize("voice_1", "What time is it, Steve?", top_p=0.2).cpu()
in_voice_2 = model.synthesize("voice_2", "What time is it, Steve?", top_p=0.2).cpu()
# Experimental voices
in_emo_1 = model.synthesize("emo_1", "What time is it, Steve?", top_p=0.2).cpu()
in_emo_2 = model.synthesize("emo_2", "What time is it, Steve?", top_p=0.2).cpu()
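The snippet above leaves the synthesized audio in memory. As a minimal sketch of one way to write it to disk, the helper below saves mono float samples as a 16-bit PCM WAV file using only the standard library. `write_wav` is a hypothetical helper, not part of this repository. It assumes `synthesize` returns a waveform tensor of floats in [-1.0, 1.0], and the 24000 Hz default is an assumption, so use the sample rate the model actually produces.

```python
import io
import struct
import wave


def write_wav(samples, target, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV.

    `target` may be a file path or a writable binary file object.
    `sample_rate` defaults to 24000 Hz, which is an assumption here;
    pass the rate the model actually produces.
    Returns the number of samples written.
    """
    # Clamp each sample to [-1.0, 1.0], then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(target, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)


# Usage with the tensors from the example above (assumes a float waveform tensor):
# write_wav(in_voice_1.flatten().tolist(), "voice_1.wav")
```

Converting the tensor to a plain list keeps the helper free of extra dependencies. For long clips, a library such as `torchaudio` or `soundfile` would likely be faster.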
MIT