ex3ndr / supervoice-vall-e-2

VALL-E 2 reproduction
✨ Supervoice VALL-E 2

Feel free to join my Discord Server to discuss this model!

An independent VALL-E 2 reproduction for voice synthesis with voice cloning.

https://github.com/user-attachments/assets/484362c5-7397-48f3-88df-b881ee491571

Features

Tips and tricks

Architecture

The reproduction tries to follow the papers as closely as possible, but includes some minor changes.

VALL-E 2 architecture

How to use

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = torch.hub.load(repo_or_dir='ex3ndr/supervoice-vall-e-2', model='supervoice')
model = model.to(device)

# Synthesize
in_voice_1 = model.synthesize("voice_1", "What time is it, Steve?", top_p = 0.2).cpu()
in_voice_2 = model.synthesize("voice_2", "What time is it, Steve?", top_p = 0.2).cpu()

# Experimental voices
in_emo_1 = model.synthesize("emo_1", "What time is it, Steve?", top_p = 0.2).cpu()
in_emo_2 = model.synthesize("emo_2", "What time is it, Steve?", top_p = 0.2).cpu()
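To listen to the result you will probably want to write it to a file. The snippet below is a minimal sketch, not part of this repository: `save_wav` is a hypothetical helper, and it assumes that `synthesize` returns a mono waveform of float samples in the range [-1, 1]. The default sample rate of 24000 Hz is also an assumption, so check what the model actually outputs before relying on it.

```python
import struct
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Hypothetical helper; the sample rate default is an assumption.
    Returns the number of samples written.
    """
    frames = bytearray()
    for s in samples:
        # Clip to the valid range so out-of-range values do not overflow int16
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
    return len(frames) // 2
```

Under those assumptions, usage would look like `save_wav("voice_1.wav", in_voice_1.flatten().tolist())`. The helper uses only the standard library, so it adds no dependency beyond what the example above already needs.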

License

MIT