Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
MIT License

How to change 'SAMPLE_RATE' variable (Audio quality) #169

Open LeXwDeX opened 6 months ago

LeXwDeX commented 6 months ago

The current audio quality is relatively poor. I tried using my own voice to generate the voice acting for a game character but found the sampling rate to be quite low.

I found the SAMPLE_RATE = 24000 option in the project and changed it to 48000, but then the output sound became very strange.
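
A likely explanation, sketched under the assumption that the model always produces samples at 24000 Hz: `SAMPLE_RATE` only declares how fast the samples are played back, so setting it to 48000 relabels the same samples instead of generating more of them. The helper `playback_seconds` below is hypothetical and exists only to show the arithmetic.

```python
import numpy as np

NATIVE_RATE = 24000  # assumed rate at which the model generates samples


def playback_seconds(num_samples: int, declared_rate: int) -> float:
    """Duration a player will report for a clip with the given declared rate."""
    return num_samples / declared_rate


# Stand-in for three seconds of generated audio at the native rate.
audio = np.zeros(NATIVE_RATE * 3, dtype=np.float32)

correct = playback_seconds(len(audio), 24000)     # declared rate matches the data
mislabeled = playback_seconds(len(audio), 48000)  # same data, declared as 48 kHz

print(correct, mislabeled)
```

With the mismatched label the clip plays in half the time, which also raises the pitch by an octave. That would account for the strange sound.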

So, is there a better way to output higher quality audio?

Also, can this project use an XML-style audio markup to control things like tone of voice? If not, which other functions should I use for that?

Thank you in advance for your help!

My recording is a WAV file sampled at 48000 Hz and is under 10 seconds long.

This is my demo code:

# Build a voice prompt from the reference recording and its transcript
from utils.prompt_making import make_prompt

make_prompt(name="hailun", audio_prompt_path="test_input/hailun.wav",
            transcript="Watch out for the enemy over there!")

# Generate speech in the cloned voice
from utils.generation import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download the model checkpoints if needed and load them
preload_models()

text_prompt = """
Report!
The tower ahead has been breached!
"""
audio_array = generate_audio(text_prompt, prompt="hailun")

write_wav("./test_output/hailun_clone.wav", SAMPLE_RATE, audio_array)
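
If a 48000 Hz file is needed (for example, to match a game engine's audio settings), one option is to resample the generated array before writing it, rather than editing `SAMPLE_RATE`. This is a minimal sketch, assuming the generated audio is a one-dimensional float array at 24000 Hz. The `upsample` helper is hypothetical and not part of VALL-E-X. It relies on `scipy.signal.resample_poly`, and a synthetic tone stands in for `audio_array`.

```python
import math

import numpy as np
from scipy.signal import resample_poly


def upsample(audio: np.ndarray, src_rate: int = 24000, dst_rate: int = 48000) -> np.ndarray:
    """Resample a 1-D audio array from src_rate to dst_rate with a polyphase filter."""
    g = math.gcd(src_rate, dst_rate)
    # resample_poly takes integer up/down factors: 24000 -> 48000 is up=2, down=1.
    return resample_poly(audio, dst_rate // g, src_rate // g).astype(np.float32)


# Stand-in for audio_array: one second of a 440 Hz tone at 24000 Hz.
t = np.arange(24000) / 24000
tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

upsampled = upsample(tone)
print(len(tone), len(upsampled))
```

In the script above, that would mean passing `upsample(audio_array)` together with `48000` to `write_wav`. Resampling changes only the container rate. It cannot restore frequencies above 12 kHz that a 24000 Hz signal never contained, so it is unlikely to improve the perceived quality on its own.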