FireRedTeam / FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System
https://fireredteam.github.io/demos/firered_tts/
diffusion flow-matching gan language-model speech speech-synthesis text-to-speech tts voice-clone

FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

👉🏻 FireRedTTS Paper 👈🏻

👉🏻 FireRedTTS Demos 👈🏻

👉🏻 FireRedTTS Space (Interactive Demo) 👈🏻

News

Roadmap

Usage

Clone and install

git clone https://github.com/FireRedTeam/FireRedTTS.git
cd FireRedTTS
# step1.create and activate env
conda create --name redtts python=3.10
conda activate redtts

# step2.install torch (pick the one build below that matches the CUDA version on your machine)
# CUDA 11.8
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

# step3.install fireredtts from source
pip install -e .

# step4.install other requirements
pip install -r requirements.txt
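
After the install steps, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper, and the module names are assumed from the install commands and the import used in Basic Usage.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "torchaudio", "fireredtts")):
    """Return a dict mapping each top-level module name to True if it can be located."""
    # find_spec only locates the module; it does not import or initialise it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any module is reported as missing, the usual cause is that the `redtts` environment is not the active one.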

Download models

Download the required model files from Model_Lists and place them in the pretrained_models folder.
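
A quick way to catch a missing or empty model folder before loading is sketched below. `list_model_files` is a hypothetical helper, not part of FireRedTTS. It deliberately does not assume any specific file names, since those come from Model_Lists; it only checks that the folder exists and contains files.

```python
from pathlib import Path


def list_model_files(model_dir="pretrained_models"):
    """Return the sorted names of files in model_dir; raise if it is missing or empty."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Model directory not found: {root}")
    # Only regular files directly inside the folder are listed (assumption: a flat layout).
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No model files in: {root}")
    return files
```

Calling `list_model_files()` from the repository root prints nothing on its own; it returns the file names so they can be compared against Model_Lists by eye.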

Basic Usage

import os
import torchaudio
from fireredtts.fireredtts import FireRedTTS

tts = FireRedTTS(
    config_path="configs/config_24k.json",
    pretrained_path="pretrained_models",  # folder holding the downloaded model files
)

# same language: the prompt audio and the target text share a language
rec_wavs = tts.synthesize(
    prompt_wav="examples/prompt_1.wav",
    text="小红书,是中国大陆的网络购物和社交平台,成立于二零一三年六月。",
    lang="zh",
)

# move the waveform to the CPU and save it at 24 kHz
rec_wavs = rec_wavs.detach().cpu()
out_wav_path = os.path.join(".", "example.wav")
torchaudio.save(out_wav_path, rec_wavs, 24000)
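
To synthesize several sentences with the same prompt, the call above can be wrapped in a small loop. This is a sketch under assumptions: `synthesize_many` is a hypothetical helper, not a FireRedTTS API. It takes the synthesis function as an argument (for example `tts.synthesize`) and relies only on the `prompt_wav`, `text`, and `lang` keyword arguments shown in the example.

```python
def synthesize_many(synthesize, prompt_wav, texts, lang):
    """Call `synthesize` once per non-empty text, reusing one prompt.

    Returns a list of (text, result) pairs in input order.
    """
    results = []
    for text in texts:
        if not text.strip():
            continue  # skip blank entries rather than synthesizing silence
        results.append((text, synthesize(prompt_wav=prompt_wav, text=text, lang=lang)))
    return results
```

With the objects from the example, `synthesize_many(tts.synthesize, "examples/prompt_1.wav", texts, "zh")` would return one waveform per sentence, each of which can be saved with `torchaudio.save` as shown above.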

Tips

Acknowledgements