The Retrieval-based Voice Generation text-to-speech system is a Python-based TTS built from two core parts. It uses Forward Tacotron to convert text to speech, then applies RVC voice conversion to make the result sound like any character, without needing a reference audio file.
This TTS has been tested on Python 3.10, although it might work on other versions.
The latest 64-bit eSpeak NG release must be installed.
To build the fairseq dependency, you need Visual Studio with the "Desktop development with C++" workload installed.
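The prerequisites above can be checked before installing anything. The following is a minimal sketch and not part of the project: the function name check_prerequisites is hypothetical, and it assumes the eSpeak NG executable is named espeak-ng and is reachable on PATH, which may not hold for every installation (some installers do not add it to PATH). It does not check for Visual Studio.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable warnings; an empty list means no issues were found."""
    warnings = []

    # The project was tested on Python 3.10; other versions may still work.
    if tuple(version_info[:2]) != (3, 10):
        warnings.append(
            f"Python {version_info[0]}.{version_info[1]} detected; only 3.10 has been tested."
        )

    # Assumption: the eSpeak NG executable is called "espeak-ng" and is on PATH.
    if which("espeak-ng") is None:
        warnings.append("espeak-ng was not found on PATH.")

    return warnings


if __name__ == "__main__":
    for message in check_prerequisites():
        print("warning:", message)
```

The two parameters exist only so the check can be exercised without changing the real environment; calling check_prerequisites() with no arguments inspects the running interpreter and the current PATH.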
To use it, install Poetry, install the requirements with poetry install --no-root, and then download the HuBERT model, the Forward Tacotron model, and any RVC model.
Place them in the model folder under the following names:
hubert_base.pt -> hubert.pt
forward_steps90k.pt -> forward.pt
(RVC .pth model name) -> rvc_model.pth
(RVC .index model name) -> rvc_index.index (optional)
Once you have all of these, you can run RVG.py with your desired arguments from the command line, run it without any arguments to launch the Gradio WebUI, or import the rvg_tts function from RVG.py into your own project.
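The renaming step described above can be scripted. This is a minimal sketch under stated assumptions, not code from the project: place_models is a hypothetical helper, and it assumes the model folder is a directory named model relative to the current working directory. Only the target file names come from the list above.

```python
import shutil
from pathlib import Path

# Target names the project expects inside the model folder (taken from the list above).
REQUIRED = ("hubert.pt", "forward.pt", "rvc_model.pth")
OPTIONAL = ("rvc_index.index",)


def place_models(sources, model_dir="model"):
    """Copy downloaded files into model_dir under the expected names.

    sources maps a target name (e.g. "hubert.pt") to the path of the downloaded file.
    Returns the required target names that are still missing afterwards.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)

    for target, source in sources.items():
        # Reject names the project would not look for (hypothetical safety check).
        if target not in REQUIRED + OPTIONAL:
            raise ValueError(f"unexpected target name: {target}")
        shutil.copyfile(source, model_dir / target)

    return [name for name in REQUIRED if not (model_dir / name).is_file()]
```

For example, calling place_models with a mapping from "hubert.pt" to the path of the downloaded hubert_base.pt copies that file into place and returns the names of the required models that still need to be added. Copying rather than moving keeps the original downloads intact.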
To support a different language, a new Forward Tacotron model must be trained. I cannot do this without a dataset, so I am asking the community for help. If you can provide a dataset, please do.
Forward Tacotron is licensed under the MIT License
RVC WebUI is licensed under the MIT License
Copyright 2023 Foxify52
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.