Follow these steps for installation:
Ensure that CUDA is installed.
Clone the repository: git clone https://github.com/Haurrus/xtts-trainer-no-ui-auto
Navigate into the directory: cd xtts-trainer-no-ui-auto
Create a virtual environment: python -m venv venv
Activate the virtual environment:
On Windows: venv\Scripts\activate
On Linux/macOS: source venv/bin/activate
Install PyTorch and torchaudio with pip:
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
Install all dependencies from requirements.txt:
pip install -r requirements.txt
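As an optional sanity check (not part of the original steps), you can confirm that PyTorch sees your GPU before training:
python -c "import torch; print(torch.cuda.is_available())"
This should print True when CUDA is set up correctly.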
This is a Python script for fine-tuning a text-to-speech (TTS) model for xTTSv2. The script uses custom datasets and CUDA for accelerated training.
To use the script, you need to specify two JSON files: args.json and datasets.json.
args.json should contain the following key parameters:
num_epochs : Number of epochs for training; if set to 0, it is calculated automatically.
batch_size : Batch size for training.
grad_acumm : Gradient accumulation steps.
max_audio_length : Maximum duration of the WAV files used for training.
language : Language used to train the model.
version : By default, main from xTTSv2.
json_file : By default, main from xTTSv2.
custom_model : By default, main from xTTSv2.
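For illustration, a minimal args.json might look like the sketch below. The key names come from the list above, but the values (and the exact types expected for version, json_file, and custom_model) are assumptions, so check the example file bundled in the repository for the real schema:
{
    "num_epochs": 0,
    "batch_size": 4,
    "grad_acumm": 1,
    "max_audio_length": 11,
    "language": "en",
    "version": "main",
    "json_file": "main",
    "custom_model": "main"
}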
datasets.json should list the datasets to be used, with their paths and activation flags.
To train models you need a dataset; an example dataset, a FemaleDarkElf voice from Skyrim, is provided in the finetune_models directory.
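A plausible datasets.json pointing at that bundled example might look like this; the field names and path are assumptions modeled on the dataset-generation config shown further below, so verify them against the example shipped in the repository:
[
    {
        "name": "FemaleDarkElf",
        "path": "finetune_models/FemaleDarkElf",
        "activate": true
    }
]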
Execute the script with the following command:
python xtts_finetune_no_ui_auto.py --args_json path/to/args.json --datasets_json path/to/datasets.json
This script processes audio files to create training and evaluation datasets using the Whisper model. It has been updated to include several new features and improvements.
To use the script, provide the path to a JSON configuration file and the Whisper model version as command-line arguments:
python xtts_generate_dataset.py --config path/to/config.json --whisper_version large-v3
The JSON configuration file should contain the audio paths, target language, activation flag, and name for each dataset.
The configuration file should follow this format:
[
{
"name": "dataset_name",
"audio_path": "path/to/audio/files",
"language": "en",
"activate": true
}
]
Replace path/to/audio/files with the actual path to your audio files, and dataset_name with a preferred name for your output subdirectory.
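Since the configuration file is a JSON array, several datasets can be listed at once; judging by the activate flag, entries set to false are presumably skipped. The names and paths below are purely illustrative:
[
    {
        "name": "narrator_en",
        "audio_path": "path/to/narrator/english",
        "language": "en",
        "activate": true
    },
    {
        "name": "narrator_fr",
        "audio_path": "path/to/narrator/french",
        "language": "fr",
        "activate": false
    }
]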
Processing Entire Audio Files: The script has been modified to process entire audio files without splitting them into segments. Each audio file is transcribed as a whole, and the corresponding transcription is stored.
Output Directory Customization: The output directory is now named output_datasets and is created in the root directory where the script is executed. Inside this directory, subdirectories are created based on the name provided in the JSON configuration file.
Language Configuration: The script writes the target language to a lang.txt file in the output directory, ensuring consistent language settings across the dataset.
Audio File Copying: All processed audio files are copied into a wavs folder located in their respective output subdirectories.
Error Handling and Logging: The script includes error handling and logging to provide clear feedback if issues occur during processing.
Configurable Through JSON: The entire preprocessing can be configured using a JSON file, making it easy to adjust settings like the target language, audio paths, and output names.
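Putting these behaviors together, the output for a dataset named dataset_name should look roughly like the sketch below; the exact names of the train/eval transcription files are assumptions, since they are not listed here:
output_datasets/
  dataset_name/
    lang.txt
    wavs/
      example1.wav
      example2.wav
    (train/eval transcription metadata written by the script)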
Contributions are welcome. Please fork the repository and submit pull requests with your changes.
Thanks to daswer123, the author of the xtts-webui repository; this project is based on his work.