This project builds on the official PyTorch implementation of VITS by "jaywalnut310". After training for 10 epochs (batch size 32, 460k steps), inference results for two randomly selected male and female speakers are available in the inference_samples folder.
Setting up the development environment for this project can be challenging due to version conflicts between libraries.
Therefore, the development environment for this project is managed with a Docker container.
The Docker image used to create the container can be downloaded from Docker Hub.
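As a rough sketch, pulling the image and starting a container might look like the following. The image name and tag are placeholders, since the actual Docker Hub name is not given in this section; adjust them to the published image.

```bash
# Pull the prebuilt image from Docker Hub (image name and tag are placeholders).
docker pull <dockerhub-user>/vits-korean:latest

# Start a container with GPU access and the project directory mounted.
docker run --gpus all -it \
  -v "$(pwd)":/workspace \
  <dockerhub-user>/vits-korean:latest /bin/bash
```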
The data used for model training can be downloaded from the following link.
After downloading the dataset, you need to preprocess it so that it can be used for training; a rough resampling sketch is shown below.
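As one illustrative preprocessing step, VITS-style recipes expect 22.05 kHz audio. A minimal resampling sketch using ffmpeg might look like this; the raw/ and wavs_22k/ directory names are placeholders, not paths from this repository.

```bash
# Resample every wav in raw/ to 22.05 kHz mono (directory names are placeholders).
mkdir -p wavs_22k
for f in raw/*.wav; do
  ffmpeg -i "$f" -ar 22050 -ac 1 "wavs_22k/$(basename "$f")"
done
```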
You can clone this GitHub repository and use it:

```bash
git clone https://github.com/0913ktg/vits_korean_multispeaker
```
You can download the model checkpoints and filelists from the Google Drive link.
Once you have 22 kHz audio files, train and validation filelists, and have completed data preprocessing, you can start training by running train_ms.py; a sketch of the command is shown below. Multi-GPU training has been confirmed to work.
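In the upstream VITS implementation, multi-speaker training is launched with a config file and a run name. Assuming this fork keeps the same interface (an assumption), the invocation would look roughly like this; the config filename and run name here are placeholders.

```bash
# -c: path to a JSON training config, -m: run/model name (both placeholders).
python train_ms.py -c configs/korean_multispeaker.json -m korean_multispeaker
```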