🎧 Audio Samples. 🤗 Play Online.
A simple voice conversion (VC) framework.
(a) Training | (b) Inference |
---|---|
(c) Training (w/ optional properties) | (d) Inference (w/ optional properties) |
Setup:
git clone https://github.com/OlaWod/PitchVC.git
cd PitchVC
pip install -r requirements.txt
Files on demand (inference):
exp/default/g_00700000
source wav (e.g. src1.wav) and target wavs and embeddings (e.g. p244_008.wav and p244_008.npy) in convert.txt
Utils/JDC/bst.t7
speakerlab/pretrained/speech_eres2net_sv_en_voxceleb_16k/pretrained_eres2net.ckpt and speakerlab/pretrained/speech_eres2net_sv_zh-cn_16k-common/pretrained_eres2net_aug.ckpt
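Before running conversion, it can help to confirm that the files listed above are in place. The following is a minimal sketch, assuming the commands are run from the repository root; `missing_files` and `INFERENCE_FILES` are hypothetical names for illustration, not part of the repo.

```python
from pathlib import Path

# Paths copied from the list above; adjust them if your files live elsewhere.
INFERENCE_FILES = [
    "exp/default/g_00700000",
    "convert.txt",
    "Utils/JDC/bst.t7",
    "speakerlab/pretrained/speech_eres2net_sv_en_voxceleb_16k/pretrained_eres2net.ckpt",
    "speakerlab/pretrained/speech_eres2net_sv_zh-cn_16k-common/pretrained_eres2net_aug.ckpt",
]


def missing_files(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    # Print one line per missing file; print nothing if everything is present.
    for path in missing_files(INFERENCE_FILES):
        print(f"missing: {path}")
```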
# single process
CUDA_VISIBLE_DEVICES=0 python convert_sp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test
# single process; finetune input f0 automatically
CUDA_VISIBLE_DEVICES=0 python convert_sp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --search
# multi process
CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --n_processes 6
# multi process; finetune input f0 automatically
CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath convert.txt --outdir outputs/test --n_processes 6 --search
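The four commands above differ only in the script name and in the `--n_processes` and `--search` flags. As a sketch, they could be assembled from one helper; `build_convert_cmd` is a hypothetical function written for this example, and it uses only the flags shown above.

```python
import shlex


def build_convert_cmd(hpfile, ptfile, txtpath, outdir, n_processes=None, search=False):
    """Assemble the argument list for convert_sp.py or convert_mp.py.

    n_processes=None selects the single-process script; an integer selects
    the multi-process script and adds --n_processes.
    """
    script = "convert_sp.py" if n_processes is None else "convert_mp.py"
    cmd = [
        "python", script,
        "--hpfile", hpfile,
        "--ptfile", ptfile,
        "--txtpath", txtpath,
        "--outdir", outdir,
    ]
    if n_processes is not None:
        cmd += ["--n_processes", str(n_processes)]
    if search:
        cmd.append("--search")
    return cmd


if __name__ == "__main__":
    cmd = build_convert_cmd(
        "config_v1_16k.json", "exp/default/g_00700000",
        "convert.txt", "outputs/test", n_processes=6, search=True,
    )
    # Show the command as a shell-quoted string.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. To pin the GPU as in the commands above, pass `env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"}` as well.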
convert.txt:
{title}|{source_wav_path}|{target_spk_reference_wav_path}|{target_spk_id}|{target_spk_reference_embedding_path}
e.g.
title1|src1.wav|dataset/audio/p244/p244_008.wav|p244|dataset/spk/p244/p244_008.npy
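Each line of convert.txt carries five `|`-separated fields in the order shown above. A small sketch for writing and checking such lines follows; `ConvertEntry`, `format_entry` and `parse_entry` are hypothetical helpers, and the five-field check is an assumption based on the format line, not on the repo's own parsing code.

```python
from typing import NamedTuple


class ConvertEntry(NamedTuple):
    title: str
    source_wav_path: str
    target_spk_reference_wav_path: str
    target_spk_id: str
    target_spk_reference_embedding_path: str


def format_entry(entry: ConvertEntry) -> str:
    """Join the five fields with '|' in the order given by the format line."""
    for field in entry:
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a reserved character: {field!r}")
    return "|".join(entry)


def parse_entry(line: str) -> ConvertEntry:
    """Split one convert.txt line into its five fields."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return ConvertEntry(*parts)


if __name__ == "__main__":
    entry = ConvertEntry(
        "title1", "src1.wav",
        "dataset/audio/p244/p244_008.wav", "p244",
        "dataset/spk/p244/p244_008.npy",
    )
    print(format_entry(entry))
```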
Files on demand (training):
speaker_encoder/ckpt/pretrained_bak_5805000.pt
Utils/JDC/bst.t7
Preprocess:
export PYTHONPATH=.
python preprocess/1_downsample.py --in_dir </path/to/VCTK/wavs> # dataset/vctk-16k/{spk}/{xx}.wav
python preprocess/2_get_flist.py # filelists/{situation}.txt
python preprocess/3_get_spk2id.py # filelists/spk2id.json
python preprocess/4_get_spk_emb.py # dataset/spk/{spk}/{xx}.npy
python preprocess/5_get_spk_emb_best.py # filelists/spk_stats.json
python preprocess/6_get_f0.py # dataset/f0/{spk}/{xx}.pt
python preprocess/7_get_f0_stats.py # filelists/f0_stats.json
cd dataset
ln -s vctk-16k audio
cd ..
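The output paths noted in the comments above follow a per-speaker, per-utterance pattern. A small sketch that maps a speaker and utterance ID to those files follows; `utterance_paths` is a hypothetical helper, and it assumes `dataset/audio` is the symlink to `vctk-16k` created above.

```python
from pathlib import Path


def utterance_paths(spk: str, utt: str, root: str = "dataset") -> dict:
    """Map one utterance to the files named in the preprocessing comments."""
    base = Path(root)
    return {
        # 16 kHz audio, reached through the dataset/audio symlink
        "audio": base / "audio" / spk / f"{utt}.wav",
        # speaker embedding written by preprocess/4_get_spk_emb.py
        "spk_emb": base / "spk" / spk / f"{utt}.npy",
        # F0 track written by preprocess/6_get_f0.py
        "f0": base / "f0" / spk / f"{utt}.pt",
    }


if __name__ == "__main__":
    for name, path in utterance_paths("p244", "p244_008").items():
        print(name, path.as_posix(), "exists" if path.exists() else "missing")
```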
Training:
CUDA_VISIBLE_DEVICES=0 python train.py --config config_v1_16k.json --checkpoint_path exp/test
Evaluation:
python test/1_select_tgt.py # test/TEST_TGT/{xx}.wav
python test/2_select_src.py # test/TEST_SRC_{CORPUS}/{xx}.wav
python test/3_get_txts.py # test/txts/{scenario}.txt
CUDA_VISIBLE_DEVICES=0 python convert_mp.py --hpfile config_v1_16k.json --ptfile exp/default/g_00700000 --txtpath test/txts/<scenario>.txt --outdir outputs/<scenario> --n_processes 6 --search
cd metrics/<metrics>
bash run.sh