
chainer-VQ-VAE

A Chainer implementation of VQ-VAE (https://arxiv.org/abs/1711.00937).
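
VQ-VAE compresses audio into a discrete latent space: each encoder output vector is replaced by its nearest neighbour in a learned codebook, and the decoder (a WaveNet here) reconstructs the waveform from those discrete codes. As a minimal NumPy sketch of the quantization step only (names and sizes are illustrative, not this repository's code):

```python
import numpy as np

# Minimal sketch of VQ-VAE's quantization step (forward pass only).
# Sizes and names are illustrative, not this repository's code.
K, D = 512, 64                      # codebook size, embedding dimension
codebook = np.random.randn(K, D)    # learned embedding vectors e_1..e_K

def quantize(z_e):
    """Replace each encoder output z_e[i] with its nearest codebook entry."""
    # Squared L2 distance from every input vector to every codebook vector.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # nearest-neighbour index per vector
    return codebook[indices], indices

z_q, idx = quantize(np.random.randn(10, D))
# In training, gradients are copied straight from z_q back to z_e
# (straight-through estimator), since argmin is not differentiable.
```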

Results

Trained for about 63 hours (150,000 iterations) on VCTK-Corpus with a single GTX 1080 Ti. You can download the pretrained model from here.

Losses:

(loss curve plots: loss1, loss2, loss3)

Audio samples:

demo

Requirements

I trained and generated with:

You can also try it on Google Colaboratory, so you don't need to install chainer/librosa locally or buy GPUs. Check this.

Usage

download dataset

You can download VCTK-Corpus (en) from here. You can also download CMU-ARCTIC (en) and voice-statistics-corpus (ja) easily via my repository.

set parameters

Before training, set the following groups of parameters (see the sketch after this list):

- parameters of training
- parameters of dataset
- parameters of preprocessing
- parameters of VQ
- parameters of Decoder (WaveNet)
- parameters of losses
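
As a rough illustration of what these groups might contain, here is a hypothetical sketch; every name and value below is an assumption for illustration, not the repository's actual parameter file:

```python
# Hypothetical parameter grouping, for illustration only; the actual
# names, values and file layout in this repository may differ.

# parameters of training
batchsize = 16
lr = 2e-4

# parameters of dataset
root = 'VCTK-Corpus'
sr = 16000            # sampling rate in Hz

# parameters of VQ
d = 64                # embedding dimension
k = 512               # codebook size

# parameters of Decoder (WaveNet)
n_loop = 3            # number of dilation stacks
n_layer = 10          # layers per stack

# parameters of losses
beta = 0.25           # commitment cost from the VQ-VAE paper
```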


training

(without GPU)
python train.py

(with GPU #n)
python train.py -g n

If you want to use multiple GPUs, add their IDs like below.

python train.py -g 0 1 2

You can resume from a snapshot and restart training like below.

python train.py -r snapshot_iter_100000
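
Resuming a Chainer Trainer from a snapshot is normally done by rebuilding the trainer and loading the snapshot into it with chainer.serializers.load_npz; presumably the -r flag does something along these lines. A self-contained sketch of that mechanism (a toy model, not this repository's network):

```python
import numpy as np
import chainer
from chainer import training
from chainer.training import extensions

# Toy stand-ins for the real network and data; the snapshot/resume
# mechanism itself is standard Chainer.
model = chainer.links.Classifier(chainer.links.Linear(3, 2))
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

dataset = [(np.zeros(3, np.float32), np.int32(0))] * 8
iterator = chainer.iterators.SerialIterator(dataset, batch_size=4)
updater = training.updaters.StandardUpdater(iterator, optimizer)
trainer = training.Trainer(updater, (10, 'iteration'))
trainer.extend(extensions.snapshot(), trigger=(5, 'iteration'))

# To resume: rebuild the trainer as above, then restore its whole state
# (model, optimizer, iteration count) before calling run().
# chainer.serializers.load_npz('result/snapshot_iter_5', trainer)
trainer.run()
```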

The other arguments -f and -p control multiprocessing in the preprocessing stage: -f is the number of prefetched batches and -p is the number of worker processes (see the sketch below).
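
These presumably map onto the n_prefetch and n_processes arguments of chainer.iterators.MultiprocessIterator; a sketch under that assumption (the dataset here is fake):

```python
import numpy as np
import chainer

# Sketch of how -f and -p presumably map onto Chainer's
# MultiprocessIterator; an assumption about train.py's internals.
dataset = [np.random.randn(16000).astype(np.float32) for _ in range(100)]

iterator = chainer.iterators.MultiprocessIterator(
    dataset,
    batch_size=8,
    n_processes=4,  # -p: worker processes running the preprocessing
    n_prefetch=2,   # -f: batches prepared ahead of the training loop
)
batch = iterator.next()  # a list of 8 preprocessed examples
```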

generating

python generate.py -i <input file> -o <output file> -m <trained model> -s <speaker>

If you don't set -o, the default file name result.wav is used. If you don't set -s, the speaker is inferred from the input file's path.
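
With VCTK's directory layout (wav48/<speaker>/<utterance>.wav), the speaker ID can be recovered from the parent directory name; a sketch of that assumption (the script's actual parsing may differ):

```python
import pathlib

# Assumption: the default speaker is the input file's parent directory
# name, matching VCTK's wav48/<speaker>/<utterance>.wav layout.
def speaker_from_path(input_file):
    return pathlib.Path(input_file).parent.name

print(speaker_from_path('VCTK-Corpus/wav48/p225/p225_001.wav'))  # -> p225
```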

TODO