Unsupervised Any-to-many Audiovisual Synthesis via Exemplar Autoencoders
Kangle Deng, Aayush Bansal, Deva Ramanan
project page / demo / arXiv
This repo provides a PyTorch implementation of our work.
Acknowledgements: This code borrows heavily from Auto-VC and Tacotron.
First, make sure ffmpeg is installed on your machine.
Then, run: pip install -r requirements.txt
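Before installing the Python requirements, it can help to confirm that ffmpeg is actually reachable from your shell. The snippet below is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of this repo.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if ffmpeg_available():
    print("ffmpeg found on PATH")
else:
    print("ffmpeg not found on PATH; install it before running the scripts")
```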
We provide our CelebAudio Dataset at link.
Check 'scripts/train_audio.sh' for an example of training a Voice-Conversion model. Make sure the directory 'logs' exists.
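One way to make sure the 'logs' directory is present is to create it up front. This is a small sketch, assuming you run it from the repository root; `ensure_dir` is a hypothetical helper, not something the training scripts provide.

```python
from pathlib import Path


def ensure_dir(path: str) -> Path:
    """Create the directory (and any missing parents) if it does not exist."""
    directory = Path(path)
    # exist_ok=True makes the call a no-op when the directory is already there.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Example: ensure_dir("logs") before launching train_audio.py.
```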
Generally, run:
python train_audio.py --data_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL
You can specify any audio data as PATH_TO_TRAINING_DATA, and a small clip of audio as PATH_TO_TEST_AUDIO. For example, the following script trains an audio model for Barack Obama and runs a test on an input clip every 2000 iterations. You can find the saved models and test results in the directory given by --save_dir.
python train_audio.py --data_path datasets/celebaudio/BarackObama_01.wav --experiment_name VC_example_run --save_freq 2000 --test_path example/input_3_MartinLutherKing.wav --batch_size 8 --save_dir ./saved_models/
Check 'scripts/train_audiovisual.sh' for an example of training an Audiovisual-Synthesis model. We usually train an audiovisual model based on a pretrained audio model.
Generally, run:
python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --use_256 --load_model LOAD_MODEL_PATH
You can specify any audiovisual data as PATH_TO_TRAINING_DATA, and a small clip of audio as PATH_TO_TEST_AUDIO. The following script trains an audiovisual model based on a pre-trained Obama audio model and runs a test on an input clip every 2000 iterations. You can find the saved models and test results in the directory given by --save_dir.
python train_audiovisual.py --video_path datasets/video/obama.mp4 --experiment_name Audiovisual_example_run --save_freq 2000 --test_path example/input_3_MartinLutherKing.wav --batch_size 8 --save_dir ./saved_models/ --use_256 --load_model ./saved_models/VC_example_run/Epoch600_Iter00030000.pkl
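When choosing a value for --load_model, you may want the most recent checkpoint of an audio run. The sketch below assumes checkpoints follow the naming seen in the example path (`Epoch600_Iter00030000.pkl`); that pattern is inferred from this single example, and both helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, inferred from the example checkpoint path only.
CHECKPOINT_PATTERN = re.compile(r"Epoch(\d+)_Iter(\d+)\.pkl$")


def iteration_of(filename: str) -> Optional[int]:
    """Return the iteration number encoded in a checkpoint name, or None."""
    match = CHECKPOINT_PATTERN.search(filename)
    return int(match.group(2)) if match else None


def latest_checkpoint(directory: str) -> Optional[Path]:
    """Return the checkpoint with the highest iteration, or None if none match."""
    root = Path(directory)
    if not root.is_dir():
        return None
    best_path, best_iter = None, -1
    for path in root.glob("*.pkl"):
        iteration = iteration_of(path.name)
        if iteration is not None and iteration > best_iter:
            best_path, best_iter = path, iteration
    return best_path


# Example: latest_checkpoint("./saved_models/VC_example_run")
```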
If you want the video resolution to be 512 * 512, use the StackGAN-style 2-stage generation (the --residual flag in the command below).
Generally, run:
python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --residual --load_model LOAD_MODEL_PATH
Check 'scripts/test_audio.sh' for an example of testing a Voice-Conversion model.
To convert a wavfile using a trained model, run:
python test_audio.py --model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT
You can specify any audio data as PATH_TO_INPUT. For example, the following script converts the input wavfile using a pre-trained audio model.
python test_audio.py --model ./saved_models/VC_example_run/Epoch600_Iter00030000.pkl --wav_path example/input_1_Trump.wav --output_file ./result.wav
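To convert several clips with the same model, the command above can be generated once per input file. This is a minimal sketch that only builds the argument lists using the flags shown above (--model, --wav_path, --output_file); the function names and the `_converted` output suffix are assumptions for illustration.

```python
import sys
from pathlib import Path
from typing import List


def build_test_audio_command(model: str, wav_path: str, output_file: str) -> List[str]:
    """Build the argv list for one test_audio.py invocation."""
    return [
        sys.executable, "test_audio.py",
        "--model", model,
        "--wav_path", wav_path,
        "--output_file", output_file,
    ]


def commands_for_directory(model: str, input_dir: str, output_dir: str) -> List[List[str]]:
    """Build one command per .wav file found in input_dir (sorted by name)."""
    source = Path(input_dir)
    if not source.is_dir():
        return []
    commands = []
    for wav in sorted(source.glob("*.wav")):
        # Hypothetical output naming: <input stem>_converted.wav
        target = Path(output_dir) / f"{wav.stem}_converted.wav"
        commands.append(build_test_audio_command(model, str(wav), str(target)))
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to perform the actual conversion.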
Check 'scripts/test_audiovisual.sh' for an example of testing an Audiovisual-Synthesis model.
python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --use_256
You can specify any audio data as PATH_TO_INPUT. For example, the following script converts the input wavfile using a pre-trained audiovisual model.
python test_audiovisual.py --load_model ./saved_models/Audiovisual_example_run/Epoch600_Iter00030000.pkl --wav_path example/input_1_Trump.wav --output_file ./result.mp4 --use_256
python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --residual
This repo uses tensorboardX to visualize training loss. You can also check test audio results in TensorBoard.
Start TensorBoard with:
tensorboard --logdir ./logs