This is the official repository of A Benchmarking Initiative for Audio-domain Music Generation Using the FreeSound Loop Dataset, co-authored with Paul Chen, Arthur Yeh, and my supervisor Yi-Hsuan Yang. The paper has been accepted by the International Society for Music Information Retrieval Conference (ISMIR) 2021. [Demo Page], [arxiv].
We provide not only pretrained models so you can generate loops on your own, but also scripts to evaluate the generated loops.
$ conda env create -f environment.yml
Generate loops from one-bar looperman pretrained model
$ gdown --id 1GQpzWz9ycIm5wzkxLsVr-zN17GWD3_6K -O looperman_one_bar_checkpoint.pt
$ bash scripts/generate_looperman_one_bar.sh
Generate loops from four-bar looperman pretrained model
$ gdown --id 19rk3vx7XM4dultTF1tN4srCpdya7uxBV -O looperman_four_bar_checkpoint.pt
$ bash scripts/generate_looperman_four_bar.sh
Generate loops from freesound pretrained model
$ gdown --id 197DMCOASEMFBVi8GMahHfRwgJ0bhcUND -O freesound_checkpoint.pt
$ bash scripts/generate_freesound.sh
$ gdown --id 1fQfSZgD9uWbCdID4SzVqNGhsYNXOAbK5
$ unzip freesound_mel_80_320.zip
$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
    --size 64 --batch 8 --sample_dir freesound_sample_dir \
    --checkpoint_dir freesound_checkpoint \
    --iter 100000 \
    mel_80_320
$ CUDA_VISIBLE_DEVICES=2 python generate_audio.py \
--ckpt freesound_checkpoint/100000.pt \
--pics 2000 --data_path "./data/freesound" \
--store_path "./generated_freesound_one_bar"
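After generation, it can be worth sanity-checking that the output WAV files really are the expected 2-second one-bar loops. A minimal sketch using only the Python standard library; the file name `demo_loop.wav` and the 44.1 kHz sample rate are assumptions for illustration, not values taken from the repo:

```python
import wave

def loop_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# Stand-in for a generated loop: write a 2-second silent
# 16-bit mono WAV (sample rate assumed, not from the repo).
sr = 44100
with wave.open("demo_loop.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(sr)
    wf.writeframes(b"\x00\x00" * (2 * sr))

print(loop_duration_seconds("demo_loop.wav"))  # 2.0
```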
$ cd evaluation/NDB_JS
$ gdown --id 1aFGPYlkkAysVBWp9VacHVk2tf-b4rLIh
$ unzip looper_2000.zip # contains 2000 looperman mel-spectrograms
$ rm looper_2000.zip
$ bash compute_ndb_js.sh
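The NDB/JS metric bins real and generated samples (typically via clustering of the mel-spectrograms) and compares the two bin-proportion histograms. The Jensen-Shannon part can be sketched as below; this is only the divergence formula, not the repo's full binning pipeline:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Identical bin profiles give divergence 0; disjoint ones approach ln 2.
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
```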
$ cd evaluation/IS
$ bash compute_is_score.sh
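The Inception Score is computed from a classifier's class-probability outputs on the generated samples: IS = exp(E_x[KL(p(y|x) || p(y))]). A minimal sketch of the score itself, assuming the per-sample probabilities have already been produced by whatever classifier the script uses:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(mean KL(p(y|x) || p(y))) over classifier outputs."""
    probs = np.asarray(probs, float)
    marginal = probs.mean(axis=0)          # p(y), averaged over samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions score highest
# (with 2 classes the maximum is 2.0).
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # ≈ 2.0
```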
For FAD, download the looperman ground-truth statistics (link) and follow the official Frechet Audio Distance documentation to install the required packages.
$ ls --color=never generated_freesound_one_bar/100000/*.wav > freesound.csv
$ python -m frechet_audio_distance.create_embeddings_main --input_files freesound.csv --stats freesound.stats
$ python -m frechet_audio_distance.compute_fad --background_stats ./evaluation/FAD/looperman_2000.stats --test_stats freesound.stats
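The official tool extracts audio embeddings and fits a Gaussian (mean and covariance) to each set; FAD is then the Fréchet distance between the two Gaussians. A sketch of just that final distance computation, assuming the embedding statistics are already available:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical statistics give distance 0.
mu = np.zeros(4)
sigma = np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))  # ~0.0
```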
Go to the preprocess directory, modify the settings (e.g., the data path) in the scripts, and run them in the following order:
$ python trim_2_seconds.py # Cut each loop into one-bar segments and stretch them to 2 seconds.
$ python extract_mel.py # Extract mel-spectrogram from 2-second audio.
$ python make_dataset.py
$ python compute_mean_std.py
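The last step presumably collects per-mel-bin statistics used to normalize the spectrograms before training. A plausible sketch, assuming the repo's 80-bin x 320-frame mel shape (`mel_80_320`) and per-bin mean/std normalization; the function name and exact statistics are illustrative, not taken from `compute_mean_std.py`:

```python
import numpy as np

def mel_mean_std(mels):
    """Per-mel-bin mean and std over a stack of (n_mels, n_frames) spectrograms."""
    stacked = np.concatenate(list(mels), axis=1)  # (n_mels, total_frames)
    return stacked.mean(axis=1), stacked.std(axis=1)

# Two toy 80-bin, 320-frame "spectrograms" (the repo's mel_80_320 shape).
mels = [np.random.rand(80, 320) for _ in range(2)]
mean, std = mel_mean_std(mels)
print(mean.shape, std.shape)  # (80,) (80,)
```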
$ CUDA_VISIBLE_DEVICES=2 python train_drum.py \
    --size 64 --batch 8 --sample_dir [sample_dir] \
    --checkpoint_dir [checkpoint_dir] \
    [mel-spectrogram dataset from the preprocessing]
We use MelGAN as the vocoder. We trained the vocoder on the looperman dataset and use it when generating audio from both the freesound and looperman models. The trained vocoder is in the melgan directory.
Our code borrows heavily from the code below.
If you find this repo useful, please kindly cite with the following information.
@inproceedings{ allenloopgen,
title={A Benchmarking Initiative for Audio-domain Music Generation using the {FreeSound Loop Dataset}},
author={Tun-Min Hung and Bo-Yu Chen and Yen-Tung Yeh and Yi-Hsuan Yang},
booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
year={2021},
}