Authors: Yu-Siang Huang, Yi-Hsuan Yang
Paper (arXiv) | Blog | Audio demo (Google Drive) | Online interactive demo
REMI, which stands for REvamped MIDI-derived events, is a new event representation we propose for converting MIDI scores into text-like discrete tokens. Compared to the MIDI-like event representation adopted in existing Transformer-based music composition models, REMI provides sequence models with a metrical context for modeling the rhythmic patterns of music. Using REMI as the event representation, we train a Transformer-XL model to generate minute-long Pop piano music with expressive, coherent and clear structure of rhythm and harmony, without needing any post-processing to refine the result. The model also provides controllability of local tempo changes and chord progression.
@inproceedings{10.1145/3394171.3413671,
author = {Huang, Yu-Siang and Yang, Yi-Hsuan},
title = {Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions},
year = {2020},
isbn = {9781450379885},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3394171.3413671},
doi = {10.1145/3394171.3413671},
pages = {1180–1188},
numpages = {9},
location = {Seattle, WA, USA},
series = {MM '20}
}
pip install tensorflow-gpu==1.14.0
pip install miditoolkit
We provide two pre-trained checkpoints for generating samples.
We provide the MIDI files, including local tempo changes and estimated chords. (5 MB)
data/train: 775 files used for training models
data/evaluation: 100 files (prompts) used for the continuation experiments
See main.py as an example:
from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

def main():
    # declare model
    model = PopMusicTransformer(
        checkpoint='REMI-tempo-checkpoint',
        is_training=False)
    # generate from scratch
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/from_scratch.midi',
        prompt=None)
    # generate continuation
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/continuation.midi',
        prompt='./data/evaluation/000.midi')
    # close model
    model.close()

if __name__ == '__main__':
    main()
You can find out how to convert the MIDI messages into REMI events in midi2remi.ipynb.
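As a rough illustration of the idea (not the code in midi2remi.ipynb), the sketch below turns a list of notes into beat-based tokens on a 16-position-per-bar grid. The `Note` container, the `notes_to_events` function, the constants, and the exact token spelling are all assumptions made for this example; it also assumes 4/4 time and skips the tempo and chord events, as well as the velocity binning a real conversion would use.

```python
from collections import namedtuple

# Hypothetical note container for this sketch; times are in MIDI ticks.
Note = namedtuple("Note", ["start", "end", "pitch", "velocity"])

TICKS_PER_BEAT = 480       # assumed MIDI resolution
BEATS_PER_BAR = 4          # assumes 4/4 time throughout
POSITIONS_PER_BAR = 16     # 16th-note grid inside each bar
TICKS_PER_POSITION = TICKS_PER_BEAT * BEATS_PER_BAR // POSITIONS_PER_BAR


def quantize(ticks):
    """Round a tick count to the nearest grid step (halves round up)."""
    return (ticks + TICKS_PER_POSITION // 2) // TICKS_PER_POSITION


def notes_to_events(notes):
    """Convert notes into a flat list of illustrative REMI-style tokens."""
    events = []
    current_bar = -1
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        bar, position = divmod(quantize(note.start), POSITIONS_PER_BAR)
        # Emit one Bar token per bar line crossed, including empty bars.
        while current_bar < bar:
            current_bar += 1
            events.append("Bar")
        # Durations are counted in grid steps, with a minimum of one step.
        duration = max(1, quantize(note.end - note.start))
        events += [
            f"Position_{position + 1}/{POSITIONS_PER_BAR}",
            f"Note Velocity_{note.velocity}",
            f"Note On_{note.pitch}",
            f"Note Duration_{duration}",
        ]
    return events


example = [Note(0, 480, 60, 80), Note(1920, 2160, 64, 90)]
for token in notes_to_events(example):
    print(token)
```

The point of the Bar and Position tokens is that every note is placed relative to the bar line, which is the metrical context that a MIDI-like time-shift representation lacks.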
We strongly recommend using a DAW (e.g., Logic Pro) to open and play the generated MIDI files. Alternatively, you can use FluidSynth with a SoundFont, but it may not handle the tempo changes correctly (see fluidsynth/issues/141).
The temperature and topk inputs control temperature-controlled stochastic sampling, the method used for generating text from a trained language model. You can find more details in the reference paper CTRL: 4.1 Sampling.
Note that the sampling method used for generation is critical to the quality of the output, and it is a research topic worthy of further exploration.
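To make the roles of the two inputs concrete, here is a minimal sketch of temperature-controlled top-k sampling over a list of logits. It uses only the Python standard library and is not the repository's implementation; the function name `sample_top_k` and its `rng` parameter are assumptions made for this example.

```python
import math
import random


def sample_top_k(logits, temperature=1.2, topk=5, rng=random):
    """Draw one index from `logits` using temperature-scaled top-k sampling."""
    # Dividing by the temperature flattens the distribution when it is
    # above 1 and sharpens it when it is below 1.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the k largest scaled logits.
    k = max(1, min(topk, len(scaled)))
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Unnormalized softmax weights, shifted by the largest logit for
    # numerical stability; random.choices normalizes them internally.
    peak = scaled[candidates[0]]
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [0.1, 5.0, 0.2, 4.0, -1.0, 3.0]
print(sample_top_k(logits, temperature=1.2, topk=5, rng=random.Random(0)))
```

With topk=1 the draw reduces to picking the highest-scoring token, while a larger topk or a higher temperature lets less likely tokens through, trading coherence for variety.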
To train on your own MIDI data, please see the issue "Training on custom MIDI corpus".
modules.py comes from the kimiyoung/transformer-xl repository.