Hi, I have some questions about the pretraining dataset (ailabs) and the pretrained models.
On the main page of this repo, I found a link that provided pretrained models.
There were three kinds of models: "baseline", "non-pretrained", and "pretrained".
Does the "pretrained" checkpoint mean that the model was pretrained on ailabs and then fine-tuned on EMOPIA?
Is there any checkpoint trained only on the ailabs dataset?
I tried to train a model with the same hyperparameters as in the paper and used the first-stage model trained only on ailabs to generate some songs. The following is the script I used:
python main_cp.py --mode inference --load_ckt ailabs --load_ckt_loss 20 --d_model 512 --n_head 8 --n_layer 12 --ffd_dim 2048 --num_song 50 --emo_tag 0 --task_type ignore --out_dir 'exp/ailabs/gen_midis/loss_20/test_1/'
While generating songs, I noticed that the model would generally generate over 200 bars per song and didn't know when to stop.
Did you generate songs with the model trained only on ailabs?
Did you get the same results as me?
I also wanted to know what kind of music is in the ailabs dataset. Are there MIDI files for it?
When I checked the compound word transformer repo here, I found the link to the ailabs dataset.
But when I downloaded it, the only MIDI-related folders I saw were "midi_analyzed", "midi_synchronized", and "midi_transcribed".
Are the files in the "midi_analyzed" folder the songs represented in MIDI format?
If anyone knows the answer, could you please share it with me? Thanks!!
Yes, your understanding is correct, and no, we don't release a checkpoint trained only on the ailabs dataset.
I did generate songs with the model trained only on ailabs, but I didn't face the non-stopping problem. The generated results just had lower emotional diversity.
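If you want to force generation to end, a rough workaround is to cap the number of bars by counting Bar tokens while sampling. This is only a minimal sketch with hypothetical names (sample_next_token, BAR_TOKEN_ID), not an option built into this repo:

```python
# Minimal sketch: stop sampling once a fixed number of Bar tokens has been generated.
# sample_next_token and BAR_TOKEN_ID are hypothetical; adapt them to your own
# model wrapper and token vocabulary.

MAX_BARS = 32        # stop after this many bars
BAR_TOKEN_ID = 0     # assumed id of the "Bar" event in your vocabulary

def generate_with_bar_cap(sample_next_token, seed_tokens, max_bars=MAX_BARS):
    """sample_next_token(tokens) -> next token id; assumed to wrap the trained model."""
    tokens = list(seed_tokens)
    n_bars = 0
    while n_bars < max_bars:
        tok = sample_next_token(tokens)
        tokens.append(tok)
        if tok == BAR_TOKEN_ID:
            n_bars += 1
    return tokens
```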
Please check the README of the dataset. What kind of "MIDI" files are you looking for? After the MIDI files are transcribed from the audio files, we preprocess them to get a cleaner version. The preprocessing includes beat mapping, quantization, and chord recognition. All of them are MIDI files.
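If you just want to see what is inside, you can open one of the files in "midi_analyzed" with miditoolkit and inspect it. A minimal sketch, assuming you have miditoolkit installed and a local copy of the dataset (the file name below is only an illustration):

```python
import miditoolkit

# Illustrative path; point this at any file inside the midi_analyzed folder.
midi = miditoolkit.midi.parser.MidiFile('midi_analyzed/example.mid')

print('ticks per beat:', midi.ticks_per_beat)
print('tempo changes:', midi.tempo_changes[:3])
print('markers:', midi.markers[:5])   # if chord labels are present, they usually appear as markers
for inst in midi.instruments:
    print(inst.name, '-', len(inst.notes), 'notes')
```

The files are standard MIDI, so any MIDI library or DAW can open them as well.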