Natooz / MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶
https://miditok.readthedocs.io/
MIT License

A few issues about training with the GPT-2 Jupyter notebook #29

Closed ufsteven closed 1 year ago

ufsteven commented 1 year ago

Hi there!

I'm trying to train a GPT-2 model on the EMOPIA dataset using the GPT-2 jupyter notebook in this repo, but I'm experiencing severe overfitting during the training process. Here's a table summarizing the training loss, validation loss, and accuracy at different steps:

| Step | Training Loss | Validation Loss | Accuracy |
| -- | -- | -- | -- |
| 1000 | 3.053000 | 3.006334 | 0.000433 |
| 2000 | 2.771500 | 2.757644 | 0.000003 |
| 3000 | 2.399400 | 2.435765 | 0.000090 |
| 4000 | 2.181200 | 2.290239 | 0.000302 |
| 5000 | 2.003100 | 2.267199 | 0.000189 |
| 6000 | 1.774400 | 2.339681 | 0.000270 |
| 7000 | 1.578500 | 2.509524 | 0.000186 |
| 8000 | 1.308200 | 2.744518 | 0.000218 |
| 9000 | 1.038400 | 3.058719 | 0.000077 |
| 10000 | 0.796100 | 3.431878 | 0.000051 |
| 11000 | 0.582900 | 3.806573 | 0.000067 |
| ... | | | |

I noticed that the validation loss starts to increase after 5000 steps and the accuracy stays quite low. Looking at the generated MIDI, the model can only produce intermittent segments separated by long rests. As a newcomer to the HF transformers library and the GPT-2 model, I'm not sure whether this issue is caused by an insufficient dataset size or an incorrect hyperparameter configuration. Could you please provide some guidance or resources for troubleshooting this problem? Thank you!

Steven

Natooz commented 1 year ago

Hi Steven, 👋

Indeed, this looks like overfitting. EMOPIA is quite a small dataset, and each MIDI file is quite short (i.e. fewer tokens) if I remember correctly. You could try smaller input training sequence lengths (when creating the dataset, change the min_seq_len and max_seq_len values, e.g. 50 and 250), a smaller model, a smaller learning rate, or another dataset. Maestro and GiantMIDI are bigger and should work better.
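The min_seq_len / max_seq_len idea above can be sketched in plain Python (a minimal illustration of bounded-length training samples, not the notebook's actual dataset code; the function name is hypothetical):

```python
def chunk_token_ids(token_ids, min_seq_len=50, max_seq_len=250):
    """Split one file's token ids into training samples of bounded length.

    Chunks shorter than min_seq_len are dropped, and each chunk is capped
    at max_seq_len, mirroring the dataset parameters mentioned above.
    """
    chunks = [token_ids[i:i + max_seq_len]
              for i in range(0, len(token_ids), max_seq_len)]
    return [c for c in chunks if len(c) >= min_seq_len]

# a 600-token file becomes samples of 250, 250 and 100 tokens
samples = chunk_token_ids(list(range(600)), min_seq_len=50, max_seq_len=250)
print([len(c) for c in samples])  # [250, 250, 100]
```

Shorter sequences give the model more (and easier) training samples from the same data, which tends to delay overfitting on small corpora.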

Nathan

ufsteven commented 1 year ago

Thank you for your prompt response! I followed your advice to train on a larger dataset and adjusted the sequence length and model size, and there was indeed some improvement.

Best wishes! Steven

ufsteven commented 1 year ago

Hi, I am reopening this issue for another problem.

While training on the Maestro dataset, I noticed that although overfitting didn't occur early and the loss kept decreasing, the accuracy remained consistently low (~2e-4 or even lower). The quality of the generated MIDI is terribly poor. 😢

I wonder if there is anything else that can be improved? Maybe the GPT-2 architecture is not well suited for music generation?

Natooz commented 1 year ago

Hey 👋

About accuracy: the metric from Hugging Face actually corresponds to F1, so very low values are not surprising, as it is not intended to measure generative performance, especially in music where at each decoding step there isn't a single expected result. We might replace it with an "argmax" accuracy that would be more relevant. Concerning performance, increasing the model size would help. BPE also helps a great deal; you could use it to build a vocabulary of 2k ~ 10k tokens (to be tested and chosen depending on model / dataset size). Finally, Maestro can be considered challenging data, as it mostly consists of music with complex melodies and arrangements. Training a model on it is actually not an easy task (i.e. it is hard to get good results).
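The "argmax" accuracy mentioned above could look like this (a NumPy sketch in the shape of a HF Trainer compute_metrics helper; the function name is hypothetical, and the -100 ignore id follows the usual Hugging Face label-padding convention):

```python
import numpy as np

def argmax_accuracy(logits, labels, ignore_id=-100):
    """Fraction of positions where the top-1 prediction matches the label,
    skipping padded positions. A simpler, more relevant alternative to the
    F1-based metric for next-token prediction."""
    preds = np.argmax(logits, axis=-1)   # (batch, seq_len)
    mask = labels != ignore_id           # drop padded label positions
    return (preds[mask] == labels[mask]).mean()

# toy example: 3 of the 4 non-padded positions are predicted correctly
logits = np.array([[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.5, 0.5]]])
labels = np.array([[1, 0, 1, 1, -100]])
print(argmax_accuracy(logits, labels))  # 0.75
```

Even with this metric, values well below 1 are expected for generative music models, since many continuations can be musically valid at each step.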

ufsteven commented 1 year ago

Thanks for your suggestions! Very helpful and cleared up my confusion ☺