Yujia-Yan / Skipping-The-Frame-Level

A simple yet effective Audio-to-MIDI Automatic Piano Transcription system
MIT License

The default model parameter for training is different from the pretrained checkpoint #11

Status: Open. seyong92 opened this issue 1 year ago

seyong92 commented 1 year ago

Hello, thank you for the valuable code sharing!

I have several questions about the code.

  1. The default training parameters differ from those of the pre-trained model in the repo. The default setting uses 229 mel bins (the same as the paper), but the pre-trained model uses 300 mel bins, and the f_min and f_max values are different as well. I also found that the pre-trained model has one more conv layer in the PreConvSpec. Do these changes have a meaningful effect on performance?

  2. When I tried training (once with the default parameters, and once with the pre-trained model's parameters), both cases showed much lower performance than the pre-trained model (0.7403 valid F1) and the score reported in the paper. I think the only difference is the batch size, which is 12 in the paper and 2 in the default configuration. Have you ever trained the model with batch size 2, or with the default parameters in this repo?
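For reference, the effect of the two mel configurations can be compared in mel space. This is a minimal sketch assuming the standard HTK mel formula; the f_min/f_max values below are illustrative placeholders, not the exact values from the repo or checkpoint:

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel scale; the repo may use Slaney's variant instead
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_bin_spacing(n_mels, f_min, f_max):
    """Width of one mel bin, in mels, for a given filterbank config."""
    return (hz_to_mel(f_max) - hz_to_mel(f_min)) / (n_mels + 1)

# Paper-style config vs. a wider checkpoint-style config
# (frequency bounds are placeholders for illustration only)
paper = mel_bin_spacing(229, 30.0, 8000.0)
ckpt = mel_bin_spacing(300, 20.0, 16000.0)
print(f"paper-style bin width: {paper:.2f} mel")
print(f"checkpoint-style bin width: {ckpt:.2f} mel")
```

With more bins over a wider range, the per-bin resolution can stay comparable while the filterbank covers more harmonics of high notes.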

[screenshot: training/validation curves]

Again, thank you very much for sharing your code! 😁

Yujia-Yan commented 1 year ago

Hi, Thanks for your interest.

  1. As far as I know, those parameters do not matter much. But this view may be biased: a larger frequency range may accommodate more harmonics for rarely played high notes.
  2. a. The curve looks like under-fitting; the number of training steps should be increased accordingly. The code uses an aggressive learning rate scheduler (OneCycle), which requires the total number of iterations to be known beforehand. In practice, if you don't know how many steps are enough, it's probably better to switch to a constant learning rate (e.g. the famous default 1e-4) and train longer. b. The architecture uses batch norm, which is notoriously unstable with small batches; that's why I have never tried a batch size that small. BTW, the spikes in train_f1 and the other metrics suggest there may be an issue in your dataset. Have you checked the sampling rate of the data?
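The scheduling issue in 2a can be illustrated with a rough sketch of the two schedules' shapes (not PyTorch's exact OneCycleLR implementation; `max_lr`, `pct_start`, and `div` are illustrative defaults):

```python
import math

def onecycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3, div=25.0):
    """Rough one-cycle shape: cosine warm-up to max_lr, then cosine
    anneal toward 0. Note it needs total_steps fixed up front."""
    warm = int(total_steps * pct_start)
    if step < warm:
        t = step / max(warm, 1)
        return max_lr / div + (max_lr - max_lr / div) * 0.5 * (1 - math.cos(math.pi * t))
    t = (step - warm) / max(total_steps - warm, 1)
    return max_lr * 0.5 * (1 + math.cos(math.pi * t))

def constant_lr(step, lr=1e-4):
    """Constant schedule: safe when the step budget is unknown."""
    return lr

# At the originally planned end of training, the one-cycle schedule has
# already decayed to 0.0, so simply training longer gains nothing,
# while a constant schedule is unaffected.
print(onecycle_lr(10_000, total_steps=10_000))
print(constant_lr(10_000))
```

This is why under-fitting with OneCycle usually calls for re-running with a larger `total_steps`, whereas a constant learning rate lets you just keep training.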
seyong92 commented 1 year ago

Thank you for the fast comment! I will change the learning rate and share the results after training.

Also, I found that some of the files were not correctly resampled to 44100 Hz, as you said. Thanks!
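Auditing for this kind of mismatch can be done with a minimal stdlib sketch like the one below (WAV-only via Python's `wave` module; real dataset files in FLAC/MP3 would need a library such as soundfile or torchaudio instead):

```python
import wave

def check_sample_rate(path, expected=44100):
    """Return (rate, ok) for a WAV file; flags files needing resampling."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
    return rate, rate == expected

# Example: write a deliberately wrong-rate file and detect it
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)          # wrong rate on purpose
    w.writeframes(b"\x00\x00" * 100)

rate, ok = check_sample_rate("demo.wav")
print(rate, ok)  # 22050 False
```

Running a check like this over the whole dataset before training would catch the spiky-metric symptom mentioned above early.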