facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

Do I need to cut audio files in the MUSDB HQ dataset into short samples before training? #354

Open ElizavetaSedova opened 2 years ago

ElizavetaSedova commented 2 years ago

Sorry for so many questions, and thank you for answering them! In my training dataset the audio files are about 3 minutes long. Do I need to cut them before training? I also noticed segment=11 in the configuration file. Could you explain how it works during training?

adefossez commented 2 years ago

No, you do not need to: audio is loaded automatically on the fly in chunks of 11 seconds. The final length of each chunk is not exactly 11 seconds because of the various augmentations, in particular tempo stretching.
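For intuition, here is a minimal sketch of that kind of on-the-fly chunking, assuming a PyTorch-style dataset over a list of audio file paths. This is not the actual demucs loader; the class name, the 44.1 kHz sample rate, and the `samples_per_track` knob are illustrative assumptions:

```python
import random

import torch
import torchaudio


class RandomChunkDataset(torch.utils.data.Dataset):
    """Sketch of on-the-fly chunking; NOT the actual demucs loader."""

    def __init__(self, paths, segment=11.0, samplerate=44100, samples_per_track=64):
        self.paths = paths
        self.chunk = int(segment * samplerate)  # e.g. 11 s at 44.1 kHz
        self.samples_per_track = samples_per_track

    def __len__(self):
        # Each full track contributes several random windows per epoch.
        return len(self.paths) * self.samples_per_track

    def __getitem__(self, index):
        path = self.paths[index // self.samples_per_track]
        num_frames = torchaudio.info(path).num_frames
        # Random offset, so every draw sees a different 11-second window.
        offset = random.randint(0, max(0, num_frames - self.chunk))
        wav, _ = torchaudio.load(path, frame_offset=offset,
                                 num_frames=self.chunk)
        return wav
```

With this pattern the 3-minute files stay on disk untouched; only the window boundaries change from epoch to epoch.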

ElizavetaSedova commented 2 years ago

Initially, I thought so. I am trying to find the reason why the model is not learning: in the log, Nsdr becomes negative and the Loss does not decrease, and I am not sure whether these are normal values. Below are my valid summary logs. Unfortunately, I deleted my log file and cannot provide complete information. I trained the model up to epoch 42 and concluded that it is not training. Can you guess what the reason could be?

{'Epoch': 1, 'Loss': 0.2191, 'Reco': 0.2191, 'Nsdr': 3.604, 'Best': 0.2191, 'Bname': 'ema_epoch_0', 'Penalty': 220.0989}
{'Epoch': 2, 'Loss': 0.1992, 'Reco': 0.1992, 'Nsdr': 4.374, 'Best': 0.1992, 'Bname': 'ema_batch_0', 'Penalty': 230.2217}
{'Epoch': 3, 'Loss': 0.1872, 'Reco': 0.1872, 'Nsdr': 4.861, 'Best': 0.1872, 'Bname': 'ema_batch_0', 'Penalty': 302.0689}
{'Epoch': 4, 'Loss': 0.1808, 'Reco': 0.1808, 'Nsdr': 5.118, 'Best': 0.1808, 'Bname': 'ema_batch_0', 'Penalty': 268.4302}
{'Epoch': 5, 'Loss': 0.1768, 'Reco': 0.1768, 'Nsdr': 5.317, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 276.381}
{'Epoch': 6, 'Loss': 0.275, 'Reco': 0.275, 'Nsdr': 1.988, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 15545.2617}
{'Epoch': 7, 'Loss': 0.3821, 'Reco': 0.3821, 'Nsdr': -14.859, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 12785.0723}
{'Epoch': 8, 'Loss': 0.3497, 'Reco': 0.3497, 'Nsdr': -24.63, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 12727.165}
{'Epoch': 9, 'Loss': 0.3139, 'Reco': 0.3139, 'Nsdr': -2.685, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 11761.2002}
{'Epoch': 10, 'Loss': 0.3145, 'Reco': 0.3145, 'Nsdr': -6.65, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 10920.6846}
{'Epoch': 11, 'Loss': 0.3063, 'Reco': 0.3063, 'Nsdr': 0.616, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 10252.6143}
{'Epoch': 12, 'Loss': 0.3049, 'Reco': 0.3049, 'Nsdr': -1.218, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 9658.7852}
{'Epoch': 13, 'Loss': 0.3184, 'Reco': 0.3184, 'Nsdr': -14.567, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 9164.75}
{'Epoch': 14, 'Loss': 0.306, 'Reco': 0.306, 'Nsdr': -4.058, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 9022.5566}
{'Epoch': 15, 'Loss': 0.3029, 'Reco': 0.3029, 'Nsdr': 0.87, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 8589.3672}
{'Epoch': 16, 'Loss': 0.3015, 'Reco': 0.3015, 'Nsdr': 0.902, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 8193.9824}
{'Epoch': 17, 'Loss': 0.3145, 'Reco': 0.3145, 'Nsdr': -7.49, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 7783.1387}
{'Epoch': 18, 'Loss': 0.3028, 'Reco': 0.3028, 'Nsdr': -1.027, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 7364.397}
{'Epoch': 19, 'Loss': 0.3233, 'Reco': 0.3233, 'Nsdr': -8.973, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 7300.1909}
{'Epoch': 20, 'Loss': 0.3113, 'Reco': 0.3113, 'Nsdr': -4.969, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 6780.5361}
{'Epoch': 20, 'Loss': 0.3042, 'Reco': 0.3042, 'Nsdr': 0.872, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 6782.0547}
{'Epoch': 20, 'Loss': 0.3008, 'Reco': 0.3008, 'Nsdr': -0.916, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 6781.1074}
{'Epoch': 20, 'Loss': 0.3003, 'Reco': 0.3003, 'Nsdr': 0.934, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 6780.5361}
{'Epoch': 20, 'Loss': 0.3003, 'Reco': 0.3003, 'Nsdr': 0.934, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 6781.4546}
{'Epoch': 21, 'Loss': 0.3138, 'Reco': 0.3138, 'Nsdr': 0.568, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 10347.1885}
{'Epoch': 22, 'Loss': 0.3078, 'Reco': 0.3078, 'Nsdr': 0.715, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 9350.3926}
{'Epoch': 23, 'Loss': 0.3573, 'Reco': 0.3573, 'Nsdr': -16.674, 'Best': 0.1768, 'Bname': 'ema_epoch_1', 'Penalty': 8669.5605}
{'Epoch': 24, 'Loss': 0.3531, 'Reco': 0.3531, 'Nsdr': -9.088, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 9079.1494}
{'Epoch': 25, 'Loss': 0.3645, 'Reco': 0.3645, 'Nsdr': -13.863, 'Best': 0.1768, 'Bname': 'ema_epoch_1', 'Penalty': 8319.9951}
{'Epoch': 26, 'Loss': 0.3408, 'Reco': 0.3408, 'Nsdr': -3.902, 'Best': 0.1768, 'Bname': 'ema_batch_0', 'Penalty': 7767.7632}
{'Epoch': 27, 'Loss': 0.3037, 'Reco': 0.3037, 'Nsdr': 0.818, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 7290.394}
{'Epoch': 28, 'Loss': 0.3635, 'Reco': 0.3635, 'Nsdr': -4.158, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 6897.0615}
{'Epoch': 29, 'Loss': 0.6361, 'Reco': 0.6361, 'Nsdr': -5.01, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 6525.6772}
{'Epoch': 30, 'Loss': 0.5323, 'Reco': 0.5323, 'Nsdr': -9.704, 'Best': 0.1768, 'Bname': 'ema_batch_1', 'Penalty': 6187.4297}
{'Epoch': 31, 'Loss': 2.3778, 'Reco': 2.3778, 'Nsdr': -64.748, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 5859.7544}
{'Epoch': 32, 'Loss': 2.2304, 'Reco': 2.2304, 'Nsdr': -65.091, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 5525.6274}
{'Epoch': 33, 'Loss': 0.3067, 'Reco': 0.3067, 'Nsdr': 0.736, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 16593.0527}
{'Epoch': 34, 'Loss': 0.303, 'Reco': 0.303, 'Nsdr': 0.857, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 14291.4668}
{'Epoch': 35, 'Loss': 0.3016, 'Reco': 0.3016, 'Nsdr': 0.907, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 13010.209}
{'Epoch': 36, 'Loss': 0.3013, 'Reco': 0.3013, 'Nsdr': 0.932, 'Best': 0.1768, 'Bname': 'main', 'Penalty': 12098.9902}
{'Epoch': 37, 'Loss': 0.358, 'Reco': 0.358, 'Nsdr': -6.222, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 11642.7891}
{'Epoch': 38, 'Loss': 0.4068, 'Reco': 0.4068, 'Nsdr': -4.791, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 11095.6387}
{'Epoch': 39, 'Loss': 0.6287, 'Reco': 0.6287, 'Nsdr': -14.905, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 10675.8008}
{'Epoch': 40, 'Loss': 0.338, 'Reco': 0.338, 'Nsdr': 0.135, 'Best': 0.1768, 'Bname': 'ema_epoch_0', 'Penalty': 10305.0537}
{'Epoch': 41, 'Loss': 0.5431, 'Reco': 0.5431, 'Nsdr': -23.638, 'Best': 0.1768, 'Bname': 'ema_epoch_1', 'Penalty': 9963.1377}
{'Epoch': 42, 'Loss': 0.3893, 'Reco': 0.3893, 'Nsdr': -7.182, 'Best': 0.1768, 'Bname': 'ema_epoch_1', 'Penalty': 9639.0332}
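As an aside: each summary above is a Python dict literal, so a short hypothetical helper (not part of demucs) can parse them with `ast.literal_eval` and flag the epochs where the valid Nsdr collapses:

```python
import ast

def flag_divergence(log_lines, nsdr_drop=5.0):
    """Print epochs where Nsdr falls by more than `nsdr_drop` dB."""
    prev = None
    for line in log_lines:
        metrics = ast.literal_eval(line)  # each line is one dict literal
        if prev is not None and prev['Nsdr'] - metrics['Nsdr'] > nsdr_drop:
            print(f"Epoch {metrics['Epoch']}: Nsdr dropped "
                  f"{prev['Nsdr'] - metrics['Nsdr']:.1f} dB, "
                  f"Penalty={metrics['Penalty']:.0f}")
        prev = metrics

# Example (assuming the summaries were saved one per line in summaries.txt):
# flag_divergence(open('summaries.txt').read().splitlines())
```

On this log it would flag, among others, epochs 7 and 31-32, where the Nsdr swings from positive values to below -14 dB and -64 dB respectively.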

adefossez commented 2 years ago

You should post the actual training log; it should be in the XP folder where you found the metrics, as trainer.log I think. It contains much more information. Maybe use a paste service to upload it.

ElizavetaSedova commented 2 years ago

It is very strange. I restarted the training and the values returned to normal, but at the 45th epoch the loss increased abnormally again. Can I resume training from a specific epoch rather than from the latest one? Which command does this? Unfortunately, it did not occur to me to save a checkpoint for every epoch.

adefossez commented 2 years ago

You cannot restart from an arbitrary checkpoint. Instabilities during training are a bit weird though; your training dataset might be too small. I also see that the SVD penalty is taking gigantic values in your case, which might be a sign of overfitting.
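For future runs, one possible workaround is to archive the rolling checkpoint after every epoch yourself, so any epoch can later be restored by copying its archive back before resuming. This is a sketch only; the XP-folder layout and the `checkpoint.th` file name are assumptions about the setup, not guarantees from demucs:

```python
import shutil
from pathlib import Path

def archive_checkpoint(xp_folder, epoch):
    """Copy the rolling checkpoint to an epoch-stamped file.

    Hypothetical helper: call it (e.g. from a wrapper script) after each
    epoch; to roll back, copy the archived file over the rolling one
    before resuming training.
    """
    src = Path(xp_folder) / "checkpoint.th"  # assumed checkpoint name
    if src.exists():
        shutil.copy2(src, Path(xp_folder) / f"checkpoint_epoch_{epoch}.th")
```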