ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.

Inquiry about Default Instrument Settings and Dual Loudness Augmentation in bs_roformer #39

Open EuiYeonKim opened 1 month ago

EuiYeonKim commented 1 month ago

Hello, I am currently analyzing your code in an attempt to reproduce the paper performance of bs_roformer. While examining the code, I have come across a few points that I am curious about and would like to inquire further.

Upon reviewing the settings related to bs_roformer, I noticed that the configuration predominantly uses only vocals and other in the instruments setting. Generally, MSS datasets like MUSDB18 follow a 4-stem setup: [vocals, drums, bass, other], and the mixture is the sum of these four stems. However, your default setting is [vocals, other], so the resulting mixture consists only of vocals + other. I am curious whether this configuration is an error or whether it was intended for a specific task.
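To make the question concrete, here is a minimal sketch (illustrative names, not the repo's actual API) of how a mixture built only from the stems listed in the instruments setting differs from the full 4-stem mixture:

```python
import numpy as np

# Hypothetical sketch: sum only the stems named in `instruments`.
# With instruments = [vocals, other], drums and bass never enter the mixture.
def make_mixture(stems: dict, instruments: list) -> np.ndarray:
    return sum(stems[name] for name in instruments)

rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(8)
         for name in ["vocals", "drums", "bass", "other"]}

full_mix = make_mixture(stems, ["vocals", "drums", "bass", "other"])
two_stem_mix = make_mixture(stems, ["vocals", "other"])  # drums/bass absent
```

Under the default [vocals, other] setting, the model would thus be trained on mixtures that omit drums and bass entirely, rather than on the standard MUSDB18-style mixture.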

Additionally, while examining the code, I noticed that loudness augmentation is applied once inside the load_random_mix function when performing mixup, and again in __getitem__. I would like to clarify whether applying loudness augmentation twice is by design, a misunderstanding on my part, or a coding error.
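A minimal sketch of why applying a random gain twice matters (illustrative code, not the repo's implementation; the gain range [0.5, 1.5] is an assumed example): two independent draws compound, so the effective gain spans the product of the two ranges.

```python
import numpy as np

# Assumed per-stage gain range [0.5, 1.5]; two stages compound to [0.25, 2.25].
def loudness_aug(x: np.ndarray, rng: np.random.Generator,
                 low: float = 0.5, high: float = 1.5) -> np.ndarray:
    return x * rng.uniform(low, high)

rng = np.random.default_rng(42)
stem = np.ones(4)

once = loudness_aug(stem, rng)    # e.g. inside load_random_mix (mixup path)
twice = loudness_aug(once, rng)   # applied again in __getitem__
```

So if both applications are intentional, the model effectively sees a wider loudness distribution than either stage alone suggests.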

I would appreciate your response to these two questions.

Thank you.

EuiYeonKim commented 1 month ago

Additionally, I would like to reproduce the performance presented in the paper using only the MUSDB18-HQ dataset. Have you trained the model using only the MUSDB18-HQ dataset? Also, did the performance results align closely with those reported in the paper?

Thank you.

ZFTurbo commented 1 month ago

Hello. I didn't try to reproduce the paper results using only the MUSDB18 dataset. I trained only vocals and bass models, using big datasets. Also, as I remember from the paper, the authors trained an independent model for each MUSDB stem.

You can use this config as starting point: https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config_musdb18_bs_roformer.yaml

Also note that the authors probably have a more efficient implementation of the model, because they used larger batch sizes on the same GPUs.