facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

Suggestion for a future Demucs release: Genre-Specific Models #382

Open cdm7825412 opened 1 year ago

cdm7825412 commented 1 year ago

Hi, thanks for creating and releasing this wonderful tool. It is giving me a whole new way to customize and enjoy my favorite music.

My request/suggestion is to train Demucs using Genre-Specific training sets.

While I understand the motivation to make it a general tool that performs well on any type of musical source, Demucs might do an even better job if it were trained on separate, homogeneous training sets. I would love to do this myself, but I do not have the resources or the required training data, which I believe your team does.

For example, the results that I obtain processing Pop/Rock genres are much better than other genres like Jazz or Latin. Songs with multiple wind instruments and/or multiple percussion instruments do not produce results as good as songs in which most of the instruments are strings.

This also applies to the tempo (BPM) of the input files. The slower the tempo, the better the performance.

adefossez commented 1 year ago

The main issue is that the dataset is biased towards pop/rock, and we do not have that much training data. Splitting the dataset per genre would just lead to overfitting and worse models. Keep in mind the basic training set is only 87 songs! This is not really enough to cover all the diversity.

cdm7825412 commented 1 year ago

What?! Only 87 songs! The results that Demucs produces with such a small training set are impressive. For my image processing applications I use training sets of thousands of images.

adefossez commented 1 year ago

One song is definitely more complex than a single image. The extra models are trained with an additional 150 non-public songs, which is a bit better but nowhere near enough to cover all instruments and styles. But even the basic model works quite well given how little training data it has.