facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

Training the Model #88

Open cesardelpiano opened 4 years ago

cesardelpiano commented 4 years ago

Hello everyone, I am not an expert programmer, but I have this program running excellently. I would like to train this model so that its results improve over time. Can someone give me an idea of how to do that? Thank you.

adefossez commented 4 years ago

Hi @cesardelpiano, AI models don't learn quite like humans. While humans do need a teacher to start on a new skill, they are usually able to improve on their own after reaching a certain level, for instance when learning to play a musical instrument. Machines, however, cannot learn on their own; they need strong supervision. There is some work on making them learn by themselves (called self-supervised or semi-supervised learning), but it is still quite prospective, and there is currently no way to make the model better just by using it. What you would need is a large catalogue of songs for which you have the exact ground truth, i.e. individual recordings of the bass, drums, vocals and other sources.
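
As a concrete reference, here is a minimal sketch of what such ground-truth data looks like on disk, assuming the MUSDB18-style layout the released models were trained on (one folder per track containing the mixture plus one file per source); the dataset path is a placeholder:

```python
from pathlib import Path

# Stem list and train/test split follow the MUSDB18 convention;
# adjust STEMS if your dataset uses different source names.
STEMS = ["mixture.wav", "drums.wav", "bass.wav", "other.wav", "vocals.wav"]

def check_dataset(root: str) -> None:
    """Report any track folder missing the mixture or a ground-truth stem."""
    for split in ("train", "test"):
        for track in sorted((Path(root) / split).iterdir()):
            if not track.is_dir():
                continue
            missing = [s for s in STEMS if not (track / s).exists()]
            if missing:
                print(f"{track}: missing {', '.join(missing)}")

check_dataset("/path/to/musdb")  # placeholder path
```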

0xBEEEF commented 4 years ago

The idea is interesting. Could the quality of the results be further improved with a large amount of extra data, or has the existing model already reached a point of saturation where additional training yields very little?

What I noticed recently, for example, is that saxophones, and wind instruments in general, are mostly interpreted as vocals by the model (no matter which model). Admittedly, the lower register of those instruments does show some similarity to a voice on an analyzer. I made another discovery as well: if a piece has a lot of brass instruments together with very distinct high frequencies from the drums, the model decides those highs belong to the drums, so the separation doesn't work perfectly in the treble range either. It seems to be a general problem, as I learned from other studies on this topic: the more narrow-band a signal is, the easier it is for a model to learn.

Another thing that stands out is modern EDM tracks. They contain only limited stereo information, so many of the drum or drum-machine samples are interpreted as vocals, which leads to bizarre results. Here the Tasnet model is clearly superior and its results are much cleaner. The same seems to hold for pure mono signals.
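
The spectral overlap mentioned above can be inspected directly. A minimal sketch, assuming torchaudio is available and using placeholder file names for two isolated recordings:

```python
import torch
import torchaudio

def mean_spectrum(path: str, n_fft: int = 2048) -> torch.Tensor:
    """Average magnitude per frequency bin over the whole recording."""
    wav, sr = torchaudio.load(path)  # (channels, time)
    mono = wav.mean(dim=0)           # mix down to mono
    spec = torch.stft(mono, n_fft=n_fft,
                      window=torch.hann_window(n_fft),
                      return_complex=True).abs()
    return spec.mean(dim=1)          # (freq,)

sax = mean_spectrum("sax_solo.wav")      # placeholder file names
voice = mean_spectrum("vocal_solo.wav")

# Cosine similarity near 1.0 means the two sources occupy very similar
# frequency bands, which is exactly what makes them hard to separate.
similarity = torch.nn.functional.cosine_similarity(sax, voice, dim=0)
print(f"spectral cosine similarity: {similarity.item():.3f}")
```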

MaxiReggi commented 3 years ago

@adefossez Hi! I read in another issue thread that you said it is not possible to add custom data (other stems and mixtures) to train the model, so I'm confused, because here and in another thread you say that training with extra data would improve the model's quality. My question is: is it possible or not? If someone else knows about this topic, could you answer me? Thank you very much for this incredible job!! Cheers, Maxi

adefossez commented 3 years ago

See my reply on the other issue: https://github.com/facebookresearch/demucs/issues/106
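
In short, extra training data only helps if you have the ground-truth stems for it. As a rough illustration only (not the repository's actual training pipeline; it assumes a recent demucs release where demucs.pretrained.get_model exists, and the model name and learning rate here are arbitrary choices), fine-tuning a pretrained model on (mix, stems) pairs could look like:

```python
import torch
from demucs.pretrained import get_model
from demucs.apply import BagOfModels

bag = get_model("htdemucs")  # pretrained model; name assumes a recent release
# get_model may return a bag of models; fine-tune a single one here.
model = bag.models[0] if isinstance(bag, BagOfModels) else bag
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

def training_step(mix: torch.Tensor, stems: torch.Tensor) -> float:
    """mix: (batch, channels, time); stems: (batch, sources, channels, time),
    with sources ordered as in model.sources."""
    estimates = model(mix)  # forward pass yields one estimate per source
    loss = torch.nn.functional.l1_loss(estimates, stems)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```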