deezer / spleeter

Deezer source separation library including pretrained models.
https://research.deezer.com/projects/spleeter.html
MIT License

[Discussion] Training model using one huge WAV file not 150 #556

Open rikishi0071 opened 3 years ago

rikishi0071 commented 3 years ago

Hi! I'd like to train a model in order to get better separation. I read that I need something like musdb for this, but the musdb format requires: 5 .wav files for each track (mixture, drums, bass, other, vocals), a .csv description (a table with one row per .wav giving its location and duration in seconds), and a .json metafile.

I couldn't find any API to generate my own track set and the corresponding .csv and .json files (for something other than the 150 musdb tracks) for training the model.

I don't want to manually write a .csv description for each track, so I have an idea: what if I merge 1,000-100,000 .wav tracks into ONE huge .wav file and just use that for training?

So, the question is: are 150x5 .wav files the same as ONE set of 5 .wav files? Will ONE huge .wav file with a duration of, for example, 24 hours work as an input for training the model?
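Writing the .csv index by hand isn't necessary; it can be generated with a short script. A minimal sketch, assuming each track lives in its own folder containing the five stem files named `mixture.wav`, `drums.wav`, `bass.wav`, `other.wav`, and `vocals.wav` (the exact column names here are an assumption and must be matched to the `*_path`/`duration` keys your spleeter training config expects):

```python
import csv
import wave
from pathlib import Path

# Assumed stem layout: <dataset_dir>/<track_name>/<stem>.wav
STEMS = ["mixture", "drums", "bass", "other", "vocals"]

def wav_duration(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

def build_index(dataset_dir, csv_path):
    """Write one CSV row per track folder that contains all five stems."""
    rows = []
    for track in sorted(Path(dataset_dir).iterdir()):
        if not track.is_dir():
            continue
        stems = {s: track / f"{s}.wav" for s in STEMS}
        if not all(p.exists() for p in stems.values()):
            continue  # skip incomplete tracks rather than corrupt the index
        row = {f"{s}_path": str(stems[s]) for s in STEMS}
        row["duration"] = round(wav_duration(stems["mixture"]), 2)
        rows.append(row)
    fieldnames = [f"{s}_path" for s in STEMS] + ["duration"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This sidesteps the single-huge-WAV idea entirely: each track keeps its own row, so a track with missing or extra stems only drops itself from the index instead of shifting every offset after it.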

MaxPayne86 commented 3 years ago

> Hi! I'd like to train a model in order to get better separation. I read that I need something like musdb for this, but the musdb format requires: 5 .wav files for each track (mixture, drums, bass, other, vocals), a .csv description (a table with one row per .wav giving its location and duration in seconds), and a .json metafile.
>
> I couldn't find any API to generate my own track set and the corresponding .csv and .json files (for something other than the 150 musdb tracks) for training the model.
>
> I don't want to manually write a .csv description for each track, so I have an idea: what if I merge 1,000-100,000 .wav tracks into ONE huge .wav file and just use that for training?
>
> So, the question is: are 150x5 .wav files the same as ONE set of 5 .wav files? Will ONE huge .wav file with a duration of, for example, 24 hours work as an input for training the model?

What's the point of doing that? You would then need to invent a way to identify, within the single WAV file, the vocals versus the arrangement. That means placing markers, since you don't want to count offsets by hand, right? Otherwise, if even one track has the wrong number of stems, you ruin the entire dataset...