deezer / spleeter

Deezer source separation library including pretrained models.
https://research.deezer.com/projects/spleeter.html
MIT License

[Discussion] training on 500GB anything I should be aware of? #712

Closed dustyny closed 2 years ago

dustyny commented 2 years ago

I've generated a data set of about 85,000 examples; each WAV file is 13 seconds long, stereo, 16-bit, 44.1 kHz, totalling close to 500 GB of uncompressed audio.

I'm planning on training a 5-stem model focused on splitting drums. I have the mixed drum loops and then 5 stems for each (kick, snare, hats, crash, toms).

I'll be running training on an NVIDIA A100 GPU on Google Cloud.

Is there anything that I should take into consideration before starting training? Any idea on how long I should plan for this to run? Should I be concerned about GPU memory?
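For context, Spleeter training runs are driven by a JSON config passed to `spleeter train -p <config> -d <dataset dir>`. Below is a rough sketch of what a 5-stem drum config might look like, written out from Python. The key names follow the `musdb_config.json` example shipped with Spleeter as I recall it, but the CSV paths, instrument names, and most values here are placeholders, so check them against the Spleeter version you actually run. `batch_size`, `T`, and `F` are the main knobs that drive GPU memory use.

```python
# Sketch of a 5-stem drum training config for Spleeter.
# Key names follow the musdb_config.json example from the Spleeter repo;
# paths, CSV files, and most values below are placeholders (assumptions).
import json

config = {
    "train_csv": "configs/drums_train.csv",            # hypothetical CSV listing mix/stem paths
    "validation_csv": "configs/drums_validation.csv",  # hypothetical
    "model_dir": "drums_model",
    "mix_name": "mix",
    "instrument_list": ["kick", "snare", "hats", "crash", "toms"],
    "sample_rate": 44100,
    "frame_length": 4096,
    "frame_step": 1024,
    "T": 512,          # time frames per training patch
    "F": 1024,         # frequency bins kept
    "n_channels": 2,
    "separation_exponent": 2,
    "mask_extension": "zeros",
    "learning_rate": 1e-4,
    "batch_size": 4,   # raise on an A100 if memory allows
    "training_cache": "training_cache",
    "validation_cache": "validation_cache",
    "train_max_steps": 1000000,
    "throttle_secs": 300,
    "random_seed": 0,
    "save_checkpoints_steps": 150,
    "save_summary_steps": 5,
    "model": {
        "type": "unet.unet",
        "params": {"conv_activation": "ELU", "deconv_activation": "ELU"},
    },
}

with open("drums_config.json", "w") as f:
    json.dump(config, f, indent=4)

# Training would then be launched with something like:
#   spleeter train -p drums_config.json -d /path/to/dataset
```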

It's taken me a while to get this dataset built, I'm excited to see how this test works out. 🤓

Thanks!

TonyVie commented 2 years ago

Hi dustyny,

Do you have any results from this training? Is Spleeter performing well at splitting drums?

dustyny commented 2 years ago

Not yet. I was hoping someone could give me an idea of how long training should take. I also ran into an issue with Torch & CUDA on an A100 that I'm trying to work out.

cewatkins commented 2 years ago

Uh, I dunno, I was looking at a 256 GB training attempt, but pulling that down can affect time, even from one cloud to another.

dustyny commented 2 years ago

Unfortunately Spleeter has some scaling issues. The preprocessing stage doesn't use the GPU at all; it does run on all CPU cores, but utilization is very low (15-30% per core). As far as I can tell you can't split the preprocessing from the training as it stands, so with a large data set you're going to waste a ton of time with your GPU idle and your CPUs underutilized, which is a very expensive thing to do when you're paying by the hour.
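As a quick way to see that pattern for yourself, something like the following minimal sketch (using `psutil`, which is not part of Spleeter) can log per-core CPU and RAM usage while the preprocessing/caching stage runs:

```python
# Minimal sketch: log per-core CPU utilization and RAM usage every few seconds
# while Spleeter's preprocessing/caching stage is running, to see how much of
# the machine it actually uses. Requires the psutil package (pip install psutil).
import psutil

INTERVAL_S = 5  # sampling interval, arbitrary choice

while True:
    per_core = psutil.cpu_percent(interval=INTERVAL_S, percpu=True)
    mem = psutil.virtual_memory()
    print(
        f"cpu per-core: {per_core} | "
        f"avg: {sum(per_core) / len(per_core):.1f}% | "
        f"ram used: {mem.used / 2**30:.1f} GiB / {mem.total / 2**30:.1f} GiB"
    )
```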

Some other things to be aware of:

Plan to have at least 2x the size of your data set in free disk for the training and validation caches (a rough sizing sketch follows after this list). I think the real number is closer to 1.5x, but I'd add a buffer just to be sure; running out of disk space 3 days into preprocessing and having to restart was not fun.

RAM utilization is much higher than I expected. I only had 12 GB available and ran out about 4 days into preprocessing. Normally I would have used 64 or 128 GB to be safe, but I can't use that much with the GCP VM type I'm running with a GPU. Since I don't have a good way around this limit, it's pretty much a brick wall for me.

Yes, I could reduce my training set to a third, but that defeats the purpose of what I set out to do. I'm not experienced enough to improve the model itself, but I can produce a better training and validation set.
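As a rough back-of-envelope for the disk point above, here is a small sketch; the file count is a placeholder and the 2x cache headroom is just the rule of thumb suggested in this thread, not anything Spleeter documents:

```python
# Back-of-envelope sizing for an uncompressed 16-bit stereo WAV dataset plus
# Spleeter's training/validation caches. All figures are placeholders; the
# 2x cache factor is the suggestion from this thread, not a documented value.
SAMPLE_RATE = 44_100        # Hz
CHANNELS = 2
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CLIP_SECONDS = 13
N_WAV_FILES = 100_000       # placeholder: total WAV files (mixes + stems)
CACHE_FACTOR = 2.0          # extra free disk relative to the raw dataset

bytes_per_file = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * CLIP_SECONDS
dataset_gb = N_WAV_FILES * bytes_per_file / 1e9
free_disk_gb = dataset_gb * CACHE_FACTOR

print(f"~{bytes_per_file / 1e6:.1f} MB per clip")
print(f"~{dataset_gb:.0f} GB raw dataset")
print(f"plan for ~{free_disk_gb:.0f} GB of additional free disk for caches")
```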

tombohub commented 2 years ago

@dustyny how much money did you spend on one training run?

dustyny commented 2 years ago

I haven't finished a training run yet; I ran out of RAM. I'm going to give it another try sometime this week.

DaveyUS commented 1 year ago

@dustyny This was a while ago, but do you have any update on whether you found success splitting drums?

dustyny commented 1 year ago

No. I tried a few different alternatives to Spleeter, and they all had the same problem: I couldn't get them to train on anything other than their test data. None of the authors were too interested in supporting their code. I'm waiting for this year's competition to finish to see if I can get the authors to properly document how to train. Hopefully we'll see some diffusion-based models; they're getting amazing results with AI art and sound.

DaveyUS commented 1 year ago

Ah, that's too bad. I'm guessing you've heard of it, but I found Demucs to be a better alternative to Spleeter. It seems to be much more heavily supported since it's Meta behind the wheel.

Also, I found this other project which already accomplishes what you're trying to train for: https://www.audiolabs-erlangen.de/content/resources/MIR/NMFtoolbox/demoDrumSoundSeparationNMF.html
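That demo is built on the AudioLabs Erlangen NMF toolbox. For a rough feel of the same idea in Python, here is a minimal NMF-based separation sketch using librosa rather than the linked toolbox; the input file name, component count, and component grouping are placeholders:

```python
# Rough sketch of NMF-based drum separation with librosa (not the NMF toolbox
# linked above). Factorise the magnitude spectrogram into spectral templates
# and activations, then resynthesise a subset of components with a soft mask.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("drum_loop.wav", sr=44100, mono=True)  # hypothetical input file
S = librosa.stft(y, n_fft=2048, hop_length=512)
mag = np.abs(S)

# NMF decomposition: comps holds spectral templates, acts their activations over time.
comps, acts = librosa.decompose.decompose(mag, n_components=8, sort=True)

# Rebuild one group of components (here, arbitrarily, the first three templates).
approx = comps[:, :3] @ acts[:3, :]
mask = approx / np.maximum(comps @ acts, 1e-8)
y_part = librosa.istft(mask * S, hop_length=512)
sf.write("components_0to2.wav", y_part, sr)
```

In practice you'd inspect or cluster the learned templates to decide which components correspond to kick, snare, hats, and so on, which is the part the linked demo handles for you.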