What is the best model?

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

MIT License


What is the best model? #360

Open MohammedMehdiTBER opened 2 years ago

MohammedMehdiTBER commented 2 years ago

❓ Questions

I want to know which one is the best: mdx_extra_q or mdx_extra? And what does the q mean?

zjttfs commented 2 years ago

mdx: trained only on MusDB HQ, winning model on track A at the MDX challenge.

mdx_extra: trained with extra training data (including the MusDB test set), ranked 2nd on track B of the MDX challenge.

mdx_q, mdx_extra_q: quantized versions of the previous models. Smaller download and storage, but quality can be slightly worse. mdx_extra_q is the default model used.

SIG: where SIG is a single model from the model zoo.

Source: https://github.com/facebookresearch/demucs/blob/main/README.md
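To compare these models on the same track, something like the following could work. It is only a sketch: it assumes the `demucs` command is installed and on the PATH, and uses the documented `-n` (model name) and `-o` (output folder) options. The helper name `demucs_command` and the file name `song.mp3` are made up for illustration.

```python
import shlex

# Model names taken from the README excerpt quoted above.
MODELS = ["mdx", "mdx_extra", "mdx_q", "mdx_extra_q"]


def demucs_command(track, model, out_dir="separated"):
    """Build the argument list for one Demucs run with the given pretrained model.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    # -n selects the pretrained model, -o the folder that receives the stems.
    return ["demucs", "-n", model, "-o", out_dir, track]


if __name__ == "__main__":
    for name in MODELS:
        # Print each command; to execute it, pass the list to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(demucs_command("song.mp3", name)))
```

Running each model on the same file and listening to the results side by side is probably the most reliable way to decide, since the quality differences are small.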

Dyslexicon commented 1 year ago

Forgive me for asking in here, but I find nothing but dead ends elsewhere: is there any way to actually install and use the competing Band-split RNN models, which also achieve a 9.0 overall SDR?

I just want to try it and compare it to Demucs, as it appears to be the only worthy competitor. I wonder what would happen if the spectral-only training dataset used for Band-split RNN were added to the training set of HT Demucs?

Looking forward to any possible user-accessible implementation of the Sparse Hybrid Transformer model. Also keenly interested in any new and improved models resulting from the 2023 SDX Challenge!

ClaireCJS commented 1 year ago

I came here to prevent myself from asking a duplicate question, and yet...

...the answer here isn't clear.

I still don't know which model is best for me to use.

I can certainly understand that one of the models says it takes 4X longer with better results. But I still don't know if that's the one with the best results.

I want the best results because I'm generating karaoke files for songs I may listen to 1000s of times, and I want the vocals to be as clear as possible when separated so they can be properly transcribed.

In theory speed matters, but the job is already going to take over a month to run, so maybe speed doesn't matter that much after all if I'm willing to wait that long.

Does my use case influence which is the best model? Is there a best model at least **FOR ME**?

CarlGao4 commented 1 year ago

I'd recommend using htdemucs_ft with shifts set to at least 4 if you have enough time. If you don't, you can switch shifts off.
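A minimal sketch of how this recommendation could be passed to Demucs from Python. It assumes Demucs is installed and relies on `demucs.separate.main`, which the README shows being called with a list of command-line arguments. The helper names, the file name, and the `--two-stems vocals` choice (a vocals/accompaniment split, which seems to fit the karaoke use case) are illustrative assumptions, not part of the recommendation itself.

```python
def recommended_args(track, shifts=4):
    """Argument list for htdemucs_ft (hypothetical helper).

    Pass shifts=None to leave --shifts out, so Demucs falls back to its
    default and runs faster.
    """
    args = ["-n", "htdemucs_ft"]
    if shifts is not None:
        args += ["--shifts", str(shifts)]
    # Assumption: only vocals and accompaniment are needed for karaoke.
    args += ["--two-stems", "vocals", track]
    return args


def separate(track, shifts=4):
    """Run the separation in-process; requires Demucs to be installed."""
    # Imported lazily so recommended_args() works without Demucs present.
    import demucs.separate

    demucs.separate.main(recommended_args(track, shifts))
```

For example, `separate("song.mp3")` would run the fine-tuned model with four shifts, and `separate("song.mp3", shifts=None)` would skip the extra passes.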

ClaireCJS commented 1 year ago

What do you mean by "shifts"?

I ended up trying both mdx_extra and htdemucs. mdx_extra maybe sounded better, but it was hard to tell.
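For reference, the Demucs README describes `--shifts` as performing multiple predictions with random shifts of the input (the "shift trick") and averaging them, which makes prediction that many times slower. The idea can be sketched in plain Python as below. This is not Demucs's actual code: `apply_with_shifts`, `separate`, `max_shift` and `seed` are hypothetical names, and the "model" is any function that maps a list of samples to an equally long list for a single stem.

```python
import random


def apply_with_shifts(separate, mix, shifts=4, max_shift=8, seed=0):
    """Average the output of `separate` over randomly shifted copies of `mix`."""
    rng = random.Random(seed)
    length = len(mix)
    # Zero-pad both sides so every shifted window stays inside the buffer.
    padded = [0.0] * max_shift + list(mix) + [0.0] * max_shift
    total = [0.0] * length
    for _ in range(shifts):
        offset = rng.randint(0, max_shift)
        # The window starts (max_shift - offset) samples before the real
        # signal and ends exactly where the real signal ends.
        window = padded[offset:max_shift + length]
        estimate = separate(window)
        # Drop the leading padding so the estimate lines up with `mix` again.
        aligned = estimate[max_shift - offset:]
        for i in range(length):
            total[i] += aligned[i]
    # Averaging smooths out errors that depend on where the signal starts.
    return [value / shifts for value in total]
```

A model whose output does not depend on the start position gives the same result for any number of shifts. The averaging only helps because real models are not perfectly shift-invariant.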

liuzhao1225 commented 12 months ago

> I'd recommend using htdemucs_ft with shifts set to at least 4 if you have enough time. If you don't, you can switch shifts off.

What is the difference between 'htdemucs_ft' and 'htdemucs'?

CarlGao4 commented 12 months ago

htdemucs_ft is fine-tuned from htdemucs