facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License
8.16k stars 1.03k forks

Correct usage of htdemucs #414

Open ProgressiveProgression opened 1 year ago

ProgressiveProgression commented 1 year ago

❓ Questions

Hi. When I run the command

demucs -n htdemucs "Path"

I get the message "FATAL: htdemucsft is neither a single pre-trained model or a bag of models." What am I doing wrong? Thank you in advance.

CarlGao4 commented 1 year ago

Please run python -c "import demucs;print(demucs.__version__)". htdemucs is currently only available in 4.0.0.
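
The same check can be scripted. The following is a minimal sketch, not part of Demucs: `parse_version` and `supports_htdemucs` are hypothetical helper names, and the 4.0.0 threshold is taken from the comment above. Pre-release suffixes such as `a1` are simply ignored.

```python
def parse_version(version: str) -> tuple:
    """Turn a string such as '3.0.6' or '4.1.0a1' into a 3-tuple of ints."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at a pre-release suffix such as 'a1'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    # Pad short versions such as '4.0' so tuple comparison behaves as expected.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def supports_htdemucs(version: str, minimum=(4, 0, 0)) -> bool:
    """Assumption from this thread: htdemucs needs at least version 4.0.0."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import demucs
    except ImportError:
        print("demucs is not installed")
    else:
        installed = getattr(demucs, "__version__", "0")
        print(installed, "supports htdemucs:", supports_htdemucs(installed))
```

With the 3.0.6 release mentioned later in this thread, `supports_htdemucs("3.0.6")` returns `False`.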

ProgressiveProgression commented 1 year ago

Oh, that was my fault. I did not upgrade demucs correctly and did not even notice. Thank you. The ticket can be closed.

dtramm1 commented 1 year ago

I've installed demucs quickly and easily via pip, but 3.0.6 is the highest version available. Can someone advise on how to update to 4.0.0 so I can try the ht models?

Also, my GPU only has 6 GiB of memory, in case that's not enough for the ht models.

usdivad commented 1 year ago

From the README:

For bleeding edge versions, you can install directly from this repo using

python3 -m pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs

For Hybrid Transformer Demucs, you must install the bleeding edge version and use either -n htdemucs or -n htdemucs_ft.

Hope that helps!

dtramm1 commented 1 year ago

Thank you, I got it working. It actually performed worse at the specific task I was testing! I wanted to see which models did best at separating the intro bass line of Guns N' Roses' "Sweet Child O' Mine". The bass line is buried within Slash's lead guitar part (measures 9 to 16). The MDX and MDX_extra models both pull it out reasonably well (slightly differently), while the bleeding edge models fail to separate it. Spleeter fails at this as well.

adefossez commented 1 year ago

Interesting, I had some feedback from @CarlGao4 stating that for some tracks it was performing worse than the existing model. We did use a number of extra tracks to train the new models, and we couldn't verify for all of them that there were no mislabeled stems.

If you have time, could you check with -n hdemucs_mmi and see how it compares? It uses the same architecture as mdx_extra, but with the training data of htdemucs.
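
A comparison like this is easier to repeat if the same track is run through each model in turn. Below is a minimal sketch, assuming the demucs command is on the PATH and using only the -n option shown earlier in this thread. `MODELS`, `build_commands`, `run_all`, and the file name `song.mp3` are illustrative names, not part of Demucs.

```python
import shutil
import subprocess

# Model names mentioned in this thread.
MODELS = ["mdx", "mdx_extra", "hdemucs_mmi", "htdemucs", "htdemucs_ft"]


def build_commands(track, models=MODELS):
    """Return one 'demucs -n MODEL TRACK' argument list per model."""
    return [["demucs", "-n", model, track] for model in models]


def run_all(track, models=MODELS):
    """Run every model on the same track, stopping at the first failure."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs executable not found on PATH")
    for command in build_commands(track, models):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands. Call run_all("song.mp3") to execute them.
    for command in build_commands("song.mp3"):
        print(" ".join(command))
```

Keeping the command construction separate from the execution makes it possible to inspect the exact invocations before spending GPU time on them.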

dtramm1 commented 1 year ago

For this track (Sweet Child O' Mine), the mmi model works a little better than the other two at picking up the intro bass, but not by much. It picks up the faint ghost of a bass note at the start of measure 9, then no more notes until measure 11 (whereas the bass is playing steadily from measure 9). It does pick up notes from measure 11 onward, but they are not clear until measure 17, when the bass starts playing lower notes more typical of a bass part. The htdemucs and htdemucs_ft models don't pick up any notes until measure 13.

On closer analysis, the original mdx model works best, and substantially better than mdx_extra: it picks up every note clearly from the start of measure 9. The mdx_extra model starts picking up notes at measure 9, but it randomly drops to silence for a moment or two. There's actually one note at the end of measure 8 leading into the part, and all of the models miss it.

By the way, I tried lalal.ai as well, and its performance is between the h models and mdx_extra. It does pick up most notes from measure 9 onward, but it drops more notes than the mdx_extra model does.

I also tried all 5 models on a very different track: "Isolation" by Alter Bridge. This is a very heavy song with a low, partly distorted bass line that is often hard to hear (yet relatively clear for the genre). I'm not sure what the original isolated bass should sound like. Every model seems to pick up all notes rather clearly, but the mdx model picks up more distortion and noise, which is probably an accurate part of the bass playing. It definitely leaves the guitar part in the "other" channel sounding a bit clearer.

Dyslexicon commented 1 year ago

I've found LALAL.ai to be about half as good as Demucs. Commercial programs like Izotope and Acoustica are roughly half as good as LALAL. Spleeter and its clones... yikes, the bottom of the barrel.

I too noticed issues with htdemucs and htdemucs_mmi not being as clean as the mdx_extra model in some cases. The vocal stem is quite an improvement over mdx_extra, but the drum/bass/other stems from the new models tend to show some mis-assigned frequencies and truncated regions in the spectrum, which can be seen in the spectral view of a utility such as Spek http://spek.cc

Thank you to the Demucs team for the ongoing research and development.

CarlGao4 commented 1 year ago

I've found LALAL.ai to be about half as good as Demucs. Commercial programs like Izotope and Acoustica are roughly half as good as LALAL. Spleeter and its clones... yikes, the bottom of the barrel.

First of all, the model is called either hdemucs_mmi or htdemucs_ft. Contrary to what you found, I think htdemucs_ft gives worse results on vocals when separating choruses. I can still hear ghosts of the chorus in the other stem with all of the models, but the "ghost" in the htdemucs_ft results (I didn't try htdemucs) is louder.

Dyslexicon commented 1 year ago

It's worth noting that I am judging not by perception alone but also by the spectral view of the resulting stems. I heard some bleed artifacts in the drum stems from the new models that did not occur with the same input file using mdx_extra. The new models also appear prone to creating horizontal bands of mis-assigned frequencies in the guitar and bass stems. Drum stems also tend to show a shelving drop-off in response at the uppermost frequencies, similar to what happens with lossy codecs. So for now I will be sticking with mdx_extra.

Overall the new models seem to be better trained, but they have some issues with bleed and mis-assigned frequencies.
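
The high-frequency drop-off described above can be put into a rough number by measuring how much of a stem's spectral energy lies at or above some cutoff. The sketch below is an illustration only and was not used by anyone in this thread: it relies on a naive DFT from the Python standard library (slow, so only suitable for short frames), and `high_band_energy_ratio`, the cutoff, and the synthetic test tones are all assumptions.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of one frame's spectral energy at or above cutoff_hz."""
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k (O(n) per bin, O(n^2) overall).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        # Interior bins stand for both the positive and negative frequency.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        power = abs(coeff) ** 2 * (1 if is_edge else 2)
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


if __name__ == "__main__":
    rate, size = 8000, 256
    low = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(size)]
    both = [s + math.sin(2 * math.pi * 3000 * i / rate) for i, s in enumerate(low)]
    # The 3 kHz tone lies above the 2 kHz cutoff; the 1 kHz tone does not.
    print(round(high_band_energy_ratio(low, rate, 2000), 3))
    print(round(high_band_energy_ratio(both, rate, 2000), 3))
```

Comparing this ratio for the same frame of a stem produced by two different models gives a crude indication of which one loses more of the top of the spectrum. It does not replace looking at the full spectrogram.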

dtramm1 commented 1 year ago

I'm wondering to what extent processing an MP3 vs. a lossless file has an effect on the output. I could do some testing, but I'm curious whether anyone has already done this. Does separating an MP3 result in obvious artifacts, or just a loss of quality similar to that already present in the MP3?

Dyslexicon commented 1 year ago

I have re-evaluated my stance after listening to more stem examples and examining the spectral views of the output files in Spek more closely, and it's now clear that htdemucs_ft is significantly better than mdx_extra. So I retract my earlier statements about the new flagship model.