axeldelafosse / stemgen

🎛 Stemgen is a Stem file generator. Convert any track into a Stem and have fun with Traktor.
https://stemgen.dev
MIT License
212 stars, 39 forks

upgrade demucs version to v4 and htdemucs/htdemucs_ft model #7

Closed: awesomer closed this issue 1 year ago

awesomer commented 1 year ago

From the Demucs README: The v4 version features Hybrid Transformer Demucs, a hybrid spectrogram/waveform separation model using Transformers. It is based on Hybrid Demucs (also provided in the Demucs repo), with the innermost layers replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain, and cross-attention across domains. The model achieves an SDR of 9.00 dB on the MUSDB HQ test set. Moreover, when using sparse attention kernels to extend its receptive field and per-source fine-tuning, we achieve a state-of-the-art 9.20 dB of SDR.

Since Demucs v3 reached 7.7 dB of SDR, this upgrade (+1.3 to +1.5 dB) seems about as significant as the jump from Spleeter (5.9 dB) to Demucs v3 (+1.8 dB). Anecdotally, it sounds awesome. The patch should be very easy, but I'm not sure how to enforce v4 of the demucs library, or whether you prefer htdemucs or the much-slower-but-better htdemucs_ft.
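One way the version requirement could be enforced is a small runtime check before separation starts. This is only a sketch, not how stemgen does it: the helper names (`major_version`, `demucs_is_recent_enough`, `check_demucs`) are hypothetical, and it assumes the package is installed under the distribution name `demucs`. A simpler alternative is to pin the dependency with a line such as `demucs>=4.0.0` in a requirements file.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption for this sketch: htdemucs / htdemucs_ft need Demucs v4 or newer.
MIN_DEMUCS_MAJOR = 4


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '4.0.1'."""
    return int(version_string.split(".")[0])


def demucs_is_recent_enough(installed: str, minimum_major: int = MIN_DEMUCS_MAJOR) -> bool:
    """True when the installed major version is at least the minimum."""
    return major_version(installed) >= minimum_major


def check_demucs() -> str:
    """Describe whether the installed demucs package satisfies the minimum."""
    try:
        installed = version("demucs")
    except PackageNotFoundError:
        return "demucs is not installed"
    if demucs_is_recent_enough(installed):
        return f"demucs {installed} is OK"
    return f"demucs {installed} is too old; need >= {MIN_DEMUCS_MAJOR}"


# Prints a status line; the exact text depends on the local environment.
print(check_demucs())
```

A script could call `check_demucs()` at startup and exit early with the message when the version is too old, rather than failing later with an unknown-model error.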

axeldelafosse commented 1 year ago

Hey @awesomer! Yeah, I upgraded last week in https://github.com/axeldelafosse/stemgen/commit/029376ee6300d8b2073b54d2e9650b7e1ba87b71 :) I chose htdemucs_ft because I prefer to default to the best model possible, even if it's slower. The goal of this script is to output the highest quality stems. mdx_extra was already super slow anyway, so the impact should be small.
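For readers who want to try either model, the choice comes down to the name passed to Demucs with its `-n` option. The sketch below only builds the command line; it is not taken from the linked commit, and the helper name `build_demucs_command`, the default output folder, and the `track.wav` path are assumptions made for illustration.

```python
import sys
from pathlib import Path


def build_demucs_command(track, model="htdemucs_ft", out_dir="output"):
    """Build a `python -m demucs` invocation for one track.

    -n selects the pretrained model (e.g. htdemucs or htdemucs_ft),
    -o sets the folder where the separated stems are written.
    """
    return [
        sys.executable, "-m", "demucs",
        "-n", model,
        "-o", str(out_dir),
        str(track),
    ]


# Hypothetical input file, used only to show the resulting command.
cmd = build_demucs_command(Path("track.wav"))
print(" ".join(cmd))

# To actually run the separation (requires demucs to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing `model="htdemucs"` instead trades some separation quality for speed, which is the trade-off discussed above.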

Let me know if it's working for you!