facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

Timbre based separation #439

Open LmYjQ opened 1 year ago

LmYjQ commented 1 year ago

❓ Questions

Hi. I tried to separate a violin+piano duet piece with demucs, but only the other.wav file is non-empty, and both the violin and the piano are in that file. Can I separate the violin and piano through some configuration?

CarlGao4 commented 1 year ago

What can be separated is defined by the model. You can use htdemucs_6s to separate the piano, though the quality is very poor. Add -n htdemucs_6s to your command.
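For anyone who prefers calling Demucs from Python, here is a minimal sketch of the same option. It only assembles the argument list; `build_demucs_args`, `run_demucs`, and the file name `duet.wav` are illustrative names, not part of Demucs. The actual call goes through `demucs.separate.main`, which the Demucs README describes as accepting a list of CLI arguments.

```python
def build_demucs_args(track, model="htdemucs_6s", two_stems=None):
    """Return the argument list for the demucs command line.

    `track` is the path of the audio file to separate.
    `two_stems`, if given, keeps one stem (e.g. "piano") plus its complement.
    """
    args = ["-n", model]
    if two_stems is not None:
        args += ["--two-stems", two_stems]
    args.append(track)
    return args


def run_demucs(track, **kwargs):
    """Run the separation (needs the demucs package to be installed)."""
    import demucs.separate  # imported lazily so the helper above works without demucs

    demucs.separate.main(build_demucs_args(track, **kwargs))


# Example (placeholder file name):
# run_demucs("duet.wav")                      # all six stems of htdemucs_6s
# run_demucs("duet.wav", two_stems="piano")   # piano and everything else
```

The equivalent shell command for the first example would be `demucs -n htdemucs_6s duet.wav`.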

npbool commented 11 months ago

htdemucs_6s has a piano extractor, but the result is not good. I collected some piano and violin solo music, synthesized training data from it, and fine-tuned htdemucs_6s on that data with an L2 loss (much better than L1). The fine-tuned model produces quite acceptable results.
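To make the idea concrete, here is a rough sketch of how a synthetic training pair could be built from two solo recordings, and how the L1 and L2 losses differ. This is not the pipeline used above, whose details were not shared. The sine tones stand in for real recordings, and all function names and the gain range are assumptions made for illustration.

```python
import math
import random


def sine_tone(freq_hz, num_samples, sample_rate=8000):
    """Stand-in for a solo recording: a pure tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


def make_mixture(piano, violin, rng, gain_range=(0.5, 1.0)):
    """Sum two solo signals with random gains.

    Returns (mixture, piano_target, violin_target), all trimmed to the
    length of the shorter input. The mixture is the model input and the
    scaled solos are the separation targets.
    """
    n = min(len(piano), len(violin))
    piano_gain = rng.uniform(*gain_range)
    violin_gain = rng.uniform(*gain_range)
    piano_target = [piano_gain * x for x in piano[:n]]
    violin_target = [violin_gain * x for x in violin[:n]]
    mixture = [p + v for p, v in zip(piano_target, violin_target)]
    return mixture, piano_target, violin_target


def l1_loss(estimate, target):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def l2_loss(estimate, target):
    """Mean squared error between two equal-length signals."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)
```

The L2 loss squares each error, so it penalizes large deviations more heavily than L1 does. Whether that explains the improvement reported above is not something this sketch can show.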

CarlGao4 commented 11 months ago

@npbool is it possible to share your model?

npbool commented 11 months ago

@CarlGao4 I trained it for several hours on a T4 GPU with a small subset of my music collection, but the result is OK for my personal use. Maybe I'll train on the full dataset later. https://drive.google.com/file/d/1-AgDR0wtdu-gP9CQsf7LdMi11I9WaX3B/view?usp=sharing

CarlGao4 commented 11 months ago

Thanks!!! I'll try it out myself without posting it to other websites

npbool commented 11 months ago

A quick example: the track from my music collection on which my model improves the most.

input:

https://github.com/facebookresearch/demucs/assets/1175364/e0c3777e-7834-4891-a504-2c2bd7bd1bd1

result from htdemucs_6s

https://github.com/facebookresearch/demucs/assets/1175364/486f043f-ae62-4f00-82b9-9714baf39074

result from finetuned model

https://github.com/facebookresearch/demucs/assets/1175364/6447074e-5d22-4429-8fb1-ef36b0bbc12f

CarlGao4 commented 11 months ago

I feel that maybe I should only pass the other stem of htdemucs_ft to this model to get better results?

npbool commented 11 months ago

> I feel that maybe I should only pass the other stem of htdemucs_ft to this model to get better results?

Sorry, I'm not familiar with the code and didn't understand what you mean by "pass the other stem of htdemucs_ft to this model".

The model is trained to extract the piano accompaniment from a violin/piano duet (which is what this issue is about), so only the piano output is needed (--two-stems piano). I didn't evaluate the model on other types of music or on the quality of the drums, vocals, or other sources.

npbool commented 11 months ago

Did you mean the pre-trained htdemucs_ft model? I didn't use it. I started from htdemucs_6s and trained only the piano stem on my custom dataset.

CarlGao4 commented 11 months ago

> the model is trained to extract piano accompaniment from a violin/piano duet

I see. What I did was separate an audio file using the htdemucs_ft model and then separate its other output using your model to get better results.
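The two-pass approach described here could be sketched roughly as follows. The sketch assumes the default Demucs output layout (`<out>/<model>/<track name>/<stem>.wav`). `two_pass_plan` and `run_two_pass` are hypothetical helper names, and `htdemucs_6s` is only a stand-in for the second model, since how the fine-tuned checkpoint gets loaded is a separate question.

```python
from pathlib import Path


def two_pass_plan(track, out_dir="separated", first_model="htdemucs_ft",
                  second_model="htdemucs_6s", second_extra_args=()):
    """Plan a two-pass separation.

    Returns (first_args, other_stem, second_args):
      first_args  - demucs arguments for the first pass on the full track
      other_stem  - where the first pass is expected to write other.wav
      second_args - demucs arguments for the piano pass on that file
    """
    track = Path(track)
    out_dir = Path(out_dir)
    first_args = ["-n", first_model, "-o", str(out_dir), str(track)]
    # Assumes the default layout: <out>/<model>/<track name>/<stem>.wav
    other_stem = out_dir / first_model / track.stem / "other.wav"
    second_args = ["-n", second_model, "--two-stems", "piano",
                   *second_extra_args, "-o", str(out_dir), str(other_stem)]
    return first_args, other_stem, second_args


def run_two_pass(track, **kwargs):
    """Run both passes (needs the demucs package to be installed)."""
    import demucs.separate  # lazy import, so planning works without demucs

    first_args, other_stem, second_args = two_pass_plan(track, **kwargs)
    demucs.separate.main(first_args)
    if not other_stem.exists():
        raise FileNotFoundError(f"expected first-pass output at {other_stem}")
    demucs.separate.main(second_args)
```

Running the first pass removes vocals, drums, and bass before the piano model sees the audio, which is only useful when the input is more than a plain duet.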

bitsrfr commented 10 months ago

@npbool or others, could you point me in the right direction on how to use the fine-tuned model above with Demucs?

npbool commented 7 months ago

@bitsrfr I modified the get_model() code in demucs/pretrained.py to load a local checkpoint.
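The exact modification was not posted, so the following is only a sketch of one possible alternative that avoids editing the Demucs source. As far as I know, `demucs.pretrained.get_model` accepts a `repo` argument pointing at a local folder of `.th` files. The sketch assumes the downloaded file is in the exported model format Demucs expects (a raw training checkpoint may not be) and that it is saved as `<name>.th`. The file name `piano_ft.th` and both helper names are hypothetical.

```python
from pathlib import Path


def repo_and_name(checkpoint_path):
    """Split a local checkpoint path into (folder, model name).

    Assumes the file is stored as <folder>/<name>.th.
    """
    path = Path(checkpoint_path)
    return path.parent, path.stem


def load_local_model(checkpoint_path):
    """Load a model from a local folder (needs the demucs package)."""
    from demucs.pretrained import get_model  # lazy import

    repo, name = repo_and_name(checkpoint_path)
    # Assumption: the file is an exported Demucs model, not a raw
    # training checkpoint; otherwise loading is expected to fail.
    return get_model(name, repo=repo)


# Example (hypothetical file name):
# model = load_local_model("models/piano_ft.th")
```

On the command line, the corresponding options would be `--repo models -n piano_ft`, under the same assumptions.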