There's a lot of similarity between speech enhancement / dereverberation models and source separation models: they can share the same backbones, training loops, etc.
Really the only difference is that the model outputs a single channel, and the loss is computed only against the clean target signal — the noise or reverberation being removed isn't something you minimize loss over.
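To make the single-target idea concrete, here's a minimal sketch (not from this repo — the function name and toy data are purely illustrative) of a loss that compares the estimate only to the clean reference and simply discards the noise residual:

```python
import numpy as np

def single_target_l1_loss(estimate: np.ndarray, clean: np.ndarray) -> float:
    """L1 loss against the clean reference only.

    Unlike multi-source separation, there is no loss term for the
    noise/reverberation residual -- it is discarded entirely.
    (Hypothetical helper for illustration.)
    """
    return float(np.mean(np.abs(estimate - clean)))

# Toy example: clean target, additive noise, and a stand-in model output.
clean = np.array([0.5, -0.2, 0.1, 0.4])
noise = np.array([0.05, 0.02, -0.03, 0.01])
mixture = clean + noise
estimate = mixture * 0.9  # pretend this is what the model produced

loss = single_target_l1_loss(estimate, clean)
```

A perfect estimate (`estimate == clean`) drives this to zero regardless of what the noise looked like, which is exactly the single-channel-output behavior described above.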
It would be great if this repo supported single-channel-output datasets!
(And you could consider adding speech enhancement backbones like SepFormer from SpeechBrain. The new diffusion-based model SGMSE has good pretrained checkpoints that would probably work well for separating vocals from non-vocals after fine-tuning, assuming the fine-tuning data included reverberated vocals.)