Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.
MIT License
17.26k stars 1.29k forks

Are there any models that separate overlapping voices? #451

Open morgancod opened 1 year ago

morgancod commented 1 year ago

Are there any models that separate overlapping voices? By that I mean separating the different voices in a single audio file.

Derpiesaurus commented 1 year ago

Not yet AFAIK. I believe that would be pretty hard as well. You'd either need an insane amount of training data for each possible type of voice, or you'd have to train the model on specific voices. There could be other ways of doing it, but it's challenging either way.

(I don't train models though, so I might be wrong, but I think I know enough about it to confidently give you that answer)

jarredou commented 1 year ago

There are some algos for multi-speaker separation, but they are made for speech and trained on speech, not singing. Maybe if they were trained on multiple singers, these algos could do the trick.

dts350z commented 1 year ago

You can do some separation based on panning, assuming the backing/overlapping vocals are panned differently from the lead vocals in the stereo mix. I have a free tool for that here: http://www.surroundbyus.com/sbu/viewtopic.php?f=8&t=994 (its main purpose is upmixing stereo to 5.1). If you drop a vocal stem in there, it will separate "centered" vocals from vocals panned left and right, putting them in different channels of the 5.1 output.
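
To make the panning idea concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm used by the tool linked above (that is not described in this thread); it only illustrates one common approach: compare the left and right magnitudes in each frequency bin and treat bins with nearly equal magnitude as "centered". The function name `split_center_and_sides`, the `frame` and `threshold` parameters, and the similarity measure are all assumptions made for illustration. It uses non-overlapping rectangular frames to keep the code short, which would likely cause audible artifacts on real music; a practical version would use an overlapping, windowed STFT.

```python
import numpy as np


def split_center_and_sides(stereo, frame=1024, threshold=0.9):
    """Split a stereo signal into 'center' and 'sides' parts by panning.

    stereo: float array of shape (n_samples, 2), columns are left and right.
    Returns (center, sides), each of shape (n_samples, 2).
    By construction, center + sides equals the input.

    Hypothetical helper for illustration only.
    """
    n = stereo.shape[0]

    # Zero-pad so the signal splits into whole frames.
    pad = (-n) % frame
    padded = np.pad(stereo, ((0, pad), (0, 0)))

    # Shape (n_frames, frame, 2), then FFT along the time axis of each frame.
    frames = padded.reshape(-1, frame, 2)
    spec = np.fft.rfft(frames, axis=1)
    left, right = spec[..., 0], spec[..., 1]

    # Magnitude similarity per bin: 1.0 when |L| == |R| (center-panned),
    # 0.0 when one channel is silent (hard-panned). Phase is ignored,
    # which is a simplification.
    eps = 1e-12
    similarity = 2.0 * np.abs(left * np.conj(right)) / (
        np.abs(left) ** 2 + np.abs(right) ** 2 + eps
    )

    # Keep only the bins judged to be centered, in both channels.
    mask = (similarity >= threshold)[..., None]
    center_spec = np.where(mask, spec, 0)

    center = np.fft.irfft(center_spec, n=frame, axis=1).reshape(-1, 2)[:n]
    sides = stereo - center
    return center, sides
```

With a lead vocal mixed equally into both channels and a backing vocal panned hard left, `center` would mostly contain the lead and `sides` the backing part. Voices that share the same pan position, or a mono source, cannot be separated this way.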

Having said that, Demix Pro seems to be able to separate overlapping vocals, even from mono sources, so it is not based (just) on panning. Demix Pro seems to have Demucs under the hood, so I'm wondering whether the backing vocal separation is a Demucs model and, if so, where it came from.