Is there a good Multi Speaker Separation model ?

Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

MIT License

17.73k stars 1.32k forks

Is there a good Multi Speaker Separation model ? #636

Open AlonDan opened 1 year ago

AlonDan commented 1 year ago

We already have great separation models: Audio and Music / De-Noise / De-Echo / De-Reverb, etc.

Is there a multi-speaker separation model for when 2 (or more) people talk at the same time? For example, this one from Meta (Facebook).

I'm wondering if someone has already managed to train such a model for UVR and would like to share it with the community.

Thanks in advance for sharing where to download such models 🙏

VamuveTV commented 1 year ago

Great question. Also, Google seems to have done something like that some time ago. It would be excellent to implement it in UVR, so we could separate different voices in the same audio file. Btw, it would be good to make it independent of the spoken language, I mean, it could be used to separate voices in Portuguese, English, Italian and so on.

VamuveTV commented 1 year ago

Btw, there is an implementation on GitHub here: https://github.com/facebookresearch/svoice So, it should be possible to use it in UVR as well?

AlonDan commented 1 year ago

I saw svoice and other related projects that do the same thing very nicely. I don't know if it's possible to train such a model to work within UVR since I'm not a programmer, but when I once mentioned De-Echo / De-Reverb, people didn't realize that was possible either... so maybe?