Closed jeromew closed 4 months ago
Hello,
Thank you for the issue. Would you like to contribute these models? We would certainly welcome them!
Hello, thanks for your response.
I am afraid I am too far from this field at the moment to be able to contribute models. I was just playing around with source separation models to try to solve a CTF puzzle involving a difficult-to-parse audio mix. I will join the Slack channel if things change.
I am closing this issue, as I am sure you are not short of models to integrate into Asteroid and these two will reappear if they are key to the field. In the meantime, you will have one less issue on GitHub!
🚀 Feature
I suggest adding the MossFormer2 and SepTDA models.
Motivation
Both models appear to improve on the state of the art for speech separation on WSJ0-2mix; see https://paperswithcode.com/sota/speech-separation-on-wsj0-2mix
SepTDA:
MossFormer2:
What you'd like
An implementation of the models in Asteroid, with a working pretrained model for inference.
Alternatives
I managed to get MossFormer2 inference working via https://modelscope.cn/models/iic/speech_mossformer2_separation_temporal_8k/summary
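For reference, here is a minimal sketch of that ModelScope route. It assumes the `pipeline` / `Tasks.speech_separation` usage described on the model card, and that the result exposes an `output_pcm_list` of raw 16-bit mono PCM buffers at 8 kHz; both points should be checked against the model card. `save_pcm_as_wav` and `separate` are hypothetical helper names, not part of ModelScope or Asteroid.

```python
import wave

# Assumption: the "8k" in the model name refers to 8 kHz audio.
SAMPLE_RATE = 8000
# Model ID taken from the ModelScope URL above.
MODEL_ID = "iic/speech_mossformer2_separation_temporal_8k"


def save_pcm_as_wav(pcm_bytes, path, sample_rate=SAMPLE_RATE):
    """Write raw 16-bit mono PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


def separate(mix_path, out_prefix="output_spk"):
    """Run the pretrained model on one mixture and save each estimated source."""
    # Imported lazily so the WAV helper works without modelscope installed.
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    separation = pipeline(Tasks.speech_separation, model=MODEL_ID)
    result = separation(mix_path)

    paths = []
    # Assumption: each entry is a bytes buffer of int16 samples.
    for i, pcm in enumerate(result["output_pcm_list"]):
        path = f"{out_prefix}{i}.wav"
        save_pcm_as_wav(pcm, path)
        paths.append(path)
    return paths
```

Calling `separate("mix.wav")` (with a placeholder file name) would then write one WAV file per estimated source and return their paths.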
Additional context
I am trying to separate sources with an unknown number of speakers in a difficult audio track (opera music plus many speakers with heavy overlap).