mossFormer2 & sepTDA models

jeromew commented 5 months ago

🚀 Feature

I suggest adding the mossFormer2 and sepTDA models.

Motivation

The two models seem to improve on the SOTA for the speaker separation task, cf. https://paperswithcode.com/sota/speech-separation-on-wsj0-2mix

sepTDA:

mossformer2:

What you'd like

An implementation of the models in asteroid, with a working pretrained model for inference.

Alternatives

I managed to get mossformer2 inference working via https://modelscope.cn/models/iic/speech_mossformer2_separation_temporal_8k/summary

Additional context

I am trying to separate sources with an unknown number of speakers on a difficult audio track (opera music plus many speakers with a lot of overlap).

mpariente commented 5 months ago

Hello,

Thank you for the issue. Do you want to contribute these models? We'd welcome them for sure!

jeromew commented 4 months ago

Hello, thanks for your response.

I am afraid I am too far from this field at the moment to be able to contribute models. I was just playing around with source separation models to try to solve a CTF puzzle involving a difficult-to-parse audio mix. I will join the Slack channel if things change.

I am closing this issue, as I am sure you have no shortage of models to integrate into asteroid and that these two will reappear if they are key to the field. In the meantime, you will have one less issue on GitHub!
