PIT Loss for multichannel audio for speech separation

asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

https://asteroid-team.github.io/

MIT License


PIT Loss for multichannel audio for speech separation #691

Open SutirthaChakraborty opened 6 months ago

SutirthaChakraborty commented 6 months ago

I have 4-channel audio generated by my model (left, right, side, mid). How can I apply PIT loss to it? The tensor shapes are: speaker one: [batch, channel, time], speaker two: [batch, channel, time].

To apply PIT, how should I arrange the tensors? As [batch, channel, speaker, time]?

If I convert it to mono or take the mean over channels, the model is unable to learn the 4 channels properly.

mpariente commented 6 months ago

I think the channel should be first, in order to build the permutation matrix of dimension (batch, speaker, speaker) with broadcasting.
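As a rough illustration of the broadcasting idea, here is a minimal sketch in plain PyTorch. It is not taken from Asteroid: the function names `pairwise_multichannel_mse` and `pit_loss`, the `[batch, speaker, channel, time]` layout, and the choice of MSE averaged over channels are all assumptions made for the example. The point is only that the channel axis gets reduced together with time, so the pairwise matrix ends up with shape (batch, speaker, speaker).

```python
from itertools import permutations

import torch


def pairwise_multichannel_mse(est, tgt):
    """Hypothetical helper: est, tgt are [batch, n_src, channel, time].

    Returns a [batch, n_src, n_src] matrix whose entry (b, i, j) is the MSE
    between estimate i and target j, averaged over channels and time.
    """
    # [batch, n_src, 1, channel, time] - [batch, 1, n_src, channel, time]
    diff = est[:, :, None] - tgt[:, None]
    # Reduce the channel and time axes so each (estimate, target) pair has one score.
    return diff.pow(2).mean(dim=(-2, -1))


def pit_loss(est, tgt):
    """Permutation-invariant loss by exhaustive search over speaker orderings."""
    pw = pairwise_multichannel_mse(est, tgt)  # [batch, n_src, n_src]
    n_src = pw.shape[-1]
    idx = torch.arange(n_src)
    # For each permutation p, average pw[b, i, p[i]] over the sources i.
    per_perm = torch.stack(
        [pw[:, idx, torch.tensor(p)].mean(dim=-1) for p in permutations(range(n_src))],
        dim=1,
    )  # [batch, n_perms]
    # Keep the best ordering for each batch item, then average over the batch.
    min_loss, _ = per_perm.min(dim=1)
    return min_loss.mean()


# Two speakers, four channels; the estimates are the targets in swapped order.
tgt = torch.randn(2, 2, 4, 100)
est = tgt.flip(1)
print(pit_loss(est, tgt))  # the swapped ordering matches exactly
```

If the two speakers arrive as separate `[batch, channel, time]` tensors, `torch.stack([spk1, spk2], dim=1)` gives the layout assumed here. A pairwise function like this could presumably also be handed to Asteroid's `PITLossWrapper` with `pit_from="pw_mtx"`, though that combination has not been tested in this sketch.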