FrenchKrab / IS2023-powerset-diarization

Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.

Get speaker posteriors from local EEND model #2

Closed xiangzai0115 closed 1 year ago

xiangzai0115 commented 1 year ago

Hi,

Thanks for this amazing work! Is there any way to get speaker posteriors from local EEND model?

Cheers, Xiang

FrenchKrab commented 1 year ago

Sure! Take any example from the repository that uses the Inference class (e.g. [1], [2]) and add the option skip_conversion=True. For example:

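```python
from pyannote.audio import Inference, Model

# Load the pretrained powerset segmentation model
model = Model.from_pretrained("powerset_pretrained.ckpt")
# skip_conversion=True keeps the raw powerset output instead of
# converting it to multi-label (per-speaker) activations
inf = Inference(model, step=2.5, skip_conversion=True)
result = inf(my_audio_file)  # my_audio_file: path to your audio file
result
```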

This gives you output like the following:

[image: output of the inference]

Note that these values are the output of the model's LogSoftmax layer, i.e. log-probabilities. To obtain "probabilities", exponentiate them:

```python
import numpy as np

# Exponentiate the log-probabilities to get probabilities
result.data = np.exp(result.data)
result
```

[image: the exponentiated output]
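With skip_conversion=True, each column corresponds to a powerset class (a set of simultaneously active speakers), not to a single speaker. If you want one posterior per speaker, one option is to sum the probabilities of every class that contains that speaker. The sketch below does this in NumPy. It assumes the classes are ordered as the empty set, then singletons, then pairs (the order produced by itertools.combinations); we believe this matches pyannote's powerset encoding, but please check it against your model. The functions powerset_classes and speaker_posteriors are hypothetical helpers, not part of pyannote.audio.

```python
from itertools import combinations

import numpy as np


def powerset_classes(num_speakers, max_set_size):
    """Enumerate powerset classes: empty set, then singletons, then pairs, ...

    ASSUMPTION: this ordering matches the columns of the model output.
    """
    classes = []
    for size in range(max_set_size + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def speaker_posteriors(powerset_probs, num_speakers, max_set_size):
    """Marginalize powerset probabilities (frames x classes) into
    per-speaker probabilities (frames x speakers)."""
    classes = powerset_classes(num_speakers, max_set_size)
    if powerset_probs.shape[-1] != len(classes):
        raise ValueError("number of columns does not match the powerset size")
    # mapping[c, s] = 1.0 if speaker s belongs to class c, else 0.0
    mapping = np.zeros((len(classes), num_speakers))
    for c, members in enumerate(classes):
        for s in members:
            mapping[c, s] = 1.0
    # P(speaker s active) = sum of P(class c) over classes containing s
    return powerset_probs @ mapping


# Toy example: 3 speakers, at most 2 active at once -> 7 classes:
# (), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)
probs = np.array([[0.1, 0.6, 0.0, 0.0, 0.3, 0.0, 0.0]])
print(speaker_posteriors(probs, num_speakers=3, max_set_size=2))
# approximately [[0.9, 0.3, 0.0]]
```

For the inference output above, this would be called as speaker_posteriors(result.data, num_speakers=3, max_set_size=2) after the np.exp step, assuming a model trained with 3 speakers and at most 2 overlapping; use your own model's values. Unlike the powerset probabilities, the per-speaker values do not sum to 1 across speakers, since overlapping speech counts toward several speakers.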

Finally, if you want the logits, you can remove the LogSoftmax by replacing the model's activation:

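```python
import torch

# Replace the final LogSoftmax with a no-op so the model outputs raw logits
model.activation = torch.nn.Identity()
```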

and run the Inference as usual, with skip_conversion=True.
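The three outputs are related by simple transforms: LogSoftmax maps logits to log-probabilities, and exponentiating those gives probabilities that sum to 1 over the powerset classes in each frame. The NumPy sketch below, using made-up logits, shows this relationship; log_softmax here is our own re-implementation for illustration, not a pyannote function.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax (same math as torch.nn.LogSoftmax)."""
    # Subtract the max before exponentiating to avoid overflow
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


# Hypothetical logits for 2 frames over 7 powerset classes
logits = np.array([
    [2.0, 0.5, -1.0, 0.0, 1.0, -2.0, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
log_probs = log_softmax(logits)  # what the model returns with its default activation
probs = np.exp(log_probs)        # what the np.exp step on result.data yields
print(probs.sum(axis=-1))        # each frame sums to 1
print(probs[1])                  # uniform logits -> 1/7 for every class
```

Since log-softmax only shifts each frame by a constant, the most likely class per frame is the same whether you take the argmax of the logits, the log-probabilities, or the probabilities.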

xiangzai0115 commented 1 year ago

Cool, that's wonderful!!

Thanks!

FrenchKrab commented 1 year ago

I'm closing the issue for now, but please do reopen it if any details are missing or if you encounter problems.