Audio-WestlakeU / FS-EEND

The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]
MIT License

idea #10

Open DAYTOY1112 opened 8 months ago

DAYTOY1112 commented 8 months ago

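Hello, have you run any experiments that concatenate speaker embeddings extracted by a pretrained speaker verification model onto the frame-level speaker embeddings? I have been trying this approach, but the results have consistently fallen short of expectations.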

DiLiangWU commented 8 months ago

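Hello, I have not yet run any experiments that concatenate target-speaker embeddings. However, you could look at the work on target-speaker voice activity detection by Prof. Ming Li's group at Duke Kunshan University; they follow this technical route. (https://scholars.duke.edu/person/MingLi/publications)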
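
For readers following the thread, the concatenation being asked about might look roughly like the sketch below. It is not code from FS-EEND or from the target-speaker VAD work mentioned above. The function name `concat_speaker_embedding`, the array shapes, and the choice of NumPy instead of PyTorch (to keep the example dependency-light) are all assumptions made for illustration.

```python
import numpy as np


def concat_speaker_embedding(frame_emb, spk_emb):
    """Append one utterance-level speaker embedding to every frame embedding.

    frame_emb: array of shape (T, D), frame-level embeddings
               (hypothetical output of a diarization encoder).
    spk_emb:   array of shape (E,), embedding from a pretrained speaker
               verification model (hypothetical; any extractor would do).
    Returns an array of shape (T, D + E).
    """
    frame_emb = np.asarray(frame_emb, dtype=np.float32)
    spk_emb = np.asarray(spk_emb, dtype=np.float32)
    if frame_emb.ndim != 2:
        raise ValueError("frame_emb must have shape (T, D)")
    if spk_emb.ndim != 1:
        raise ValueError("spk_emb must have shape (E,)")

    # Repeat the single speaker vector along the time axis: (E,) -> (T, E).
    num_frames = frame_emb.shape[0]
    tiled = np.broadcast_to(spk_emb, (num_frames, spk_emb.shape[0]))

    # Concatenate along the feature axis so each frame carries both views.
    return np.concatenate([frame_emb, tiled], axis=1)


# Usage with made-up sizes: 5 frames of 4-dim features, 3-dim speaker vector.
frames = np.zeros((5, 4))
speaker = np.ones(3)
combined = concat_speaker_embedding(frames, speaker)
print(combined.shape)
```

In a PyTorch model the same idea would typically use `torch.cat` along the feature dimension. Whether the concatenated features help at all is exactly the open question in this thread.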