TomJwYu / WenetSpeechSpeakerCluster

speaker cluster method #4

Open Liujingxiu23 opened 10 months ago

Liujingxiu23 commented 10 months ago

"spectral clustering to the speech segments" is there any github code, link or paper?

Liujingxiu23 commented 10 months ago

1. Did you use https://github.com/wq2012/SpectralCluster? Its README describes several methods, such as base and auto-tune. Which one exactly did you use?
2. Could I use the same method to process GigaSpeech? Do you have any experience processing this dataset?

tomasJwYU commented 10 months ago

Hi, we use https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse/v1 for spectral clustering. More technical details can be found at https://arxiv.org/pdf/2309.13905.pdf. I will provide our speaker clustering results for GigaSpeech in two days. (* ̄︶ ̄)
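For readers unfamiliar with the technique, here is a minimal sketch of spectral clustering over speaker embeddings. It is not the wespeaker recipe linked above and not necessarily what AutoPrep does. The function names, the 0.5 similarity threshold, the unnormalized Laplacian, the eigengap rule for picking the number of speakers, and the small k-means helper are all assumptions chosen for illustration.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialisation (illustrative helper)."""
    centers = [x[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(embeddings, max_speakers=8, threshold=0.5):
    """Cluster segment embeddings; returns (labels, estimated number of speakers).

    `threshold` and `max_speakers` are hypothetical defaults, not values from the thread.
    """
    n = embeddings.shape[0]
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T                              # pairwise cosine similarity
    affinity = np.where(sim >= threshold, sim, 0.0)  # prune weak links
    np.fill_diagonal(affinity, 0.0)

    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals, eigvecs = np.linalg.eigh(laplacian)     # eigenvalues in ascending order

    # Eigengap heuristic: the largest jump among the smallest eigenvalues
    # gives the estimated number of clusters.
    upper = min(max_speakers, n - 1)
    gaps = np.diff(eigvals[: upper + 1])
    k = int(np.argmax(gaps)) + 1

    # Cluster the rows of the first k eigenvectors.
    labels = kmeans(eigvecs[:, :k], k)
    return labels, k
```

Calling `spectral_cluster(emb)` on an `(n_segments, dim)` array returns one label per segment plus the estimated speaker count. Real pipelines usually tune the pruning step and the Laplacian normalization to the data.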

Liujingxiu23 commented 10 months ago

Thank you for your reply! It's great that you will share the GigaSpeech results! I found your paper, "AutoPrep"; it's great work!

For noise reduction, is there no official public tool for "Band-Split RNN"? Can you recommend any other public tools? Is https://github.com/facebookresearch/denoiser a good choice?

tomasJwYU commented 9 months ago

Thanks for the kind words about AutoPrep~(^▽^)

For BSRNN, you can refer to this repo: https://gitlab.aicrowd.com/Tomasyu/sdx-2023-music-demixing-track-starter-kit. Note that you will need to set the model's hyperparameters to meet your own requirements.

For the GigaSpeech speaker clustering results, please see https://tomasjwyu.github.io/AutoPrepDemo/

Liujingxiu23 commented 9 months ago

@tomasJwYU thanks a lot!

Liujingxiu23 commented 9 months ago

@tomasJwYU For GigaSpeech, what does cos_sim stand for? Is it the cosine similarity of the current wav segment to something, and if so, to what? I don't understand why most of the values are 1.0.
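The thread does not say what cos_sim is measured against, so the following is only a general illustration of cosine similarity between embedding vectors, not a description of how the AutoPrep demo computes it. The helper names are hypothetical, and comparing each segment to its cluster centroid is an assumption. Under that assumption, a cluster containing a single segment gives a similarity of 1.0, because the centroid is the segment's own embedding.

```python
import numpy as np


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sim_to_centroid(cluster_embeddings):
    """Similarity of each embedding to the mean embedding of its cluster.

    Assumed reference point for illustration only; the demo may use something else.
    """
    cluster_embeddings = np.asarray(cluster_embeddings, dtype=float)
    centroid = cluster_embeddings.mean(axis=0)
    return [cos_sim(e, centroid) for e in cluster_embeddings]
```

For example, `sim_to_centroid([[0.3, 0.4, 0.5]])` compares the only segment in the cluster with itself.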