videodanchik closed this 3 years ago
LGTM
Added fastcluster and fixed a small bug with missing parentheses.
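For reference when checking a fastcluster-based AHC against another implementation, here is a naive average-linkage AHC on a symmetric similarity matrix (e.g. PLDA pairwise scores). It is a minimal sketch, not the code in this PR: the function name `ahc_average` and the `threshold` parameter are hypothetical. Linkage routines such as fastcluster's operate on dissimilarities, so a similarity matrix would be negated first; average linkage over negated similarities gives the same merge order as this sketch, which merges greedily while the best average similarity exceeds the threshold.

```python
from typing import List, Sequence


def ahc_average(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Naive average-linkage AHC on a symmetric similarity matrix (hypothetical helper).

    Merges the two most similar clusters while their average pairwise
    similarity exceeds `threshold`, then returns one label per input item.
    """
    n = len(sim)
    clusters = [[i] for i in range(n)]

    def avg_sim(a: List[int], b: List[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, best_pair = None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best:
                    best, best_pair = s, (x, y)
        if best <= threshold:
            break  # no pair is similar enough to merge
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # safe: y > x, so index x is unaffected

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy scores: items 0/1 and 2/3 form two "speakers".
sim = [
    [0.0, 5.0, -3.0, -4.0],
    [5.0, 0.0, -2.0, -3.0],
    [-3.0, -2.0, 0.0, 6.0],
    [-4.0, -3.0, 6.0, 0.0],
]
labels = ahc_average(sim, threshold=0.0)  # [0, 0, 1, 1]
```

The quadratic search per merge makes this far slower than fastcluster, which is the point of using it for real data; the sketch is only meant as a readable baseline on small matrices.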
@videodanchik @fnlandini Hey guys, I was running some experiments with the modified clustering (using fastcluster), and on a larger dataset I got quite different results compared to AHC. Did you have a chance to try this on a larger dataset? My dummy test was fine, so I am not sure what is going on; maybe I have a bug in my implementation (I integrated the change into my own codebase). Thanks.
Hi @Jamiroquai88, we verified that we obtain the very same numbers for Callhome, DIHARD2 and both AMI sets with VBx. We did not check using only AHC; is that where you find differences?
Yeah, when running only AHC there are some differences in my tests, but nothing huge (~1% difference in DER), so it is probably fine. Thanks anyway.
Hi @Jamiroquai88, could you please share the PLDA pairwise similarity matrix for which you obtain different results when doing AHC with fastcluster and with your implementation? And perhaps some example code where you get different speaker labels?
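When comparing speaker labels from two AHC implementations, the raw label values can differ even when the clustering is identical, because cluster IDs are arbitrary ([0, 0, 1] and [1, 1, 0] describe the same result). A minimal sketch of a permutation-invariant check, assuming one label per segment in both outputs; `same_partition` is a hypothetical helper, not part of either codebase.

```python
from typing import Dict, Hashable, Sequence


def same_partition(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """Return True if two label sequences induce the same clustering (hypothetical helper).

    The labels must be related by a one-to-one mapping: every label in `a`
    always co-occurs with the same label in `b`, and vice versa.
    """
    if len(a) != len(b):
        return False
    fwd: Dict[Hashable, Hashable] = {}
    bwd: Dict[Hashable, Hashable] = {}
    for x, y in zip(a, b):
        # setdefault returns the existing mapping if one was already recorded.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))  # True: same clusters, renamed
print(same_partition([0, 0, 1], [0, 1, 1]))        # False: segment 1 moved
```

This only answers "identical or not"; quantifying how much two clusterings disagree (as DER does) requires finding the best label mapping first.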
Sorry @videodanchik, it is gone. I made some other modifications after that, and it looks like I had a different bug in the code; right now it looks fine. I mainly wanted to ask whether @fnlandini had tried it on some larger test sets, which he had (nice job).
This PR eliminates the inner loop over speakers (except for the last line, where also vectorizing it caused a small runtime degradation). It also adds some code style refactoring following PEP recommendations.
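To illustrate the general idea of eliminating a per-speaker loop, here is a minimal NumPy sketch; it is not the PR's actual change. The names `gamma` (frame-to-speaker responsibilities, shape frames x speakers) and `rho` (per-frame features, shape frames x dims) are hypothetical stand-ins, and the statistic computed (responsibility-weighted feature sums per speaker) is only an example of the kind of quantity such a loop computes.

```python
import numpy as np


def per_speaker_stats_loop(gamma: np.ndarray, rho: np.ndarray) -> np.ndarray:
    """Reference version: explicit Python loop over speakers."""
    num_spk = gamma.shape[1]
    out = np.zeros((num_spk, rho.shape[1]))
    for s in range(num_spk):
        # Sum of frame features, weighted by speaker s's responsibilities.
        out[s] = gamma[:, s] @ rho
    return out


def per_speaker_stats_vectorized(gamma: np.ndarray, rho: np.ndarray) -> np.ndarray:
    """Same statistics for all speakers at once via a single matrix product."""
    return gamma.T @ rho


rng = np.random.default_rng(0)
gamma = rng.random((100, 3))
gamma /= gamma.sum(axis=1, keepdims=True)  # each frame's responsibilities sum to 1
rho = rng.standard_normal((100, 8))

loop_stats = per_speaker_stats_loop(gamma, rho)
vec_stats = per_speaker_stats_vectorized(gamma, rho)
print(np.allclose(loop_stats, vec_stats))  # True
```

Vectorizing is not automatically faster: if the vectorized form has to materialize a large intermediate array, it can be slower than a short loop, which is consistent with keeping the last line as a loop.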