alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

speaker identification #1226

Closed by vinnitu 2 weeks ago

vinnitu commented 1 year ago

I have many audio files of human speech, and I want to group them by speaker. As a test, I took one long file (about 18 minutes) and extracted embeddings from it (about 80 vectors), which means each vector covers roughly 14 seconds of audio. My idea was to compute a centroid for the file and then find similar centroids from other files by cosine distance, using some threshold value. But first I need to choose the threshold. To do that, I tried comparing every vector with every other vector from the same file to get the average distance. The distances vary widely, and the vectors all look very different from one another. Why might that be?
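For reference, the procedure described above could be sketched roughly as follows. This is only an illustration in plain Python, assuming each embedding is already available as a list of floats. The helper names (`normalize`, `cosine_distance`, `centroid`, `pairwise_stats`, `same_speaker`) are hypothetical and not part of the Vosk API, and any threshold passed in is a placeholder to be tuned on real data.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def normalize(v):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Cosine distance: 0 for the same direction, 1 for orthogonal vectors."""
    a, b = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Mean of the length-normalized embeddings of one file."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    return [sum(v[i] for v in unit) / len(unit) for i in range(dim)]


def pairwise_stats(vectors):
    """Mean and standard deviation of all within-file pairwise distances."""
    dists = [cosine_distance(a, b) for a, b in combinations(vectors, 2)]
    return mean(dists), pstdev(dists)


def same_speaker(c1, c2, threshold):
    """Hypothetical decision rule: two centroids match if they are close enough."""
    return cosine_distance(c1, c2) <= threshold
```

With about 80 vectors from one file, `pairwise_stats(vectors)` would give the average within-file distance and its spread. Comparing that spread against the distance between centroids of files known to contain different speakers is one way to judge whether a single fixed threshold is workable.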

nshmyrev commented 2 weeks ago

Same as https://github.com/alphacep/vosk-api/issues/405