alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Unable to use my speaker recognition model (model-spk) #1354

Open Muhammad-Saifullah-Sani opened 1 year ago

Muhammad-Saifullah-Sani commented 1 year ago

Hi, this library is awesome, especially because it can be used offline. I'm a newbie in speaker recognition. I built my speaker recognition model with the VoxCeleb v2 recipe in Kaldi, but I'm unable to use it in test_speaker.py as a replacement for model-spk (a simplified sketch of how I load it is shown below). For speech recognition I used the default model (vosk-model-small-en-us-0.15).

Kaldi 5.5, Vosk 0.3.45

One of my models: model-spk.zip
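For context, this is roughly how I load it, following the test_speaker.py example in this repo (simplified sketch; the paths and the WAV file name are placeholders from my setup):

```python
import json
import wave

from vosk import Model, KaldiRecognizer, SpkModel

ASR_MODEL_PATH = "vosk-model-small-en-us-0.15"  # default ASR model
SPK_MODEL_PATH = "model-spk"                    # my VoxCeleb v2 model directory

model = Model(ASR_MODEL_PATH)
spk_model = SpkModel(SPK_MODEL_PATH)

wf = wave.open("test.wav", "rb")  # 16 kHz mono PCM test file

rec = KaldiRecognizer(model, wf.getframerate())
rec.SetSpkModel(spk_model)

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        res = json.loads(rec.Result())
        # When a speaker model is set, the result carries an "spk" x-vector
        print("X-vector:", res.get("spk"))

print("Final:", json.loads(rec.FinalResult()).get("spk"))
```

The crash below happens as soon as the recognizer starts processing audio.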

```
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from C:\Users\IdeaPad\.cache\vosk\vosk-model-small-en-us-0.15/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from C:\Users\IdeaPad\.cache\vosk\vosk-model-small-en-us-0.15/graph/HCLr.fst C:\Users\IdeaPad\.cache\vosk\vosk-model-small-en-us-0.15/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo C:\Users\IdeaPad\.cache\vosk\vosk-model-small-en-us-0.15/graph/phones/word_boundary.int
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
ASSERTION_FAILED (VoskAPI:AddVec():kaldi-vector.cc:77) Assertion failed: (dim == v.dim_)
```

Please guide me. Thanks.

nshmyrev commented 1 year ago

You probably didn't modify the mdl and used the original one. You need to extract the middle-layer (embedding) output from the mdl, the same way the VoxCeleb scripts do.
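Roughly like this, a sketch modeled on the Kaldi VoxCeleb v2 x-vector recipe. It is not the exact recipe: the layer name tdnn6.affine is the recipe default (check your own network with nnet3-info), and the output file name final.ext.raw mirrors the layout of the downloadable vosk speaker model, so adjust both to your setup:

```python
# Sketch: remap the network output to the embedding layer, as the VoxCeleb
# recipe's extract.config does, so the network ends at the x-vector layer.
# Assumes Kaldi binaries (nnet3-copy) are on PATH.
import subprocess
from pathlib import Path

XVECTOR_DIR = Path("exp/xvector_nnet_1a")  # your trained VoxCeleb model dir
OUT_DIR = Path("model-spk")                # directory passed to SpkModel()
OUT_DIR.mkdir(exist_ok=True)

# Make the embedding layer the output node; everything after it is dropped.
config = OUT_DIR / "extract.config"
config.write_text("output-node name=output input=tdnn6.affine\n")

subprocess.run(
    [
        "nnet3-copy",
        f"--nnet-config={config}",
        str(XVECTOR_DIR / "final.raw"),
        str(OUT_DIR / "final.ext.raw"),
    ],
    check=True,
)
```

The point is that the network you give vosk must end at the embedding layer. The unmodified network still has the classification head over training speakers on top, so its output dimension doesn't match the embedding size the speaker recognizer expects, which is most likely what the dim == v.dim_ assertion is complaining about. Also compare your directory contents against the official speaker model package, since it ships auxiliary files (MFCC config, mean vector, transform) alongside the network.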