securigy opened this issue 1 year ago
Not yet, we are working on a universal punctuation model that can be used from other languages, but it takes time.
For speaker models, you can use a pretrained model, yes. It detects pitch differences and maps them to an x-vector.
Punctuation - got it.
Speaker models - that's a shame, because I do not have a pretrained model. I was hoping there was a generic model that could detect differences in voice pitch... Making my own is beyond my knowledge and capability at this time...
The pretrained speaker model is in the downloads.
For usage see https://github.com/alphacep/vosk-api/issues/405
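To make the comparison concrete, here is a minimal Python sketch along the lines of the example in that issue. The model paths, chunk size and distance cut-off are placeholders, not official recommendations, and I have only verified the Python call names (the C# binding wraps the same C API, so it should expose very similar ones). The idea: the recognizer emits an x-vector per utterance, and speakers are compared by cosine distance between x-vectors.

```python
import json
import wave

import numpy as np
from vosk import Model, SpkModel, KaldiRecognizer

def cosine_dist(x, y):
    """Cosine distance between two x-vectors: 0.0 means identical, larger means more different."""
    nx, ny = np.array(x), np.array(y)
    return 1 - np.dot(nx, ny) / (np.linalg.norm(nx) * np.linalg.norm(ny))

model = Model("model")              # path to the ASR model you downloaded
spk_model = SpkModel("model-spk")   # path to the speaker model from the downloads page

wf = wave.open("test.wav", "rb")    # 16 kHz mono PCM input
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetSpkModel(spk_model)

enrolled = None  # x-vector of the "known" speaker; here we just take it from the first utterance

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if not rec.AcceptWaveform(data):
        continue
    res = json.loads(rec.Result())
    if "spk" not in res:
        continue
    if enrolled is None:
        enrolled = res["spk"]        # enroll the first speaker we hear
        continue
    d = cosine_dist(enrolled, res["spk"])
    # Smaller distance = more likely the same speaker. The threshold has to be
    # tuned on your own recordings; there is no universal value.
    print(f"distance={d:.2f}  text={res.get('text', '')!r}")
```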
Well, the model is there, but it is absolutely not clear how to tell one speaker from another... There is some Python code, but I still have no idea about all the numbers and comparisons needed to achieve that... So I have to drop it for now...
BTW, is there any way to delegate the work to a GPU? Do I need to detect in code first that I have an adequate GPU, and if so, how?
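A hedged note on the GPU question: the stock CPU packages do not use the GPU; GPU decoding needs a CUDA-enabled build of Vosk. Assuming such a build, the Python binding exposes an explicit init call rather than automatic hardware detection, so the usual pattern looks like the sketch below (if your Vosk version does not export these wrappers, this does not apply).

```python
from vosk import GpuInit, Model, KaldiRecognizer

# Only has an effect with a CUDA-enabled Vosk build; on a CPU-only build it is
# effectively a no-op. Call it once at startup, before any Model is created.
# (In multi-threaded servers there is also GpuThreadInit() for worker threads.)
GpuInit()

model = Model("model")              # placeholder model path
rec = KaldiRecognizer(model, 16000)
```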
Two days wasted. Vosk is really good at transcribing voice to text, but I think speaker recognition is not ready yet. There is neither a proper source nor an example; everyone has written something from a different angle, but it is all empty. I think a detailed document on speaker recognition needs to be prepared.
I've been googling and browsing all day long but cannot find how to use Vosk Punctuation models, especially in C#. Is it supported at all? If yes, any example?
I am also looking for an answer to another question: can speaker models be used without training, that is, based only on differences in voice pitch and other audio characteristics?