Open epinay1 opened 3 years ago
Could you please tell me if the x-Vector generated by vosk is the same as a D-Vector?
No, it is different.
If not, is there some way to get a D-Vector from vosk or otherwise?
You can use x-vectors with the UIS-RNN algorithm: https://arxiv.org/pdf/1911.01266.pdf
The problem is that vosk extracts vectors per utterance; you probably need finer granularity.
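To illustrate what "per utterance" means in practice, here is a minimal sketch for pulling the speaker vector out of one recognizer result and comparing two of them. It assumes the result is a JSON string with an `spk` field holding the x-vector, as in vosk's speaker identification example; the helper names `speaker_vector` and `cosine_distance` are hypothetical and not part of vosk.

```python
import json
import math


def speaker_vector(result_json):
    """Return the x-vector stored under "spk" in one result, or None.

    Assumption: the recognizer was created with a speaker model, so each
    utterance result carries one "spk" list (one vector per utterance).
    """
    result = json.loads(result_json)
    return result.get("spk")


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Since there is only one vector per utterance, a diarization method that expects a sequence of embeddings over short windows would need the audio split into shorter pieces first.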
Thank you for your reply, it was very helpful.
I have another question: is there an optimal or maximum length of data that can be fed into vosk at once for speech-to-text in order to get the best results? The reason I ask is that the audio is being fed in parts through stdout rather than all at once, and I have noticed that the final text changes when the part size changes. If so, is there a way to calculate this length?
Is there an optimal or maximum length of data that can be fed into vosk at once for speech-to-text in order to get the best results?
It is better to follow the samples in the code and feed about 0.2 seconds at once.
In the future we will make it independent of the chunk size, but that will require an API change.
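A minimal sketch of feeding about 0.2 seconds at a time, assuming a WAV input and a recognizer object that exposes `AcceptWaveform`, `Result` and `FinalResult` the way vosk's `KaldiRecognizer` does. The helper names `frames_per_chunk` and `feed_in_chunks` are hypothetical; the recognizer is passed in so that the chunk length can be varied and the outputs compared.

```python
import wave


def frames_per_chunk(sample_rate, chunk_seconds=0.2):
    """Number of audio frames that make up `chunk_seconds` of audio."""
    return max(1, round(sample_rate * chunk_seconds))


def feed_in_chunks(wav_file, recognizer, chunk_seconds=0.2):
    """Feed a WAV file to `recognizer` in fixed-length chunks.

    Assumption: `recognizer` behaves like a vosk KaldiRecognizer, i.e.
    AcceptWaveform(bytes) returns True when an utterance has ended.
    Returns the list of result strings, the final one last.
    """
    results = []
    with wave.open(wav_file, "rb") as wf:
        n = frames_per_chunk(wf.getframerate(), chunk_seconds)
        while True:
            data = wf.readframes(n)
            if len(data) == 0:
                break
            if recognizer.AcceptWaveform(data):
                # An utterance ended inside this chunk; collect its result.
                results.append(recognizer.Result())
    # Flush whatever is still buffered after the last chunk.
    results.append(recognizer.FinalResult())
    return results
```

At 16 kHz, 0.2 seconds is 3200 frames, or 6400 bytes of 16-bit mono audio. Calling `feed_in_chunks` with different `chunk_seconds` values is one way to check how the chunk length affects the text.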
I have noticed that I get better results with longer segments; some words (mostly short ones) get omitted with shorter segments. That is why I was asking whether there is a maximum length that can be fed in and, if so, how I can calculate it.
You can try different sizes and see. You would need a couple of gigabytes of memory to process 1 hour at once, I suppose.
Hi, I'm trying to use vosk to get the embedding for Google's UIS-RNN. Could you please tell me if the x-Vector generated by vosk is the same as a D-Vector?
If not, is there some way to get a D-Vector from vosk or otherwise?
Thank you.