Closed lmessinger closed 3 years ago
is there any research paper this is based on
There is no single paper describing Vosk.
The Vosk/Kaldi toolkit is based on multi-year research by Daniel Povey and his students:
https://danielpovey.com/publications.html
and also research from openfst developers:
https://research.google/pubs/pub35189/
Great, thanks! This is impressive. If I understand correctly, it is based on WFSTs, which are a kind of probabilistic state machine? Btw, is there any deep learning in the pipeline?
Vosk is based on a common DNN-HMM architecture. A deep neural network is used for sound scoring (acoustic scoring); HMM and WFST frameworks are used for time models (language models).
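To make the split between the two parts concrete, here is a toy sketch of the general DNN-HMM idea. It is not Vosk or Kaldi code: the function name `viterbi`, the two-state model, and all the numbers are made up for illustration. The per-frame scores stand in for what a neural network would output, and the transition table stands in for the HMM/WFST weights that a real decoder would search over.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(log_acoustic, log_trans, log_init):
    """Return the most likely state sequence.

    log_acoustic[t][s]: log score of state s at frame t (stand-in for DNN output)
    log_trans[p][s]:    log transition weight from state p to state s
    log_init[s]:        log probability of starting in state s
    """
    n_states = len(log_init)
    # Scores after the first frame: start weight plus acoustic score.
    score = [log_init[s] + log_acoustic[0][s] for s in range(n_states)]
    backptrs = []
    for frame in log_acoustic[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s under the transition model.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame[s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical 4-frame utterance with 2 states (all values invented).
acoustic = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
trans = [[0.6, 0.4], [0.0, 1.0]]   # left-to-right: state 1 never returns to 0
init = [1.0, 0.0]

path = viterbi(
    [[safe_log(p) for p in frame] for frame in acoustic],
    [[safe_log(p) for p in row] for row in trans],
    [safe_log(p) for p in init],
)
print(path)  # [0, 0, 1, 1]
```

In a real system the states, transitions, lexicon, and language model are compiled into one large WFST graph and searched with pruning, so this only shows the principle of combining acoustic scores with a time model.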
Got it. Many thanks.
You are welcome. Please close the issue if you don't have other questions.
Hi sir, can I get your email? I have a little issue that I want to ask you about.
Hi everyone,
I wondered how we can understand the model and architecture of VOSK. Is there any research paper this is based on?
Many thanks, Lior