alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.97k stars, 1.11k forks

architecture/research paper? #542

Closed lmessinger closed 3 years ago

lmessinger commented 3 years ago

Hi everyone

I was wondering how we can understand the model and architecture of Vosk. Is there a research paper it is based on?

Many thanks, Lior

nshmyrev commented 3 years ago

is there any research paper this is based on

There is no single paper describing Vosk.

The Vosk/Kaldi toolkit is based on multi-year research by Daniel Povey and his students:

https://danielpovey.com/publications.html

and also on research from the OpenFst developers:

https://research.google/pubs/pub35189/

lmessinger commented 3 years ago

Great, thanks! This is impressive. If I understand correctly, it is based on WFSTs (weighted finite-state transducers), which are a kind of probabilistic state machine? By the way, is there any deep learning in the pipeline?

nshmyrev commented 3 years ago

Vosk is based on a common DNN-HMM architecture. A deep neural network is used for sound scoring (acoustic scoring), and the HMM and WFST frameworks are used for time models (language models).
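
To make the split between the two parts concrete, here is a minimal toy sketch of DNN-HMM style decoding. It is not Vosk or Kaldi code: the states, the per-frame scores (standing in for neural network output) and the transition probabilities are all hypothetical values chosen for illustration, and a real system decodes over a much larger WFST graph instead of a two-state HMM.

```python
import math

# Hypothetical toy setup: two HMM states standing in for phones "a" and "b".
STATES = ["a", "b"]

# Stand-in for the DNN output: one score per state for each audio frame.
# In a real system these would come from a trained acoustic model.
FRAME_SCORES = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.8, "b": 0.2},
    {"a": 0.3, "b": 0.7},
    {"a": 0.1, "b": 0.9},
]

# HMM part (the "time model"): how likely each state is to follow another.
TRANSITIONS = {
    "a": {"a": 0.6, "b": 0.4},
    "b": {"a": 0.1, "b": 0.9},
}
INITIAL = {"a": 0.5, "b": 0.5}


def viterbi(frame_scores, transitions, initial, states):
    """Return the most likely state sequence, working in log space."""
    # Score of the best path ending in each state after the first frame.
    best = {
        s: math.log(initial[s]) + math.log(frame_scores[0][s]) for s in states
    }
    backpointers = []
    for scores in frame_scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Pick the best predecessor for state s at this frame.
            prev = max(
                states, key=lambda p: best[p] + math.log(transitions[p][s])
            )
            new_best[s] = (
                best[prev] + math.log(transitions[prev][s]) + math.log(scores[s])
            )
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


best_path = viterbi(FRAME_SCORES, TRANSITIONS, INITIAL, STATES)
print(best_path)
```

The acoustic scores alone say which sound each frame resembles; the transition model decides how those frame-level guesses are stitched into a consistent sequence over time. In Vosk the same idea is applied with the lexicon and language model compiled into the decoding graph.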

lmessinger commented 3 years ago

Got it. Many thanks.

nshmyrev commented 3 years ago

You are welcome. Please close the issue if you don't have other questions.

karimmhamdihabemus commented 2 years ago

Hi sir, can I get your email? I have a little issue that I want to ask you about.