alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Training a model that will be used only in grammar mode #1635

Open orenstuf opened 1 month ago

orenstuf commented 1 month ago

When training a model for another language that will be used only in grammar mode, do I need less data compared to models being trained for dictation? If so, can you estimate how many hours of audio I need?

In my use case, the words used in the grammar are not known to me in advance; they will be entered by the user of the app. The grammar should also include "unk" for unrecognized words.

Thanks

nshmyrev commented 1 month ago

do I need less data compared to models being trained for dictation?

No, you still need the same amount of data.
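
For reference, the runtime grammar described in the question (user-entered words plus an unknown-word entry) might be assembled roughly like this. This is a minimal sketch, not something taken from the thread: `build_grammar` and `make_recognizer` are hypothetical helper names, lowercasing the phrases is an assumption about the model's vocabulary, and the unknown-word token is written as `[unk]`, the form Vosk's grammar examples use. The only Vosk calls are `Model(path)` and `KaldiRecognizer(model, sample_rate, grammar_json)`.

```python
import json

UNK_TOKEN = "[unk]"  # unknown-word entry as written in Vosk grammar examples


def build_grammar(user_phrases):
    """Return a JSON list of phrases with the unknown-word token appended.

    Phrases are lowercased and whitespace-normalized (an assumption: many
    Vosk model vocabularies are lowercase), and duplicates are dropped.
    """
    phrases = []
    for phrase in user_phrases:
        normalized = " ".join(phrase.lower().split())
        if normalized and normalized != UNK_TOKEN and normalized not in phrases:
            phrases.append(normalized)
    phrases.append(UNK_TOKEN)
    return json.dumps(phrases, ensure_ascii=False)


def make_recognizer(model_path, user_phrases, sample_rate=16000):
    """Create a recognizer restricted to the user's phrases.

    Hypothetical helper; requires the vosk package and a model directory.
    """
    from vosk import Model, KaldiRecognizer  # imported lazily on purpose

    model = Model(model_path)
    return KaldiRecognizer(model, sample_rate, build_grammar(user_phrases))


if __name__ == "__main__":
    print(build_grammar(["Turn  On", "turn on", "stop"]))
```

The grammar only restricts what the recognizer may output at decode time; the acoustic model behind it is trained the same way either way, which is consistent with the answer above.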