alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Can the VOSK grammar file be used to exclude words? #1340

Open ideasman42 opened 1 year ago

ideasman42 commented 1 year ago

Following this issue about NSFW words appearing in output: https://github.com/ideasman42/nerd-dictation/issues/99, it would be useful to know whether the VOSK grammar file can be used to exclude words (instead of limiting recognition to a set of words).

Is this currently supported? The only documentation I found for the VOSK grammar file is in the source header, which is not very detailed.

nshmyrev commented 1 year ago

You can exclude such words in a postprocessing step; there is no need for a grammar.
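
A minimal sketch of such a postprocessing step, assuming the recognized text is a plain space-separated string. The `BLOCKED` set and the `drop_blocked` helper are hypothetical names used for illustration and are not part of the VOSK API:

```python
# Hypothetical block list; replace with the words you want to suppress.
BLOCKED = {"darn", "heck"}


def drop_blocked(text, blocked=BLOCKED):
    """Return the transcript with every blocked word removed."""
    kept = [word for word in text.split() if word.lower() not in blocked]
    return " ".join(kept)
```

This only deletes words from the final text. It does not make the recognizer choose a different word in their place.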

nshmyrev commented 1 year ago

Same as https://github.com/alphacep/vosk-api/issues/623, I suppose.

ideasman42 commented 1 year ago

The problem with excluding words as a post-process is that it doesn't account for the model mistaking other words for profanity. In that case a similar-sounding word should be used instead, rather than simply dropping the word.

This is useful beyond handling profanity. There are some words VOSK sometimes thinks I'm saying that I virtually never use (at least not in the context of dictation), so it would be handy to tell VOSK never to select those words.
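
One partial workaround, sketched under assumptions: ask the recognizer for several alternatives (for example via `SetMaxAlternatives` in the Python bindings) and take the first one that contains no blocked word. The sketch assumes the result JSON has an `alternatives` list whose entries carry a `text` field; `BLOCKED` and `pick_clean_alternative` are hypothetical names. It does not stop the model from proposing the word, it only prefers a hypothesis that avoids it:

```python
import json

# Hypothetical block list; replace with the words you never dictate.
BLOCKED = {"darn", "heck"}


def pick_clean_alternative(result_json, blocked=BLOCKED):
    """Return the first alternative without blocked words, or None."""
    result = json.loads(result_json)
    for alternative in result.get("alternatives", []):
        text = alternative.get("text", "").strip()
        if not any(word.lower() in blocked for word in text.split()):
            return text
    # Every alternative contained a blocked word (or there were none).
    return None
```

If the function returns `None`, the caller still has to fall back to something else, such as filtering the top hypothesis.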

svenha commented 1 year ago

@ideasman42 I am using a large negative dictionary for an "update package" from https://alphacephei.com/vosk/lm#update-process. With this exclusion dictionary, I reduce en.dic and en-230k-0.5.lm.gz. This adaptation takes a few minutes.
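
A rough sketch of the dictionary half of that reduction, assuming each line of `en.dic` starts with the word followed by its phonemes. The `filter_dictionary` helper is a hypothetical name, and the language model file would need matching treatment that is not shown here:

```python
def filter_dictionary(lines, excluded):
    """Return the dictionary lines whose first field is not an excluded word."""
    kept = []
    for line in lines:
        fields = line.split()
        # Skip entries whose headword is in the exclusion set.
        if fields and fields[0].lower() in excluded:
            continue
        kept.append(line)
    return kept
```

For example, `filter_dictionary(open("en.dic", encoding="utf-8"), {"darn"})` would yield the lines to write to the reduced dictionary.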