Problem
If I have a word like escape in my grammar, whisper sometimes outputs only the first few letters, esc, instead of the whole word. The expected behavior is that only the entire word is recognized.
How to Reproduce (example 1)
Go into examples/command and create a simple single-line grammar: root ::= " escape". If you now say "escape", it will sometimes print esc instead of the whole word escape. You can also say "essk", which likewise prints esc. The expected behavior is to print nothing, because "essk" is not a valid command.
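The expected behavior can be stated as a simple check: a transcript should be accepted only if it matches the grammar's literal exactly, and a proper prefix such as esc is a partial match, not a valid command. The sketch below models the single-rule grammar root ::= " escape" as a plain string; the function name classify and its three labels are hypothetical, chosen for illustration, and the code does not use whisper.cpp's grammar implementation.

```python
# Hypothetical sketch: models the single-rule grammar root ::= " escape"
# as one literal string. This is NOT whisper.cpp's grammar code.
GRAMMAR_LITERAL = " escape"


def classify(transcript: str, literal: str = GRAMMAR_LITERAL) -> str:
    """Describe how a transcript relates to the grammar's single literal."""
    if transcript == literal:
        return "accept"  # complete match: the only valid output
    if literal.startswith(transcript):
        return "prefix"  # partial match such as " esc": should be rejected
    return "reject"      # cannot be produced by the grammar at all


for text in [" escape", " esc", " essk"]:
    print(repr(text), classify(text))
```

In these terms, the reported bug is that whisper sometimes returns a "prefix" result (esc, or cap for root ::= " caps") as if it were a complete, accepted command.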
How to Reproduce (example 2)
Another example is to set the grammar to root ::= " caps". If you say "cap", it will print cap (without the s). The expected behavior is to print nothing, because cap is an invalid command; only caps (with the s) should be accepted.
My Setup
I'm running examples/command with my custom grammar on a Windows 10 machine using the GPU (CUDA), and I get the same problem whether I use ggml-small or ggml-large-v2.
Temporary Workaround Issue
I can remove invalid words in post-processing, but the problem is that these erroneous words prematurely cut off recognition of any commands that should come after them. For example, with a longer sequence of commands such as "please escape and log out", if escape is incorrectly output as esc, everything after that command is omitted from the output.
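The limitation of this workaround can be sketched as follows: a post-processing filter can drop words that are not valid commands, but it cannot restore words that were never emitted. ALLOWED_WORDS, filter_invalid, and the truncated transcript "please esc" are hypothetical illustrations, assuming recognition stops right after the partial word as described above; they are not taken from whisper.cpp or from a real grammar file.

```python
# Hypothetical sketch of the post-processing workaround.
# ALLOWED_WORDS is an assumed vocabulary, not a real grammar file.
ALLOWED_WORDS = {"please", "escape", "and", "log", "out"}


def filter_invalid(transcript: str, allowed: set[str] = ALLOWED_WORDS) -> list[str]:
    """Keep only the words that appear in the allowed vocabulary."""
    return [word for word in transcript.split() if word in allowed]


expected = "please escape and log out"  # what the speaker said
truncated = "please esc"                # assumed output when "escape" is cut short

print(filter_invalid(expected))   # every word survives the filter
print(filter_invalid(truncated))  # only "please" survives; "and log out" is gone
```

Filtering removes esc, but the commands that followed it were never transcribed, so no amount of post-processing can bring them back.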
Notes
I noticed that user @ulatekh has also run into this problem: https://github.com/ggerganov/whisper.cpp/pull/2127#issuecomment-2148493982 and https://github.com/ggerganov/whisper.cpp/discussions/2047#discussion-6496710. I created this issue in response to this comment: https://github.com/ggerganov/whisper.cpp/pull/2127#issuecomment-2154363819.