ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Using hotwords to "bias" transcription (or limit the vocabulary in some way) #1979

Open pprobst opened 6 months ago

pprobst commented 6 months ago

Hello there.

I believe a common use of Whisper is to fine-tune a smaller model (e.g., base/small) on your own data and then use it in a specific context. However, a limitation of Whisper compared to some earlier ASR systems (such as Coqui STT with KenLM as a "scorer") is that there's no way (that I know of) to use a "vocabulary" to limit what can be transcribed. For example, in a medical context, I wouldn't want "la aorta" to sometimes be recognized as "la horta".

It would be great if whisper.cpp had something to help with this issue. In particular, I thought the user could supply a list of words from a specific context (in a medical context, for example, organs or diseases). Then, during transcription, inference could be "biased" towards the words in that list.
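To make the request concrete, here is a minimal sketch of what "biasing" could mean at decoding time: before picking the next token, add a small bonus to the logits of tokens that would start a hotword or continue one that is already in progress. This is only an illustration of the idea, not how whisper.cpp works internally. The function names, the `boost` value, and the token ids are all made up for the example, and a real implementation would first have to tokenize each hotword with the model's own tokenizer.

```python
from typing import List, Sequence, Set


def allowed_continuations(hotwords: Sequence[Sequence[int]],
                          recent: Sequence[int]) -> Set[int]:
    """Token ids that would start a hotword or extend one already in progress.

    `hotwords` holds each hotword as a sequence of token ids (hypothetical ids here).
    `recent` is the sequence of token ids decoded so far.
    """
    favored: Set[int] = set()
    for word in hotwords:
        if not word:
            continue  # ignore empty entries
        # A hotword may always begin at the current position.
        favored.add(word[0])
        # It may also continue if a proper prefix of it matches the tail of `recent`.
        for k in range(1, len(word)):
            if k <= len(recent) and list(recent[-k:]) == list(word[:k]):
                favored.add(word[k])
    return favored


def bias_logits(logits: List[float],
                hotwords: Sequence[Sequence[int]],
                recent: Sequence[int],
                boost: float = 2.0) -> List[float]:
    """Return a copy of `logits` with `boost` added to every favored token id."""
    favored = allowed_continuations(hotwords, recent)
    return [x + boost if i in favored else x for i, x in enumerate(logits)]
```

For instance, with the made-up hotwords `[[5, 7], [3]]` and `recent = [5]`, calling `bias_logits([0.0] * 8, [[5, 7], [3]], [5])` raises the entries at positions 3, 5 and 7 and leaves the rest untouched. Because the bonus is additive rather than a hard mask, words outside the list can still be transcribed when the acoustic evidence is strong, which is the difference between "biasing" and strictly limiting the vocabulary.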

josharian commented 6 months ago

Check out the support for GBNF grammars, and the grammar_penalty param. That might get you on your way.
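As a rough illustration of the grammar route, the sketch below turns a plain word list into a small GBNF grammar whose `root` rule accepts one or more space-separated terms from the list. The helper names (`gbnf_literal`, `vocabulary_grammar`) and the rule layout are assumptions made for this example, not something taken from the whisper.cpp repository. A grammar like this restricts output to the listed words only, so in practice it would probably need extra rules for ordinary text, and how strictly it is enforced would depend on the `grammar_penalty` setting.

```python
from typing import Iterable


def gbnf_literal(word: str) -> str:
    """Quote a word as a GBNF string literal, escaping backslashes and double quotes."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def vocabulary_grammar(words: Iterable[str]) -> str:
    """Build a GBNF grammar that accepts one or more space-prefixed words from `words`."""
    # Strip whitespace, drop empty entries, de-duplicate, and sort for stable output.
    unique = sorted({w.strip() for w in words if w.strip()})
    if not unique:
        raise ValueError("need at least one word")
    alternatives = " | ".join(gbnf_literal(w) for w in unique)
    # Each term is preceded by a space, since transcribed text usually starts with one.
    return f'root ::= (" " term)+\nterm ::= {alternatives}\n'


if __name__ == "__main__":
    print(vocabulary_grammar(["aorta", "ventrículo", "aorta"]))
```

The generated text can be saved to a file and passed to whichever grammar option your build exposes (I believe the `main` example has a `--grammar` flag, but check `--help` to confirm).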

pprobst commented 6 months ago

I experimented with grammars some months ago; iirc transcription speed ended up being a huge problem, since I need a very large number of words to limit the vocabulary. But I'll revisit grammars just in case. I also came across these: #235 (I have pretty much the same problem), #271 (I might also try this), and https://github.com/ggerganov/whisper.cpp/discussions/190#discussioncomment-8504735.