ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Move grammar support out of examples? Unify? #1930

Open josharian opened 7 months ago

josharian commented 7 months ago

I'd like to add Go binding support for grammars (for https://github.com/ggerganov/whisper.cpp/issues/1697). It's a bit inconvenient now, because the grammar support is off in an examples directory, and adding stuff from examples to libwhisper.a feels wrong; ditto for the header file.

GBNF seems pretty well established at this point (in llama.cpp, it lives in the common directory). It'd be nice to make it easier to support.

Could we promote grammar support to core whisper.h/whisper.cpp, similar to llama.cpp, where you simply provide a grammar string?

ggerganov commented 7 months ago

Yes, we should do that. I'm also thinking about moving all the grammar stuff into the ggml core library so that it becomes available everywhere. But the main problem is re-implementing the C++ stuff in C.

josharian commented 7 months ago

> the main problem is re-implementing the C++ stuff in C

Another option is to hide it all behind a very simple C facade, something like new/init, parse, free. (At least, that would suffice for Go bindings.)
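To make the new/parse/free idea concrete, here is a minimal sketch of what such a facade could look like. Every name in it (`demo_grammar`, `demo_grammar_new`, `demo_grammar_parse`, `demo_grammar_n_rules`, `demo_grammar_free`) is hypothetical and does not exist in whisper.cpp, and the "parser" is a toy stand-in that only records the name to the left of each `::=`. The part that matters is the shape: an opaque handle, plain C signatures, and no C++ exception crossing the boundary.

```cpp
// Sketch only: hypothetical names, toy parser. Not whisper.cpp API.
#include <stddef.h>
#include <new>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace demo_detail {

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string & s) {
    const size_t b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Toy stand-in for the C++ grammar parser: one rule per line, "name ::= body".
struct parser {
    std::vector<std::string> rule_names;

    void parse(const std::string & src) {
        std::vector<std::string> names;
        std::istringstream in(src);
        std::string line;
        while (std::getline(in, line)) {
            if (trim(line).empty()) continue; // skip blank lines
            const size_t pos = line.find("::=");
            if (pos == std::string::npos) throw std::runtime_error("expected ::=");
            const std::string name = trim(line.substr(0, pos));
            if (name.empty()) throw std::runtime_error("missing rule name");
            names.push_back(name);
        }
        rule_names.swap(names); // commit only after the whole input parsed
    }
};

} // namespace demo_detail

// What a C header (and therefore cgo) would see: an opaque type and four functions.
extern "C" {
typedef struct demo_grammar demo_grammar;
demo_grammar * demo_grammar_new(void);
int            demo_grammar_parse(demo_grammar * g, const char * src); // 0 on success, -1 on error
size_t         demo_grammar_n_rules(const demo_grammar * g);
void           demo_grammar_free(demo_grammar * g);
}

// The C++ side stays hidden behind the opaque struct.
struct demo_grammar {
    demo_detail::parser impl;
};

extern "C" demo_grammar * demo_grammar_new(void) {
    return new (std::nothrow) demo_grammar(); // NULL on allocation failure
}

extern "C" int demo_grammar_parse(demo_grammar * g, const char * src) {
    if (g == nullptr || src == nullptr) return -1;
    try {
        g->impl.parse(src);
        return 0;
    } catch (...) {
        return -1; // translate any C++ exception into an error code
    }
}

extern "C" size_t demo_grammar_n_rules(const demo_grammar * g) {
    return g != nullptr ? g->impl.rule_names.size() : 0;
}

extern "C" void demo_grammar_free(demo_grammar * g) {
    delete g; // deleting NULL is a no-op
}
```

With this shape, the C++ implementation would not need to be rewritten in C; bindings would only ever touch the four `extern "C"` functions and the opaque pointer.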

Btw, I've noticed some crashes in the GBNF parser. I am planning to set up some fuzzing for it soon to try to shake them out.
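For reference, a fuzzing harness for a parser like this can be quite small. The sketch below uses the libFuzzer entry point `LLVMFuzzerTestOneInput`; `demo_parse` is a hypothetical stand-in (it only checks parenthesis balance) and would be replaced by the real GBNF parser entry point. It assumes the parser takes a NUL-terminated string and signals bad input by throwing.

```cpp
// Sketch only: demo_parse is a hypothetical stand-in for the parser under test.
#include <stddef.h>
#include <stdint.h>
#include <stdexcept>
#include <string>

// Toy parser: rejects input with unbalanced parentheses by throwing.
static void demo_parse(const char * src) {
    int depth = 0;
    for (const char * p = src; *p != '\0'; ++p) {
        if (*p == '(') ++depth;
        if (*p == ')' && --depth < 0) throw std::runtime_error("unbalanced ')'");
    }
    if (depth != 0) throw std::runtime_error("unbalanced '('");
}

// libFuzzer calls this once per generated input; libFuzzer supplies main().
extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    // Copy into a std::string so the parser always sees a NUL-terminated buffer.
    const std::string src(reinterpret_cast<const char *>(data), size);
    try {
        demo_parse(src.c_str());
    } catch (const std::exception &) {
        // A cleanly rejected grammar is expected behavior, not a finding.
    }
    return 0; // crashes and sanitizer reports are what the fuzzer looks for
}
```

Built with something like `clang++ -g -fsanitize=fuzzer,address harness.cpp`, out-of-bounds reads and similar memory errors in the parser would surface as sanitizer reports with a reproducing input.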