aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0

Support for OpenVINO #192

Open githubnemo opened 7 months ago

githubnemo commented 7 months ago

What does this PR address?

This is a first draft of OpenVINO integration (if so desired). Background: I was working on an audio streaming / transcription application on the Raspberry Pi 4 and wanted to get as much performance out of it as possible, and OpenVINO was the way to do that.

I am new to Bazel and spent a lot of time figuring out how to get the basics working, so excuse the crudity of the change. I have not yet found out how to download resources or toggle features like OpenVINO in Bazel; any help would be appreciated.

Since OpenVINO support is newer than the last supported whisper.cpp version, I bumped the version to a point in time that worked well for the normal and OpenVINO cases alike. This brought up two deprecations (the N_MEL constant was removed, and two calls that previously took no context now require one).

Maybe of interest but slightly unrelated: OpenVINO uses a static graph, so if you are using smaller audio context sizes to increase inference speed, you must create an OpenVINO graph with that audio context. You must patch (or write your own version of) the OpenAI whisper.load_model function to include something like this:

dims.n_audio_ctx = new_audio_context
checkpoint["model_state_dict"]['encoder.positional_embedding'] = checkpoint["model_state_dict"]['encoder.positional_embedding'][:new_audio_context]
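
For illustration, the two lines above could be wrapped in a small helper that is applied to the loaded checkpoint before the model is constructed. This is only a sketch: the function name truncate_audio_context is hypothetical, and it assumes the checkpoint is a dict with a "dims" dict and a "model_state_dict" mapping, where the positional embedding is indexable along its first axis (a torch tensor in practice; a plain list works for the demo).

```python
def truncate_audio_context(checkpoint, new_audio_context):
    """Return a copy of `checkpoint` whose encoder uses a shorter audio context.

    Hypothetical helper, not part of whisper or whispercpp. Assumes the
    checkpoint has "dims" and "model_state_dict" entries.
    """
    key = "encoder.positional_embedding"
    state = checkpoint["model_state_dict"]
    current = len(state[key])  # size of the first axis (rows of the embedding)

    if not 0 < new_audio_context <= current:
        raise ValueError(
            f"new_audio_context must be in 1..{current}, got {new_audio_context}"
        )

    # Shallow copies so the caller's checkpoint is left untouched.
    patched = dict(checkpoint)
    patched["dims"] = dict(checkpoint["dims"], n_audio_ctx=new_audio_context)
    patched["model_state_dict"] = dict(state)
    # Keep only the first `new_audio_context` positions of the embedding.
    patched["model_state_dict"][key] = state[key][:new_audio_context]
    return patched


# Demo with a stand-in checkpoint (nested lists instead of tensors).
demo_checkpoint = {
    "dims": {"n_audio_ctx": 1500, "n_audio_state": 4},
    "model_state_dict": {
        "encoder.positional_embedding": [[0.0] * 4 for _ in range(1500)],
    },
}
demo_patched = truncate_audio_context(demo_checkpoint, 512)
```

The patched dims and state dict would then be used to build the model that is exported to OpenVINO, so the static graph matches the audio context used at inference time.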