Pybind11 bindings for whisper.cpp
Install from source:
pip install git+https://github.com/AIWintermuteAI/whispercpp.git -vv
Alternatively, clone the develop branch of the repository and initialize all submodules:
git submodule update --init --recursive
[!IMPORTANT] If installing on Raspberry Pi OS (Lite; this might apply to other images as well), you need to install some additional packages with apt-get:
sudo apt-get install libasound2-dev python3-dev python3-pip
Then build the wheel:
# Option 1: using pypa/build
python3 -m build -w
# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel
Afterwards, install the wheel:
# Option 1: via pypa/build
pip install dist/*.whl
# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
The binding provides a Whisper class:
from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
Currently, the inference API is provided via transcribe:
w.transcribe(np.ones((1, 16000)))
You can use any of your favorite audio libraries (ffmpeg, librosa, or whispercpp.api.load_wav_file) to load audio files into a NumPy array, then pass it to transcribe:
import ffmpeg
import numpy as np

sample_rate = 16000  # Whisper models expect 16 kHz mono audio

try:
    # Decode the file to raw 16-bit little-endian mono PCM on stdout
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

# Scale int16 samples to float32 in [-1.0, 1.0)
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0

w.transcribe(arr)
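If ffmpeg is not available, the same int16-to-float scaling can be sketched with only the Python standard library for the simple case of a 16-bit mono PCM WAV file. The helper name load_wav_mono16 is hypothetical and not part of whispercpp; this is a minimal sketch under those assumptions, not a replacement for whispercpp.api.load_wav_file.

```python
import sys
import wave
from array import array


def load_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file and return (samples, sample_rate).

    Hypothetical helper, not part of whispercpp. Samples are Python floats
    in [-1.0, 1.0), scaled by 32768.0 like the ffmpeg-based loader.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [sample / 32768.0 for sample in pcm], sample_rate
```

The returned list can be converted with np.asarray(samples, dtype=np.float32) before being passed to transcribe. Whisper models expect 16 kHz audio, so a file with a different sample rate would need resampling first.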
You can also use the model's transcribe_from_file method for convenience:
w.transcribe_from_file("/path/to/audio.wav")
The Pybind11 bindings support all of the features of whisper.cpp, with an API that takes inspiration from whisper-rs.
The binding can also be used via api:
from whispercpp import api
# Binding directly from whisper.cpp
See DEVELOPMENT.md
See examples for more information.
Whisper
Whisper.from_pretrained(model_name: str) -> Whisper
Load a pre-trained model from the local cache, or download and cache it if needed. Also supports loading a custom ggml model from a local path passed as model_name.
w = Whisper.from_pretrained("tiny.en")
w = Whisper.from_pretrained("/path/to/model.bin")
The model will be saved to $XDG_DATA_HOME/whispercpp, or to ~/.local/share/whispercpp if the environment variable is not set.
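The lookup rule described above can be sketched as follows. The function name whispercpp_data_dir is hypothetical, and treating an empty XDG_DATA_HOME as unset follows the XDG Base Directory Specification rather than anything confirmed about the library's own code.

```python
import os
from pathlib import Path


def whispercpp_data_dir(environ=None):
    """Return the directory where model weights would be cached.

    Hypothetical helper illustrating the documented rule; it is not the
    library's actual implementation.
    """
    env = os.environ if environ is None else environ
    xdg_data_home = env.get("XDG_DATA_HOME")
    if xdg_data_home:
        # Variable is set and non-empty: use it as the base directory.
        base = Path(xdg_data_home)
    else:
        # Unset or empty: fall back to the XDG default location.
        base = Path.home() / ".local" / "share"
    return base / "whispercpp"
```

Passing a plain dictionary as environ makes it possible to check both branches without touching the real environment.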
Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)
Runs transcription on a given NumPy array. This calls full from whisper.cpp. If num_proc is greater than 1, it will use full_parallel instead.
w.transcribe(np.ones((1, 16000)))
To transcribe from a WAV file, use transcribe_from_file:
w.transcribe_from_file("/path/to/audio.wav")
Whisper.stream_transcribe(*, length_ms: int=..., device_id: int=..., num_proc: int=...) -> Iterator[str]
[EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription will be yielded as soon as it's available.
See stream.py for an example.
Note: The device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.
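Because stream_transcribe returns an Iterator[str], its output can be consumed lazily with an ordinary loop. The sketch below shows one way to stop after a fixed number of segments. Both take_segments and fake_stream are hypothetical names introduced for illustration; fake_stream stands in for a real call such as w.stream_transcribe(...), which needs an audio device.

```python
from typing import Iterable, Iterator, List


def take_segments(stream: Iterable[str], limit: int) -> List[str]:
    """Collect up to `limit` non-empty text segments from a stream."""
    segments: List[str] = []
    if limit <= 0:
        return segments
    for text in stream:
        text = text.strip()
        if text:
            segments.append(text)
        if len(segments) >= limit:
            break  # stop pulling from the stream once we have enough
    return segments


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming transcription; yields canned text."""
    yield " hello"
    yield ""
    yield " world "
    yield " never reached"
```

With a real model, the call would look like take_segments(w.stream_transcribe(), 10), assuming the default keyword arguments suit your audio device.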
api
api is a direct binding to whisper.cpp, with an API similar to whisper-rs.
api.Context
This class is a wrapper around whisper_context:
from whispercpp import api
ctx = api.Context.from_file("/path/to/saved_weight.bin")
Note: The context can also be accessed from the Whisper class via w.context
api.Params
This class is a wrapper around whisper_params:
from whispercpp import api
params = api.Params()
Note: The params can also be accessed from the Whisper class via w.params
Compared with whispercpp.py, there are a few key differences:
whispercpp.py provides Cython bindings, whereas whispercpp uses Pybind11 instead. Feel free to use whispercpp.py if you prefer Cython over Pybind11. Note that whispercpp.py and whispercpp are mutually exclusive, as they both use the whispercpp namespace.
whispercpp provides APIs similar to whisper-rs, which makes for a nicer UX. There are just two APIs (from_pretrained and transcribe) needed to quickly use whisper.cpp in Python.
whispercpp doesn't pollute your $HOME directory; instead, it follows the XDG Base Directory Specification for saved weights.
Using cdll and ctypes and be done with it?