AIWintermuteAI / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0
45 stars 9 forks source link

whispercpp

Pybind11 bindings for whisper.cpp

Quickstart

Install from source:

pip install git+https://github.com/AIWintermuteAI/whispercpp.git -vv

Alternatively, git clone the develop branch of repository and initialize all submodules:

git submodule update --init --recursive

Then build the wheel:

[!IMPORTANT] If installing on Raspberry Pi OS (Lite, might apply to other images as well), you need to install some additional packages with apt-get: sudo apt-get install libasound2-dev python3-dev python3-pip

# Option 1: using pypa/build
python3 -m build -w

# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel

Afterwards, install the wheel:

# Option 1: via pypa/build
pip install dist/*.whl

# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl

The binding provides a Whisper class:

from whispercpp import Whisper

w = Whisper.from_pretrained("tiny.en")

Currently, the inference API is provided via transcribe:

w.transcribe(np.ones((1, 16000)))

You can use any of your favorite audio libraries (ffmpeg or librosa, or whispercpp.api.load_wav_file) to load audio files into a Numpy array, then pass it to transcribe:

import ffmpeg
import numpy as np

try:
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0

w.transcribe(arr)

You can also use the model transcribe_from_file for convience:

w.transcribe_from_file("/path/to/audio.wav")

The Pybind11 bindings supports all of the features from whisper.cpp, that takes inspiration from whisper-rs

The binding can also be used via api:

from whispercpp import api

# Binding directly fromn whisper.cpp

Development

See DEVELOPMENT.md

Official Builds

Build Type Status Note
Linux / MacOS Wheels Status
Unit tests Status

Examples

See examples for more information.

You won't believe how fast it is | Raspberry Pi Speech-to-Text

APIs

Whisper

  1. Whisper.from_pretrained(model_name: str) -> Whisper

    Load a pre-trained model from the local cache or download and cache if needed. Supports loading a custom ggml model from a local path passed as model_name.

    w = Whisper.from_pretrained("tiny.en")
    w = Whisper.from_pretrained("/path/to/model.bin")

    The model will be saved to $XDG_DATA_HOME/whispercpp or ~/.local/share/whispercpp if the environment variable is not set.

  2. Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)

    Running transcription on a given Numpy array. This calls full from whisper.cpp. If num_proc is greater than 1, it will use full_parallel instead.

    w.transcribe(np.ones((1, 16000)))

    To transcribe from a WAV file use transcribe_from_file:

    w.transcribe_from_file("/path/to/audio.wav")
  3. Whisper.stream_transcribe(*, length_ms: int=..., device_id: int=..., num_proc: int=...) -> Iterator[str]

    [EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription will be yielded as soon as it's available. See stream.py for an example.

    Note: The device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.

api

api is a direct binding from whisper.cpp, that has similar API to whisper-rs.

  1. api.Context

    This class is a wrapper around whisper_context

    from whispercpp import api
    
    ctx = api.Context.from_file("/path/to/saved_weight.bin")

    Note: The context can also be accessed from the Whisper class via w.context

  2. api.Params

    This class is a wrapper around whisper_params

    from whispercpp import api
    
    params = api.Params()

    Note: The params can also be accessed from the Whisper class via w.params

Why not?