ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Python bindings (C-style API) #9

Open ArtyomZemlyak opened 1 year ago

ArtyomZemlyak commented 1 year ago

Good day everyone! I'm thinking about bindings for Python.

So far, I'm interested in 4 functionalities:

  1. Encoder processing
  2. Decoder processing
  3. Transcription of audio (feed audio bytes, get text)
  4. Item 3 plus word-level timestamps (feed audio bytes, get text + the time of each word). Of course, it's too early to think about word timestamps, since even the Python implementation does not handle them well yet.

Perhaps I will try to take up this task in the near future, but I have no experience with Python bindings. So, if there are craftsmen who can do it quickly (if it can be done quickly... 😃), that would be cool!

ArtyomZemlyak commented 1 year ago

Here is a workaround:

Building

main: ggml.o main.o
    g++ -L ggml.o -c -fPIC main.cpp -o main.o
    g++ -L ggml.o -shared -Wl,-soname,main.so -o main.so main.o ggml.o
    g++ -pthread -o main ggml.o main.o
    ./main -h

ggml.o: ggml.c ggml.h
    gcc -O3 -mavx -mavx2 -mfma -mf16c -c -fPIC ggml.c -o ggml.o
    gcc -shared -Wl,-soname,ggml.so -o ggml.so ggml.o

main.o: main.cpp ggml.h
    g++ -pthread -O3 -std=c++11 -c main.cpp

Run main

import ctypes
import pathlib

if __name__ == "__main__":
    # Load the shared library into ctypes
    libname = pathlib.Path().absolute() / "main.so"
    whisper = ctypes.CDLL(libname)

    whisper.main.restype = None
    whisper.main.argtypes = ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)

    args = (ctypes.c_char_p * 9)(
        b"-nt",
        b"--language", b"ru",
        b"-t", b"8",
        b"-m", b"../models/ggml-model-tiny.bin",
        b"-f", b"../audio/cuker1.wav"
    )
    whisper.main(len(args), args)

And it works!
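
The argument array used above can be built by a small helper. This is a minimal sketch using only the standard ctypes module; make_argv is a hypothetical name, and the b"main" placed in argv[0] is an assumption: a C main conventionally expects the program name there and starts parsing options at argv[1], so it is worth checking how the target main treats the first entry.

```python
import ctypes


def make_argv(options, prog=b"main"):
    """Build (argc, argv) for a C-style main(int argc, char **argv).

    `prog` fills argv[0]; it is a placeholder name, not something the
    library requires.
    """
    items = [prog] + [o if isinstance(o, bytes) else o.encode("utf-8") for o in options]
    argv = (ctypes.c_char_p * len(items))(*items)
    return len(items), argv


argc, argv = make_argv(["-nt", "--language", "ru", "-t", "8"])
# whisper.main(argc, argv) would then be called as in the script above
```

Accepting both str and bytes keeps the call site readable while still handing the C side plain char pointers.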

ArtyomZemlyak commented 1 year ago

But calling specific functions is already more difficult.

It might be worth running Python and C++ in separate threads or processes and sharing information between them when it's needed.
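
The separate-process idea can be sketched with the standard subprocess module. This is only an illustration: run_cli is a hypothetical helper, and the commented-out command line reuses the options from the ctypes script above as an assumption about how the compiled main binary would be invoked.

```python
import subprocess
import sys


def run_cli(cmd, timeout=600):
    """Run a command-line program in a separate process and return its stdout."""
    completed = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Stand-in command so the sketch runs anywhere; a real call might look like
# run_cli(["./main", "-nt", "-m", "../models/ggml-model-tiny.bin", "-f", "../audio/cuker1.wav"])
output = run_cli([sys.executable, "-c", "print('hello from a child process')"])
```

This trades the fine-grained control of a binding for isolation: a crash in the C++ side ends the child process, not the Python interpreter.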

ggerganov commented 1 year ago

Thank you very much for your interest in the project!

I think we first need a proper C-style wrapper of the model loading / encode and decode functionality / sampling strategies. After that we will easily create python and other language bindings. I've done similar work in my 'ggwave' project.

I agree that the encode and decode functionality should be exposed through the API as you suggested. It would give more flexibility to the users of the library/bindings.

aichr commented 1 year ago

@ArtyomZemlyak First you reinvent the pytorch functions in c, then you want python bindings around them. Isn't the end result the same as what we have in pytorch?

ggerganov commented 1 year ago

The initial API is now available on master:

https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h

The first part allows more fine-grained control over the inference and also allows the user to implement their own sampling strategy using the predicted probabilities for each token.

The second part of the API includes methods for full inference - you simply provide the audio samples and choose the sampling parameters.

Most likely the API will change with time, but this is a good starting point.
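
To make "your own sampling strategy" concrete, here is a minimal sketch of two strategies that operate on a list of per-token probabilities. It is not the sampler used by whisper.cpp, and how the probabilities are read from the API is deliberately left out; the function names and the example probabilities are made up for illustration.

```python
import math
import random


def sample_greedy(probs):
    """Return the index of the most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Sample a token index after rescaling probabilities by a temperature > 0."""
    # p_i ** (1/T), computed in log space; zero-probability tokens stay at zero
    weights = [math.exp(math.log(p) / temperature) if p > 0.0 else 0.0 for p in probs]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


probs = [0.1, 0.7, 0.2]  # made-up probabilities for a 3-token vocabulary
best = sample_greedy(probs)
drawn = sample_with_temperature(probs, temperature=0.5, rng=random.Random(0))
```

A temperature below 1 sharpens the distribution toward the greedy choice, while a temperature above 1 flattens it.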

richardburleigh commented 1 year ago

This is as far as I got trying to get the API working in Python.

It loads the model successfully, but gets a segmentation fault on whisper_full.

Any ideas?

import ctypes
import pathlib

if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "whisper.so"
    whisper = ctypes.CDLL(libname)
    modelpath = b"models/ggml-medium.bin"
    model = whisper.whisper_init(modelpath)
    params = whisper.whisper_full_default_params(b"WHISPER_DECODE_GREEDY")
    w = open('samples/jfk.wav', "rb").read()
    result = whisper.whisper_full(model, params, w, b"16000")
    # Segmentation fault

Edit - Got some debugging info from gdb but it didn't help much: 0x00007ffff67916c6 in log_mel_spectrogram(float const*, int, int, int, int, int, int, whisper_filters const&, whisper_mel&)

ggerganov commented 1 year ago

Here is one way to achieve this:

# build shared libwhisper.so
gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so

Use it from Python like this:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname     = "libwhisper.so"
fname_model = "models/ggml-tiny.en.bin"
fname_wav   = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
    ]

if __name__ == "__main__":
    # load library and model
    libname = pathlib.Path().absolute() / libname
    whisper = ctypes.CDLL(libname)

    # tell Python what are the return types of the functions
    whisper.whisper_init.restype                  = ctypes.c_void_p
    whisper.whisper_full_default_params.restype   = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params(0)
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype('float32')/32768.0

    # run the inference
    result = whisper.whisper_full(ctypes.c_void_p(ctx), params, data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), len(data))
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0  = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1  = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))

richardburleigh commented 1 year ago

Thank you @ggerganov - really appreciate your work!

Still getting a seg fault with your code, but I'll assume it's a me problem:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
1977        mel.data.resize(mel.n_mel*mel.n_len);
(gdb) bt
#0  log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
#1  0x00007fffc28d24c7 in whisper_pcm_to_mel (ctx=0x560d7680, samples=0x7fffb3345010, n_samples=176000, n_threads=4) at whisper.cpp:2101
#2  0x00007fffc28d4113 in whisper_full (ctx=0x560d7680, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2316

richardburleigh commented 1 year ago

Got a segfault in the same place on an Intel 12th gen CPU and M1 Macbook with no changes to the above Python script. Anyone else tried it?

Were you using the same codebase as master @ggerganov ?

ggerganov commented 1 year ago

Yeah, the ctx pointer wasn't being passed properly. I've updated the python script above. Give it another try - I think it should work now.
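
One plausible reading of this fix, offered as an assumption and not as a confirmed diagnosis: ctypes passes a plain Python int as a C int unless told otherwise, so a 64-bit pointer value can lose its upper bits on the way into the library. The sketch below shows the difference between the two conversions, using an address from the backtrace above purely as an example value.

```python
import ctypes

# An address taken from the gdb backtrace above, used only as an example value.
addr = 0x00007FFFC28D24C7

as_int = ctypes.c_int(addr).value     # what a default int conversion keeps
as_ptr = ctypes.c_void_p(addr).value  # pointer-sized; keeps the full value on 64-bit

print(hex(addr), as_int, hex(as_ptr))

# Declaring argument types makes the wrapping automatic, for example:
#     whisper.whisper_full_n_segments.argtypes = [ctypes.c_void_p]
```

Wrapping the context in ctypes.c_void_p at every call site, as the updated script does, has the same effect as declaring argtypes once.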

pachacamac commented 1 year ago

Could you possibly make a binding to the stream program as well? Would be super cool to be able to register a callback once user speech is done and silence/non-speech is detected so the final text can be processed within python. This would allow for some really cool speech assistant like hacks.

richardburleigh commented 1 year ago

> Could you possibly make a binding to the stream program as well? Would be super cool to be able to register a callback once user speech is done and silence/non-speech is detected so the final text can be processed within python. This would allow for some really cool speech assistant like hacks.

You can easily modify this script to use Whisper.cpp instead of DeepSpeech.

richardburleigh commented 1 year ago

@pachacamac I made a hacked together fork of Buzz which uses whisper.cpp

It's buggy and thrown together, but works.

Just make sure you build the shared library as libwhisper.so and put it in the project directory. There's no install package, so you'll need to run main.py directly.

Edit: I also made a simple stand-alone script using Whisper.cpp + Auditok (to detect voices)
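
The "callback once speech is done" idea can be sketched without any audio library. The following is a simple energy-threshold detector, not Auditok and not the whisper.cpp stream example; detect_utterances, the threshold, and the frame layout are all assumptions chosen for illustration.

```python
def detect_utterances(frames, on_utterance, threshold=0.01, max_silence=3):
    """Group frames into utterances and call on_utterance(samples) for each.

    frames: iterable of lists of float samples in [-1, 1].
    An utterance ends after `max_silence` consecutive quiet frames.
    """
    current, quiet = [], 0
    for frame in frames:
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy >= threshold:
            current.extend(frame)
            quiet = 0
        elif current:
            quiet += 1
            if quiet >= max_silence:
                on_utterance(current)
                current, quiet = [], 0
    if current:  # flush speech that runs to the end of the input
        on_utterance(current)


loud, silent = [0.5] * 4, [0.0] * 4
heard = []
detect_utterances([silent, loud, loud, silent, silent, silent, loud], heard.append)
```

In a real assistant, on_utterance would hand the collected samples to the transcription call instead of appending them to a list.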

ggerganov commented 1 year ago

Breaking changes in the C-api in last commit: e30cf831584a9b96df51849302de8bb35c5709ee

chidiwilliams commented 1 year ago

I seem to be having some trouble making a shared lib on Windows (https://github.com/ggerganov/whisper.cpp/issues/9#issuecomment-1272555209 works great on UNIX).

Using:

gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c -o ggml.o
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ -DWHISPER_SHARED -DWHISPER_BUILD whisper.cpp ggml.o -o libwhisper.so

And calling from Python as:

whisper_cpp = ctypes.CDLL("libwhisper.so")

# Calling any one of the functions errors
whisper_cpp.whisper_init('path/to/model.bin'.encode('utf-8'))
whisper_cpp.whisper_lang_id('en'.encode('utf-8'))

I get:

Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: access violation

...

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: stack overflow

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>

Ref: https://github.com/chidiwilliams/buzz/issues/131
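
Separately from the crash itself, the library file name usually differs per platform, which matters once the same script is run on Windows, macOS, and Linux. A minimal sketch follows; shared_library_name is a hypothetical helper, the .so and .dylib names are the ones used in this thread, and the Windows name is an assumption that depends on how the DLL was built. This does not explain the access violation.

```python
import platform


def shared_library_name(system=None):
    """Return a plausible file name for the whisper shared library."""
    system = system or platform.system()
    if system == "Windows":
        return "whisper.dll"       # assumption: depends on how the DLL was built
    if system == "Darwin":
        return "libwhisper.dylib"  # name used in this thread on macOS
    return "libwhisper.so"         # name used in the build commands above
```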

chidiwilliams commented 1 year ago

@ggerganov thanks for all your help so far. I seem to be having an issue with the Python binding (similar to one you posted, not Windows).

import ctypes
import pathlib

from scipy.io import wavfile

class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
    ]

model_path = 'ggml-model-whisper-tiny.bin'
audio_path = './whisper.cpp/samples/jfk.wav'
libname = './whisper.cpp/libwhisper.dylib'

whisper_cpp = ctypes.CDLL(
    str(pathlib.Path().absolute() / libname))

whisper_cpp.whisper_init.restype = ctypes.c_void_p
whisper_cpp.whisper_full_default_params.restype = WhisperFullParams
whisper_cpp.whisper_full_get_segment_text.restype = ctypes.c_char_p

ctx = whisper_cpp.whisper_init(model_path.encode('utf-8'))

params = whisper_cpp.whisper_full_default_params(0)
params.print_realtime = True
params.print_progress = True

samplerate, audio = wavfile.read(audio_path)
audio = audio.astype('float32')/32768.0

result = whisper_cpp.whisper_full(
    ctypes.c_void_p(ctx), params, audio.ctypes.data_as(
        ctypes.POINTER(ctypes.c_float)), len(audio))
if result != 0:
    raise Exception(f'Error from whisper.cpp: {result}')

n_segments = whisper_cpp.whisper_full_n_segments(
    ctypes.c_void_p(ctx))
print(f'n_segments: {n_segments}')

Prints:

whisper_model_load: loading model from 'ggml-model-whisper-tiny.bin'
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem_required  = 476.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size =  73.58 MB
whisper_model_load: memory size =    11.41 MB
whisper_model_load: model size  =    73.54 MB
176000, length of samples
log_mel_spectrogram: n_samples = 176000, n_len = 1100
log_mel_spectrogram: recording length: 11.000000 s
length of spectrogram is less than 1s
n_segments: 0

I added an extra log line to show that whisper_full exits because the spectrogram length is less than 1 s. I see the same issue with other audio files, as well as when I read the audio samples using whisper.audio.load_audio.

ggerganov commented 1 year ago

The WhisperFullParams struct has been updated since I posted, so you have to match the new struct in whisper.h. Ideally, the Python bindings should be generated automatically from the C API in order to avoid this kind of issue.
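
Why a stale struct definition breaks things can be shown with ctypes alone: when a member is inserted, every later field moves, so Python writes values at offsets where the C code expects something else. This is a minimal sketch with shortened structs; ParamsOld and ParamsNew are hypothetical names, and the inserted n_max_text_ctx member mirrors a field that appears in later versions of the struct in this thread.

```python
import ctypes


class ParamsOld(ctypes.Structure):
    # first fields of the struct as declared earlier in this thread
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
    ]


class ParamsNew(ctypes.Structure):
    # same fields after a header update inserts one member
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
    ]


old_offset = ParamsOld.n_threads.offset
new_offset = ParamsNew.n_threads.offset
print(old_offset, new_offset, ctypes.sizeof(ParamsOld), ctypes.sizeof(ParamsNew))
```

Nothing in ctypes checks the Python layout against the header, which is the argument for generating the definitions automatically.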

chidiwilliams commented 1 year ago

Of course. Thanks a lot!

thakurudit commented 1 year ago

> Of course. Thanks a lot!

@chidiwilliams Did it work for you?

chidiwilliams commented 1 year ago

@thakurudit Yes, it did. I use ctypesgen to generate bindings for Buzz.

thakurudit commented 1 year ago

> @thakurudit Yes, it did. I use ctypesgen to generate bindings for Buzz.

In my case, transcription is also not working.

whisper_model_load: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  =  390.00 MB
whisper_model_load: ggml ctx size =   73.58 MB
whisper_model_load: memory size   =   11.41 MB
whisper_model_load: model size    =   73.54 MB
16000
0

Results from Python:

If I were to make this work for the whisper.cpp project, what changes should I make to get it running on my local setup (macOS)? Sorry, I'm not that familiar with C++, but it would be great if you could give some direction for the fix.

chidiwilliams commented 1 year ago

@thakurudit Could you share the build/run command you're using to generate this?

thakurudit commented 1 year ago

> @thakurudit Could you share the build/run command you're using to generate this?

Okay, I figured out how to create Python bindings, but the code shared above by @ggerganov only works for commit 4d985395277dfc06b68ee8cf604a759b05b26557, not for the latest commit on the master branch. I tried to replicate the structure of WhisperFullParams from whisper.h, but then it just loads the model into memory and the transcription result is empty.

chidiwilliams commented 1 year ago

@thakurudit Yes, that's because the struct has changed since that commit. I suggest you recheck how you're building and calling the shared library. My implementation looks like this:

cmake -S whisper.cpp -B whisper.cpp/build/ $(CMAKE_FLAGS)
cmake --build whisper.cpp/build --verbose
ctypesgen ./whisper.cpp/whisper.h -llibwhisper.dylib -o whisper_cpp.py
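
import ctypes
import logging
from typing import Any, List, Union

import numpy as np
import whisper  # openai-whisper, used here only for load_audio

import whisper_cpp  # module generated by the ctypesgen command above

# Segment is a small record type with start, end, and text fields, defined elsewhere.

class WhisperCpp: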
    def __init__(self, model: str) -> None:
        self.ctx = whisper_cpp.whisper_init(model.encode('utf-8'))

    def transcribe(self, audio: Union[np.ndarray, str], params: Any):
        if isinstance(audio, str):
            audio = whisper.audio.load_audio(audio)

        logging.debug('Loaded audio with length = %s', len(audio))

        whisper_cpp_audio = audio.ctypes.data_as(
            ctypes.POINTER(ctypes.c_float))
        result = whisper_cpp.whisper_full(
            self.ctx, params, whisper_cpp_audio, len(audio))
        if result != 0:
            raise Exception(f'Error from whisper.cpp: {result}')

        segments: List[Segment] = []

        n_segments = whisper_cpp.whisper_full_n_segments((self.ctx))
        for i in range(n_segments):
            txt = whisper_cpp.whisper_full_get_segment_text((self.ctx), i)
            t0 = whisper_cpp.whisper_full_get_segment_t0((self.ctx), i)
            t1 = whisper_cpp.whisper_full_get_segment_t1((self.ctx), i)

            segments.append(
                Segment(start=t0*10,  # centisecond to ms
                        end=t1*10,  # centisecond to ms
                        text=txt.decode('utf-8')))

        return {
            'segments': segments,
            'text': ''.join([segment.text for segment in segments])}

    def __del__(self):
        whisper_cpp.whisper_free(self.ctx)
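
The unit conversion in the comments above (centiseconds to milliseconds) is easy to get wrong, so here is a small sketch of a formatter. It assumes, as those comments state, that segment times are reported in units of 10 ms; format_timestamp is a hypothetical helper, not part of the generated bindings.

```python
def format_timestamp(t_cs):
    """Format a timestamp given in centiseconds (10 ms units) as HH:MM:SS.mmm."""
    total_ms = int(t_cs) * 10
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


print(format_timestamp(1100))  # 1100 centiseconds
```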

thakurudit commented 1 year ago

@chidiwilliams Thanks! It's working now!

stlukey commented 1 year ago

Here is a python package that uses cython to expose the API: https://github.com/o4dev/whispercpp.py

It still needs work. The API definitions are there with basic text retrieval exposed in the Whisper object. Also automatically grabs the models if needed and supports different audio formats similar to the original whisper. Setup.py should also grab whisper.cpp and automatically compile on install too.

ggerganov commented 1 year ago

@o4dev This is what I imagined for a Python wrapper initially. Given @chidiwilliams's solution that uses ctypesgen, I wonder which approach is better. What are the pros and cons of each?

P.S. You might want to download the ggml models from the hugging face repo - I cannot guarantee how long I will host them on my server.

stlukey commented 1 year ago

@ggerganov In my opinion, cython provides significant benefits over ctypes. ctypes is nice for simple things and for quickly getting something running. However, once you need to start throwing callbacks around or maintaining a large project, ctypes becomes very messy very quickly.

Not to mention the performance overhead of ctypes. With cython it's just native code. Pythonic native code. Many big libraries chose cython as their method of module extension to improve the performance of bottlenecks in Python, instead of dealing with the verbosity of the Python C API. It directly uses header files at compile time to expose library externals, whereas ctypes loads everything at runtime.

Under the hood, cython transpiles the module into C, keeping defined native functions almost exactly the same, and wraps Python-extension-specific code around specific functions to allow interfacing with the Python interpreter. Previously, the most thorough way to expose native code to Python would have been to wrap it in an extension module written in C and compile that. cython eliminates most of the need to do things this way, as the same can be achieved more concisely.
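
For comparison, this is roughly what a callback looks like on the ctypes side. It is a minimal sketch: the two-pointer C signature is an assumption made for illustration, not the callback type declared in whisper.h, and SegmentCallback and on_new_segment are hypothetical names.

```python
import ctypes

# Assumed C signature, for illustration only:
#     void callback(void *ctx, void *user_data);
SegmentCallback = ctypes.CFUNCTYPE(None, ctypes.c_void_p, ctypes.c_void_p)

calls = []


def on_new_segment(ctx, user_data):
    # c_void_p arguments arrive as Python ints (or None for NULL)
    calls.append((ctx, user_data))


# Keep a reference for as long as C code may call it; if the wrapper is
# garbage-collected while still registered, the process can crash.
callback = SegmentCallback(on_new_segment)

# The wrapper is itself callable, which allows a quick check without any C code.
callback(None, None)
```

The manual prototype and the lifetime rule for the wrapper object are the kind of bookkeeping that grows as more callbacks are added.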

stlukey commented 1 year ago

Also, I'll make a note in an issue to change this over to Hugging Face. I'm unfortunately pretty busy for the next couple of weeks, so I can't guarantee when I will get around to it.

stlukey commented 1 year ago

For some reason none of these solutions seem to work now on Windows (both 10 and 11). Every time on multiple machines the Python interpreter exits and shuts down without any error during the invocation of whisper_full. Does anyone have any ideas?

So far I've tried both my cython extension and the various ctypes approaches (using updated versions of the examples provided here). I've even made sure the compiler is MSVC and that it matches the version used by Python (I am using Python 3.10). With the cython extension I've tried linking against the dynamic library and the static library, and also tried just the object files. I compiled using both the CMake scripts and an updated Makefile.

This only happens on Windows; the exact same code works fine on macOS and Linux. Without Windows compatibility this extension becomes useless for its original purpose. If anyone could help I would greatly appreciate it.

The only clue I've got is this: one time, when using a (Debug) DLL created by VS 2020 (with a more recent compiler than the one used for Python 3.10!) from a CMake-generated solution, a complaint was given about an rv==0 assert. Presumably this is a threading issue, possibly from not using pthreads; however, all the default CMake options were used to create the solution file. Nothing was edited; the latest version was built as-is, so pthreads should have been used. This was with the latest ctypes solution mentioned above. I haven't been able to reproduce it, though.
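
When the interpreter exits without any message, the standard faulthandler module can at least report where in the Python code the native crash happened. This is a debugging aid under stated assumptions, not a fix; the same effect is available by starting Python with -X faulthandler.

```python
import faulthandler
import sys

# Dump the Python traceback if the process receives a fatal signal
# (or, on Windows, a fatal exception such as an access violation).
# sys.__stderr__ is the original stderr stream of the process.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... load the shared library and call whisper_full here ...
```

The "Windows fatal exception: access violation" lines quoted earlier in this thread have the shape of faulthandler output.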

silvacarl2 commented 1 year ago

tried out: https://github.com/ggerganov/whisper.cpp/issues/9#issuecomment-1272555209

created this: whisper-cpp-api.py from https://github.com/ggerganov/whisper.cpp/issues/9#issuecomment-1272555209

when running this, it core dumps:

make clean
gcc -O3 -std=c11 -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so

python whisper-cpp-api.py
whisper_model_load: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required = 390.00 MB
whisper_model_load: ggml ctx size = 73.58 MB
whisper_model_load: memory size = 11.41 MB
whisper_model_load: model size = 73.54 MB
Segmentation fault (core dumped)

any ideas?

Letorshillen commented 1 year ago

Hey, I saw that whisper_init got updated to whisper_init_from_file in this commit https://github.com/ggerganov/whisper.cpp/commit/1512545149e9463c0b478cd0203638c501b0ac29 4 days ago.

When I now run the Python code from above with whisper_init, I get an OSError: exception: access violation writing 0x00000000001D2964 error. With whisper_init_from_file I don't get an error, but it seems the code also crashes, because nothing after the whisper_init_from_file line is executed.

Did you also encounter this problem?

Here is the code until the new init_from_file line. Pretty much the same as the other ones above:

from scipy.io import wavfile
import pathlib
import ctypes
import os

libname = "libwhisper"
fname_model = "ggml-tiny.bin"
fname_wav = "audio.wav"

# this needs to match the C struct in whisper.h

class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_max_text_ctx",       ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
        ("beam_search",               ctypes.c_int * 3),
    ]

if __name__ == "__main__":
    # load library and model
    libname = str(pathlib.Path().absolute() / libname)
    fname_model = str(pathlib.Path().absolute() / fname_model)
    whisper = ctypes.WinDLL(libname, winmode=1)

    # tell Python what are the return types of the functions
    whisper.whisper_init_from_file.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init_from_file(fname_model)
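
One thing that stands out in the last line, offered as a possible contributor and not as a confirmed cause: fname_model is a Python str. Without declared argtypes, ctypes passes a str as a wchar_t pointer, while a function taking a C path string expects a char pointer, which is what bytes provides. A minimal sketch, where to_c_path is a hypothetical helper:

```python
import ctypes
import os


def to_c_path(path):
    """Encode a filesystem path as bytes suitable for a `const char *` argument."""
    return os.fsencode(path)


model_arg = to_c_path("ggml-tiny.bin")

# c_char_p accepts bytes but rejects str, so declaring argtypes turns a
# silent mismatch into an immediate TypeError.
try:
    ctypes.c_char_p.from_param("ggml-tiny.bin")
    str_accepted = True
except TypeError:
    str_accepted = False

# With a loaded library this would read (not executed here):
#     whisper.whisper_init_from_file.argtypes = [ctypes.c_char_p]
#     ctx = whisper.whisper_init_from_file(model_arg)
```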

synesthesiam commented 1 year ago

This is working for me:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname = "libwhisper.so"
fname_model = "models/ggml-tiny.en.bin"
fname_wav = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        #
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("duration_ms", ctypes.c_int),
        #
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("single_segment", ctypes.c_bool),
        ("print_special", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        #
        ("token_timestamps", ctypes.c_bool),
        ("thold_pt", ctypes.c_float),
        ("thold_ptsum", ctypes.c_float),
        ("max_len", ctypes.c_int),
        ("max_tokens", ctypes.c_int),
        #
        ("speed_up", ctypes.c_bool),
        ("audio_ctx", ctypes.c_int),
        #
        ("prompt_tokens", ctypes.c_void_p),
        ("prompt_n_tokens", ctypes.c_int),
        #
        ("language", ctypes.c_char_p),
        #
        ("suppress_blank", ctypes.c_bool),
        #
        ("temperature_inc", ctypes.c_float),
        ("entropy_thold", ctypes.c_float),
        ("logprob_thold", ctypes.c_float),
        ("no_speech_thold", ctypes.c_float),
        #
        ("greedy", ctypes.c_int * 1),
        ("beam_search", ctypes.c_int * 3),
        #
        ("new_segment_callback", ctypes.c_void_p),
        ("new_segment_callback_user_data", ctypes.c_void_p),
        #
        ("encoder_begin_callback", ctypes.c_void_p),
        ("encoder_begin_callback_user_data", ctypes.c_void_p),
    ]

if __name__ == "__main__":
    # load library and model
    libname = pathlib.Path().absolute() / libname
    whisper = ctypes.CDLL(libname)

    # tell Python what are the return types of the functions
    whisper.whisper_init_from_file.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init_from_file(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params()
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype("float32") / 32768.0

    # run the inference
    result = whisper.whisper_full(
        ctypes.c_void_p(ctx),
        params,
        data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        len(data),
    )
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    # print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0 = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1 = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))
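
The conversion data.astype("float32") / 32768.0 silently assumes 16-bit mono samples. The sketch below makes those assumptions explicit using only the standard library; read_pcm16_mono is a hypothetical helper, and the 16 kHz default is an assumption that is consistent with the logs earlier in this thread (176000 samples for an 11 s recording).

```python
import array
import sys
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Read a 16-bit mono PCM WAV file and return samples as floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in samples]
```

The returned list can be turned into a C float array with (ctypes.c_float * len(samples))(*samples) before it is passed to whisper_full.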

Letorshillen commented 1 year ago

Hey thanks for the answer, I've managed to make it work in my WSL on Windows yesterday.

> For some reason none of these solutions seem to work now on Windows (both 10 and 11). Every time on multiple machines the Python interpreter exits and shuts down without any error during the invocation of whisper_full. Does anyone have any ideas?
>
> So far I've tried both my cython extension and the various ctypes approaches (using updated versions of the examples provided here). I've even made sure the compiler is MSVC and that it matches the version used by Python (I am using Python 3.10). With the cython extension I've tried linking against the dynamic library and the static library, and also tried just the object files. I compiled using both the CMake scripts and an updated Makefile.
>
> This only happens on Windows; the exact same code works fine on macOS and Linux. Without Windows compatibility this extension becomes useless for its original purpose. If anyone could help I would greatly appreciate it.
>
> The only clue I've got is this: one time, when using a (Debug) DLL created by VS 2020 (with a more recent compiler than the one used for Python 3.10!) from a CMake-generated solution, a complaint was given about an rv==0 assert. Presumably this is a threading issue, possibly from not using pthreads; however, all the default CMake options were used to create the solution file. Nothing was edited; the latest version was built as-is, so pthreads should have been used. This was with the latest ctypes solution mentioned above. I haven't been able to reproduce it, though.

But as @o4dev said in this comment I cannot find a solution to make it work on windows :( Do you have any new input regarding this problem? Probably going to open an issue regarding that matter in the next few days.

synesthesiam commented 1 year ago

@Letorshillen I don't have much experience with Python/C++ interop on Windows, so no new input unfortunately :( I wonder if it would be possible to run this through WSL?

Letorshillen commented 1 year ago

Yeah, as I said, on WSL it's working perfectly fine ^^.

boolemancer commented 1 year ago

@Letorshillen

> But as @o4dev said in this comment I cannot find a solution to make it work on windows :( Do you have any new input regarding this problem? Probably going to open an issue regarding that matter in the next few days.

If you're okay with trying out @o4dev's cython bindings, I was able to get them working with a little bit of debugging. There's a PR here if you want to try it out.

https://github.com/o4dev/whispercpp.py/pull/7

Alternatively, you could follow the steps in the README, but replace the repo in the pip install step with my branch with the fix:

pip install git+https://github.com/boolemancer/whispercpp.py@windows_fix

silvacarl2 commented 1 year ago

I have not been able to get the Python bindings to work yet. Has anyone else been able to do so?

I get this every time I try to pip install:

Building wheels for collected packages: whispercpp
  Building wheel for whispercpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for whispercpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      ./whisper.cpp/ggml.c: In function ‘ggml_time_ms’:
      ./whisper.cpp/ggml.c:269:5: warning: implicit declaration of function ‘clock_gettime’ [-Wimplicit-function-declaration]
        269 |     clock_gettime(CLOCK_MONOTONIC, &ts);
            |     ^~~~~
      ./whisper.cpp/ggml.c:269:19: error: ‘CLOCK_MONOTONIC’ undeclared (first use in this function)
        269 |     clock_gettime(CLOCK_MONOTONIC, &ts);
            |                   ^~~~~~~
      ./whisper.cpp/ggml.c:269:19: note: each undeclared identifier is reported only once for each function it appears in
      ./whisper.cpp/ggml.c: In function ‘ggml_time_us’:
      ./whisper.cpp/ggml.c:275:19: error: ‘CLOCK_MONOTONIC’ undeclared (first use in this function)
        275 |     clock_gettime(CLOCK_MONOTONIC, &ts);
            |                   ^~~~~~~
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for whispercpp
Failed to build whispercpp
ERROR: Could not build wheels for whispercpp, which is required to install pyproject.toml-based projects
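
A possible explanation, not verified against this package's build script: with glibc, `clock_gettime` and `CLOCK_MONOTONIC` are only declared by `<time.h>` when a POSIX feature-test macro is in effect, which is not the case when the C file is compiled in a strict ISO mode such as `-std=c11`. The sketch below shows one way to add such a macro through the `define_macros` argument of a setuptools `Extension`. The helper name `with_posix_clock` and the example source list are hypothetical, and other declarations may need additional macros.

```python
def with_posix_clock(extension_kwargs):
    """Return a copy of Extension keyword arguments with _POSIX_C_SOURCE defined.

    Hypothetical helper: it only edits a plain dict, so it can be inspected
    without setuptools being installed.
    """
    updated = dict(extension_kwargs)
    macros = list(updated.get("define_macros", []))
    # Only add the macro if the build does not already define it.
    if not any(name == "_POSIX_C_SOURCE" for name, _ in macros):
        # 200809L requests the POSIX.1-2008 declarations, which include
        # clock_gettime and CLOCK_MONOTONIC.
        macros.append(("_POSIX_C_SOURCE", "200809L"))
    updated["define_macros"] = macros
    return updated


# Assumed, illustrative arguments; the real setup.py may look different.
kwargs = with_posix_clock({"name": "whispercpp", "sources": ["whisper.cpp/ggml.c"]})
print(kwargs["define_macros"])
```

The resulting dictionary would be passed on as `Extension(**kwargs)` in a `setup.py`. Compiling with a GNU dialect such as `-std=gnu11` instead of a strict ISO mode is another option that should expose the same declarations.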

zzzacwork commented 1 year ago

I was able to get it to work; you will need to change some functions' signatures. However, I found the cython bindings slower than Whisper's original model. I am not sure if I did it correctly, though.

On Fri, Jan 27, 2023 at 9:53 AM silvacarl2 @.***> wrote:

i have not been able to get python bindings to work yet, has anyone else been able to do so?


silvacarl2 commented 1 year ago

I am looking into this. There are basically two ways to call C++ from Python: using the pybind11 C++ library to produce a Python module, or using the ctypes Python package to access a compiled shared library. With pybind11 we can more easily share many data types, while ctypes is a much lower-level, C-style solution.
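
To illustrate what "lower-level, C-style" means for ctypes, here is a minimal sketch that calls `strlen` from the C standard library rather than anything from whisper.cpp, so it runs without building the project. The point is that the argument and return types have to be declared by hand; the wrapper name `c_strlen` is made up for this example.

```python
import ctypes
import ctypes.util
import os


def load_libc():
    """Load the C runtime; the library name differs per platform."""
    if os.name == "nt":
        return ctypes.CDLL("msvcrt")
    # find_library may return None; on POSIX, CDLL(None) then exposes the
    # symbols already loaded into the current process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_libc()

# ctypes knows nothing about the C prototype, so it is spelled out here.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)


print(c_strlen(b"whisper"))
```

With pybind11 the equivalent type information lives in the C++ binding code instead, and conversions for common types are generated for you.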

aarnphm commented 1 year ago

A bit late to the party, but I'm wondering if anyone is working on a Pybind11 implementation? I have one locally and I can make a PR if it is desired.

nebilibrahim22 commented 1 year ago

@aarnphm I tried to make a pybind11 implementation a few months ago but ran into an error when running the full function: it got stuck in a loop at some point. Looking at what has been said recently, it probably had to do with the fact that I was working on Windows, since others have faced the same issue.

If you could share what you have done that would be great!

ggerganov commented 1 year ago

@aarnphm @nebilibrahim22

If you have a repo that provides a Python wrapper I can link it from the README file to get some visibility.

PRs are also welcome. It would be nice to have a basic CI workflow with it in order to maintain it more easily (see for example the CI for Go and Ruby).

aarnphm commented 1 year ago

Here is the binding https://github.com/aarnphm/whispercpp cc @ggerganov

limdongjin commented 1 year ago

My Python Binding:

ver1. using cythonize https://github.com/limdongjin/whisper.cpp.py/tree/main/whisper.cpp.cython

ver2. using ctypes.CDLL https://github.com/limdongjin/whisper.cpp.py/tree/main/whisper_cpp_cdll

mrmachine commented 1 year ago

Here is the binding https://github.com/aarnphm/whispercpp cc @ggerganov

How can I make this work? I've cloned this whisper.cpp repo and run make main and make stream. I've made a virtualenv and installed whispercpp. When I try to run the stream.py example, I get:

Traceback (most recent call last):
  File "stream.py", line 44, in <module>
    default=w.api.SAMPLE_RATE,
  File "/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/utils.py", line 144, in __getattr__
    self._module = self._load()
  File "/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/utils.py", line 122, in _load
    module = importlib.import_module(self.__name__)
  File "/Users/tailee/.pyenv/versions/3.8.16/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dlopen(/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/api_cpp2py_export.so, 0x0002): symbol not found in flat namespace '_PyCMethod_New'

Do I need to make and install some shared libraries somewhere? If so, I could not find any instructions for this in this thread or the whisper.cpp or whispercpp docs.
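
One thing that may be worth checking: `PyCMethod_New` was added to the CPython C API in Python 3.9, while the traceback above comes from a Python 3.8.16 interpreter. A plausible but unconfirmed reading is that the installed extension was built against a newer Python than the one loading it. The small check below only compares version numbers; the function name is made up for this sketch.

```python
import sys

# PyCMethod_New first appeared in CPython 3.9.
MIN_VERSION = (3, 9)


def interpreter_has_pycmethod_new(version_info=sys.version_info):
    """Return True if this CPython version is new enough to export PyCMethod_New."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not interpreter_has_pycmethod_new():
    print("This interpreter predates Python 3.9 and does not export PyCMethod_New.")
```

If the check fails, creating the virtualenv with Python 3.9 or newer would be one thing to try.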

aarnphm commented 1 year ago

Hey there, let's bring this to the main repo to avoid polluting this thread.

janhuenermann commented 1 year ago

Hey everyone, I also created simple Python bindings using pybind11. In case anyone is interested, you can install them:

pip install git+https://github.com/janhuenermann/whisper.cpp.git@pybind#subdirectory=bindings/python 

To transcribe audio, run:

import pywhisper
pywhisper.init(model_path="./models/ggml-base.en.bin")
audio_pcmf32_16khz_numpy = ...
transcription = pywhisper.transcribe(audio_pcmf32_16khz_numpy)
print(transcription)
# [(0.0, 11.0, ' And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.')]

For more details, I have a readme and simple example here: https://github.com/janhuenermann/whisper.cpp/tree/pybind/bindings/python

If this is of interest to more people, I'm happy to open a PR.
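
For the `audio_pcmf32_16khz_numpy = ...` placeholder above, here is a rough sketch of how the samples could be prepared using only the standard library. It assumes, based on the variable name alone, that the bindings want mono float32 samples at 16 kHz scaled to [-1, 1); the function name `load_pcm16_mono_16khz` is made up for this example and is not part of pywhisper.

```python
import array
import wave


def load_pcm16_mono_16khz(path):
    """Read a 16-bit mono 16 kHz WAV file and return float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if (wav.getnchannels(), wav.getsampwidth(), wav.getframerate()) != (1, 2, 16000):
            raise ValueError("expected 16-bit mono PCM at 16 kHz")
        raw = wav.readframes(wav.getnframes())
    # Interpret the raw bytes as signed 16-bit integers.
    pcm16 = array.array("h")
    pcm16.frombytes(raw)
    # Scale to floats; typecode "f" stores them as 32-bit floats.
    return array.array("f", (sample / 32768.0 for sample in pcm16))
```

The returned `array.array` can be turned into a NumPy array with `numpy.asarray(samples, dtype=numpy.float32)`. Files with a different sample rate or channel count would need resampling or downmixing first, which this sketch does not do.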

DoodleBears commented 1 year ago

Another one using pybind11: pywhispercpp

pajowu commented 1 year ago

And another one using pybind11 that I created for @transcribee: https://github.com/pajowu/whispercppy . I also published it to PyPI (with wheels and everything): https://pypi.org/project/whispercppy/

This one is a combination of the bindings that @aarnphm created and build tooling taken, and heavily modified, from @janhuenermann's fork. This should give you easy installation (wheels where possible, and only cmake and clang/gcc as dependencies otherwise) and good bindings. It can even yield generated paragraphs asynchronously while they are being transcribed, as shown in https://github.com/transcribee/transcribee/blob/aaaa373fa90024bad6e4053b469ab7352b5c503c/worker/transcribee_worker/whisper_transcribe.py#L165
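
As a generic illustration of yielding results asynchronously while a blocking transcription runs, here is a sketch of the usual thread-plus-queue pattern in asyncio. It is not the whispercppy or transcribee implementation; `stream_segments`, `blocking_transcribe` and the `on_segment` callback are hypothetical names, and it assumes the transcriber reports each finished segment through a callback.

```python
import asyncio
import threading


async def stream_segments(blocking_transcribe, audio):
    """Yield segments as a blocking, callback-based transcriber produces them."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of the stream

    def on_segment(segment):
        # Called from the worker thread, so hand the item over thread-safely.
        loop.call_soon_threadsafe(queue.put_nowait, segment)

    def worker():
        try:
            blocking_transcribe(audio, on_segment)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = await queue.get()
        if item is done:
            break
        yield item


def fake_transcribe(audio, on_segment):
    """Stand-in for a real transcriber: reports each word as a segment."""
    for word in audio.split():
        on_segment(word)


async def main():
    async for segment in stream_segments(fake_transcribe, "ask not what"):
        print(segment)


asyncio.run(main())
```

The event loop stays free to do other work between segments, because the blocking call lives in its own thread and only the queue hand-off touches the loop.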