bug: Runs Exclusively on CPU

Describe the bug

This binding is about 10 times slower than native Whisper CPP because it is running exclusively on CPU on my M2 Device. Whisper CPP runs fine on its own on the GPU, so there is no reason why this should not be possible for Python bindings.

To reproduce

I ran this code:

from whispercpp import Whisper

w = Whisper.from_pretrained("large")
transcript = w.transcribe_from_file("output.wav")

I compared with whisper cpp command: ./main -f output.wav -m models/ggml-large.bin -otxt

Expected behavior

Run on GPU and 10x faster

Environment

python 3.11 MacOS Sonoma M2

aarnphm / whispercpp

bug: Runs Exclusively on CPU #173

Describe the bug

To reproduce

Expected behavior

Environment