aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0

perf: improvements #12

Closed. aarnphm closed this issue 1 year ago.


regstuff commented 1 year ago

Hi, 0.0.7 hits 100% CPU for me and seems to get stuck there forever. Here's my code:

from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
import time
import ffmpeg
import numpy as np
timh = time.time()
try:
    y, _ = (
        ffmpeg.input("combined.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
        .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
print('Time 1:', time.time()-timh)
timh = time.time()
w.transcribe(arr)
print('Time 2:', time.time()-timh)

Time 1 is 95 ms, but I never get to Time 2, even with a 5-second wav file. CPU stays maxed out.
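For context, the `np.frombuffer(...) / 32768.0` step in the snippet above converts the signed 16-bit PCM that ffmpeg emits into the float32 range whisper.cpp expects. A standalone illustration (the helper name is mine):

```python
import numpy as np

def pcm16_to_float32(raw: bytes) -> np.ndarray:
    # s16le samples span [-32768, 32767]; dividing by 32768.0
    # maps them into [-1.0, 1.0), the range the model expects
    return np.frombuffer(raw, np.int16).flatten().astype(np.float32) / 32768.0

samples = pcm16_to_float32(np.array([0, 16384, -32768], np.int16).tobytes())
# → [0.0, 0.5, -1.0]
```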

aarnphm commented 1 year ago

Can you try putting the logic under a `__main__` guard:

from whispercpp import Whisper
import time
import ffmpeg
import numpy as np

if __name__ == "__main__":
    timh = time.time()
    w = Whisper.from_pretrained("tiny.en")
    try:
        y, _ = (
            ffmpeg.input("combined.wav", threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
    print('Time 1:', time.time()-timh)
    timh = time.time()
    w.transcribe(arr)
    print('Time 2:', time.time()-timh)
aarnphm commented 1 year ago

performance on `main` should now match the upstream C++ implementation. Releasing 0.0.8 shortly.

regstuff commented 1 year ago

Thanks. Just wanted to clarify if the transcription runs multi-threaded by default, as even the tiny model seems to eat up my CPU. Also when setting num_proc > 1, how do I actually get the transcription result? Works fine for the default setting of 1.

miraclebakelaser commented 1 year ago

I'm also getting sluggish results on 0.0.8. I am using the same code as @regstuff . It takes about 25 seconds to transcribe a 1 minute 55 second wav. This is in line with the performance that I'm getting with vanilla Whisper. For reference, I have an M1 Mac. However, when using whisper.cpp in the command line, the same transcription takes 4 seconds.
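When comparing the binding against the CLI, `time.perf_counter()` is a better interval timer than `time.time()`, since it is monotonic and high-resolution. A small helper (names are mine, not part of whispercpp):

```python
import time

def timed(fn, *args, **kwargs):
    # perf_counter deltas are immune to system clock adjustments
    # that can skew time.time()-based measurements
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

result, elapsed = timed(sum, range(1000))
```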

aarnphm commented 1 year ago

> Thanks. Just wanted to clarify if the transcription runs multi-threaded by default, as even the tiny model seems to eat up my CPU. Also when setting num_proc > 1, how do I actually get the transcription result? Works fine for the default setting of 1.

transcription is not multi-threaded, since accessing `whisper_context` is not thread-safe.

num_proc depends on how many CPUs you have available. You can just pass it to `transcribe`.
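If you do need to call the model from several Python threads anyway, one workaround (a sketch of mine, not part of the whispercpp API) is to serialize calls behind a lock:

```python
import threading

class SerializedModel:
    """Wrap a model whose transcribe() is not thread-safe so that
    concurrent callers take turns instead of racing on the context."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def transcribe(self, audio, **kwargs):
        with self._lock:  # only one thread touches the context at a time
            return self._model.transcribe(audio, **kwargs)
```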

I'm running on M1 atm and for me the result from the binding is comparable to the example under whisper.cpp.

@miraclebakelaser can you try building from source? Can you also share the wav file if possible? Currently, I don't have access to a large wav file.

pip install git+https://github.com/aarnphm/whispercpp.git@main

I'm wondering whether this has to do with how I'm currently building the wheel and `-dynamic-lookup` :thinking:

aarnphm commented 1 year ago

quick update on this. The binary compiled on Linux is not optimized and AVX was disabled, which makes it ~50x slower compared to upstream. A patch will come out with 0.0.9 shortly.
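On Linux you can verify whether the CPU actually advertises AVX by parsing the `flags` line of `/proc/cpuinfo`; a minimal sketch (the function name is mine):

```python
def has_avx(cpuinfo_text: str) -> bool:
    # /proc/cpuinfo lists one 'flags' line per core; 'avx' (and 'avx2')
    # appear there only when the CPU supports those instruction sets
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "avx" in line.split()
    return False

# e.g. has_avx(open("/proc/cpuinfo").read()) on a Linux host
```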

aarnphm commented 1 year ago

Ok, so releasing wheels is very hard. 0.0.12 should be out now. You can wait a bit for the wheels to be built and published, but the sdist is already out.