Hi, 0.0.7 hits 100% CPU for me and seems to get stuck there forever. Here's my code:
```python
from whispercpp import Whisper

w = Whisper.from_pretrained("tiny.en")

import time
import ffmpeg
import numpy as np

timh = time.time()
try:
    # Decode to 16 kHz mono signed 16-bit PCM on stdout
    y, _ = (
        ffmpeg.input("combined.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
        .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

# Convert int16 PCM bytes to float32 samples in [-1, 1]
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
print("Time 1:", time.time() - timh)

timh = time.time()
w.transcribe(arr)
print("Time 2:", time.time() - timh)
```
Time 1 is 95 ms. I never get to Time 2, even with a 5-second wav file; the CPU stays maxed out.
Can you try putting the logic under the `__main__` block:
```python
from whispercpp import Whisper

import time
import ffmpeg
import numpy as np

if __name__ == "__main__":
    timh = time.time()
    w = Whisper.from_pretrained("tiny.en")
    try:
        # Decode to 16 kHz mono signed 16-bit PCM on stdout
        y, _ = (
            ffmpeg.input("combined.wav", threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

    # Convert int16 PCM bytes to float32 samples in [-1, 1]
    arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
    print("Time 1:", time.time() - timh)

    timh = time.time()
    w.transcribe(arr)
    print("Time 2:", time.time() - timh)
```
Performance on main should now match the upstream C++ implementation. Releasing 0.0.8 shortly.
Thanks. Just wanted to clarify whether the transcription runs multi-threaded by default, as even the tiny model seems to eat up my CPU. Also, when setting `num_proc > 1`, how do I actually get the transcription result? It works fine with the default setting of 1.
I'm also getting sluggish results on 0.0.8, using the same code as @regstuff. It takes about 25 seconds to transcribe a 1 minute 55 second wav, which is in line with the performance I get from vanilla Whisper. For reference, I have an M1 Mac; however, when using whisper.cpp on the command line, the same transcription takes 4 seconds.
> Thanks. Just wanted to clarify if the transcription runs multi-threaded by default, as even the tiny model seems to eat up my CPU. Also when setting `num_proc > 1`, how do I actually get the transcription result? Works fine for the default setting of 1.
Transcription is not multi-threaded, since accessing `whisper_context` is not thread-safe. `num_proc` depends on how many CPUs you have available; you can just pass it into `transcribe`.
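For illustration, a minimal sketch of passing `num_proc` into `transcribe`, per the above. Whether the result comes back the same way for `num_proc > 1` is an assumption here, so verify against the version you have installed:

```python
import numpy as np
from whispercpp import Whisper

w = Whisper.from_pretrained("tiny.en")

# 5 seconds of silence at 16 kHz, as a stand-in for real float32 PCM audio
arr = np.zeros(16000 * 5, dtype=np.float32)

# Assumption: num_proc fans the work out across processes and the
# concatenated text is still returned like the num_proc=1 case.
text = w.transcribe(arr, num_proc=4)
print(text)
```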
I'm running on an M1 at the moment, and for me the result from the binding is comparable to the example under whisper.cpp.
@miraclebakelaser can you try building from source? Can you also share the wav file if possible? Currently, I don't have access to a large wav file.
```bash
pip install git+https://github.com/aarnphm/whispercpp.git@main
```
I'm wondering whether this has to do with how I'm currently building the wheel and `-dynamic-lookup` :thinking:
Quick update on this: the binary compiled on Linux was not optimized and AVX was disabled, which makes it 50x slower compared to upstream. A patch will come out with 0.0.9 promptly.
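As a side note, here is a quick hypothetical diagnostic to check whether the host CPU even advertises AVX on Linux. It only reports what the CPU supports, not whether the installed wheel was actually compiled with AVX enabled:

```python
# Hypothetical diagnostic, Linux-only: does the CPU advertise AVX?
# This says nothing about the compile flags used for the wheel itself.
def host_has_avx() -> bool:
    with open("/proc/cpuinfo") as f:
        return any(
            "avx" in line.split()
            for line in f
            if line.startswith("flags")
        )

if __name__ == "__main__":
    print("CPU advertises AVX:", host_has_avx())
```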
OK, so releasing wheels is very hard. 0.0.12 should be out now; you can wait a bit for the wheels to be built and published, but the sdist is out.
Describe the bug
Currently, it doesn't do very well with big files. This has to do with some `memcpy` with respect to how we handle new segments. Avoid using STL conversions.
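Until that's addressed, one hypothetical workaround is to split long audio into fixed windows on the Python side and transcribe them separately. This is only a sketch, not the library's method, and cutting at fixed boundaries can split words:

```python
import numpy as np
from whispercpp import Whisper

SAMPLE_RATE = 16000   # whisper.cpp expects 16 kHz input
CHUNK_SECONDS = 30    # assumed window size; tune for your machine

def transcribe_in_chunks(w: Whisper, audio: np.ndarray) -> str:
    """Transcribe a long float32 PCM array in fixed-size windows."""
    step = SAMPLE_RATE * CHUNK_SECONDS
    pieces = [
        w.transcribe(audio[start:start + step])
        for start in range(0, len(audio), step)
    ]
    return " ".join(pieces)
```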