Pigarian opened this issue 1 year ago
What hardware are you running it on? What model and arguments are you passing? The large model on older hardware can definitely take some time.
Probably 10 to 15 seconds or less to process, and 30 to 35 seconds to load the model. That's normal.
I'm using Whisper in a custom program, so I've had to hard-code the parameters as shown above. I'm using the base English model. I'm running on an Intel i5-10300H @ 2.5 GHz, so I wouldn't be surprised to learn that that's the culprit. I run `whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");` long before calling `whisper_full`.
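For anyone following along, a minimal sketch of that flow (the buffer name `pcmf32` and its preparation are my assumptions; `whisper_full` expects 16 kHz mono float PCM). Loading the model is the expensive one-time step, so the context should be created once and reused across calls:

```cpp
#include "whisper.h"
#include <cstdio>
#include <vector>

int main() {
    // One-time, expensive step: load the model and keep the context around.
    struct whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");
    if (!ctx) return 1;

    // Assumed to be filled elsewhere with 16 kHz mono float PCM in [-1, 1].
    std::vector<float> pcmf32;

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, params, pcmf32.data(), (int) pcmf32.size()) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s", whisper_full_get_segment_text(ctx, i));
        }
    }

    whisper_free(ctx);
    return 0;
}
```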
I am experiencing a similar issue. I'm trying to integrate whisper.cpp into a Qt (6.5) GUI application and basically adapt the command example to display recognized speech in a text field.
When I use the command example, speech gets transcribed almost immediately. However, when I am using whisper.cpp in my app, the transcription result is just fine, but the processing times are about 10 times those of the example. One difference is that I use the QAudioSource class instead of SDL and convert the QByteArray to a std::vector (I don't know if that might cause the issue).
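For context, the conversion step looks roughly like this (a sketch, assuming QAudioSource is configured for 16-bit signed mono PCM at 16 kHz; the function name is mine):

```cpp
#include <QByteArray>
#include <cstdint>
#include <vector>

// Convert a QByteArray of 16-bit signed PCM (16 kHz, mono) into the
// normalized float vector that whisper_full() takes.
std::vector<float> byteArrayToFloatPCM(const QByteArray &bytes) {
    const auto *samples = reinterpret_cast<const int16_t *>(bytes.constData());
    const size_t n = bytes.size() / sizeof(int16_t);

    std::vector<float> pcmf32(n);
    for (size_t i = 0; i < n; ++i) {
        pcmf32[i] = samples[i] / 32768.0f; // scale to [-1, 1]
    }
    return pcmf32;
}
```

The conversion itself is cheap, so given that encode time dominates in the timings below, I suspect it's not the Qt side.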
It seems that the problem is somewhere in the encoding:
```
whisper_print_timings:     load time =   109.72 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    93.35 ms
whisper_print_timings:   sample time =    32.85 ms /    35 runs (    0.94 ms per run)
whisper_print_timings:   encode time =  8134.68 ms /     1 runs ( 8134.68 ms per run)
whisper_print_timings:   decode time =   391.61 ms /    33 runs (   11.87 ms per run)
whisper_print_timings:    total time = 10151.20 ms
```
Because the command example runs super fast, I don't think it is necessarily a hardware limitation. But for reference I am using a machine with an AMD Ryzen 5 5600X @ 3.7 GHz
I had slow transcriptions depending on which settings I was using. These settings improved it for me:
settings["LLVM_LTO"] = "YES"
settings["OTHER_CFLAGS"] = ["-O3", "-DNDEBUG"]
@23inhouse Where did you add these settings?
@shaynemei I use Tuist to manage Xcode settings, but if you search for `LLVM_LTO` and `DNDEBUG` in this repo or on Google, you can see where they are set in the .xcodeproj files. There is a way to set those in the Xcode GUI. You could also set them directly in the files. I'm sorry I can't remember the exact steps.
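For anyone building outside Xcode, the equivalent idea (an assumption on my part, not verified against every build setup) is to make sure whisper.cpp is compiled as an optimized Release build, e.g. configuring with `cmake -B build -DCMAKE_BUILD_TYPE=Release` and then `cmake --build build --config Release`, since `-O3` and `-DNDEBUG` are exactly what a Debug build omits. An unoptimized Debug build of the encoder can easily be an order of magnitude slower, which would match the ~10x slowdown reported above.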
@JJJJaymax Did you find a solution to your issue? I also want to use whisper.cpp in my Qt 6.67 application.
Did you build with CUDA enabled?
No, I need to run it on a CPU instead; the computer that I want it to run on does not have GPU support. Can you please help?
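One CPU-side thing worth checking (a suggestion, not a confirmed fix for your case): make sure you're on an optimized Release build as discussed above, and give `whisper_full` enough threads for your machine, e.g.:

```cpp
#include "whisper.h"
#include <algorithm>
#include <thread>

// Sketch: greedy sampling with n_threads sized to the machine's cores
// (capped, since oversubscribing rarely helps the encoder).
whisper_full_params make_cpu_params() {
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads = std::max(1, std::min(8, (int) std::thread::hardware_concurrency()));
    return params;
}
```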
I put in a 20-second audio clip, most of which is silent, and it took nearly 45 seconds to process the whole thing. Am I just missing some trick to get it to run faster? These are the params I'm sending to it, for reference: