ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Very Slow Processing Time #873

Open Pigarian opened 1 year ago

Pigarian commented 1 year ago

I put in a 20-second audio clip, most of which is silent, and it took nearly 45 seconds to process the whole thing. Am I just missing some trick to get it to run faster? These are the params I'm sending to it, for reference:

// needs "whisper.h", <algorithm>, and <thread>
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    {
        wparams.print_progress   = false;
        wparams.print_special    = false;
        wparams.print_realtime   = false;
        wparams.print_timestamps = false;
        wparams.translate        = false;
        wparams.single_segment   = false;
        wparams.max_tokens       = 32;
        wparams.language         = "en";
        wparams.n_threads        = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));
        wparams.audio_ctx        = 768;
        wparams.speed_up         = false;
        wparams.temperature_inc  = wparams.temperature_inc; // self-assignment; keeps the default fallback temperature increment
        wparams.prompt_tokens    = nullptr;
        wparams.prompt_n_tokens  = 0;
    }
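For what it's worth, here is a minimal sketch of how whisper.cpp's built-in timing report could be used to see where the 45 seconds actually go (model load vs. mel vs. encode vs. decode). It assumes ctx is the whisper_context and pcmf32 holds the clip as 16 kHz mono float samples; both names are just for illustration:

    // Run the transcription with the params above, then print whisper.cpp's
    // own timing breakdown to locate the bottleneck.
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
        fprintf(stderr, "whisper_full() failed\n"); // needs <cstdio>
    } else {
        whisper_print_timings(ctx); // load / mel / encode / decode / total
    }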
wesgould commented 1 year ago

What hardware are you running it on? What model and arguments are you passing? The large model on older hardware can definitely take some time.

mrfragger commented 1 year ago

It probably takes 10 to 15 seconds or less to process, and 30 to 35 seconds to load the model. That's normal.

Pigarian commented 1 year ago

I'm using Whisper in a custom program, so I've had to hard-code the parameters as shown above. I'm using the base language model. I'm running on an Intel i5-10300H @ 2.5 GHz, so I wouldn't be surprised to learn that that's the culprit. I run whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin"); long before using whisper_full.

JJJJaymax commented 1 year ago

I am experiencing a similar issue. I'm trying to integrate whisper.cpp into a Qt (6.5) GUI application and basically adapt the command example to display recognized speech in a text field.

When I use the command example, speech gets transcribed almost immediately. However, when I use whisper.cpp in my app, the transcription result is fine, but the processing time is about 10 times as long as in the example. One difference is that I use the QAudioSource class instead of SDL and convert the QByteArray to a std::vector (I don't know if that might cause the issue).

It seems that the problem is somewhere in the encoding:

whisper_print_timings:     load time =   109.72 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    93.35 ms
whisper_print_timings:   sample time =    32.85 ms /    35 runs (    0.94 ms per run)
whisper_print_timings:   encode time =  8134.68 ms /     1 runs ( 8134.68 ms per run)
whisper_print_timings:   decode time =   391.61 ms /    33 runs (   11.87 ms per run)
whisper_print_timings:    total time = 10151.20 ms

Because the command example runs super fast, I don't think it is necessarily a hardware limitation. But for reference, I am using a machine with an AMD Ryzen 5 5600X @ 3.7 GHz.
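In case the QByteArray-to-std::vector conversion is the suspect: whisper_full() expects 32-bit float samples in [-1, 1] at 16 kHz, so 16-bit signed PCM captured by QAudioSource has to be rescaled rather than just copied. A minimal sketch (the helper name pcm16_to_float is made up here, and the capture format is assumed to be 16 kHz, mono, Int16):

    #include <QByteArray>
    #include <cstdint>
    #include <vector>

    // Convert raw 16-bit signed PCM bytes from QAudioSource into the float
    // samples whisper_full() expects (assumes 16 kHz, mono, Int16 capture format).
    std::vector<float> pcm16_to_float(const QByteArray & bytes) {
        const auto * samples = reinterpret_cast<const int16_t *>(bytes.constData());
        const size_t n = static_cast<size_t>(bytes.size()) / sizeof(int16_t);

        std::vector<float> pcmf32(n);
        for (size_t i = 0; i < n; ++i) {
            pcmf32[i] = samples[i] / 32768.0f; // scale the int16 range into [-1, 1)
        }
        return pcmf32;
    }

That said, the encode step dominates your timings, so the conversion is probably not where the time goes; it may be worth checking whether the app links an unoptimized (Debug) build of whisper.cpp, which is what the build-flag suggestions below address.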

23inhouse commented 1 year ago

I had slow transcriptions depending on which settings I was using. These settings improved it for me:

        settings["LLVM_LTO"] = "YES"
        settings["OTHER_CFLAGS"] = ["-O3", "-DNDEBUG"]
shaynemei commented 1 year ago

@23inhouse Where did you add these settings?

23inhouse commented 1 year ago

@shaynemei I use Tuist to manage Xcode settings, but if you search for LLVM_LTO and DNDEBUG in this repo or on Google you can see where they are set in the xcodeproj files. There is a way to set those in the Xcode GUI, or you could set them directly in the files. Sorry, I can't remember the exact steps.
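For non-Xcode builds (e.g. a Qt app built with CMake or qmake), the equivalent is, as far as I know, simply making sure both whisper.cpp and the app are compiled in an optimized Release configuration (-DCMAKE_BUILD_TYPE=Release for CMake, CONFIG += release for qmake), so that flags like -O3 and -DNDEBUG actually end up on the compile line; a Debug build of the encoder can easily be several times slower.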

kittechAI commented 5 months ago

@JJJJaymax Did you find a solution to your issue? I also want to use whisper.cpp in my Qt 6.67 application.

ulatekh commented 5 months ago

Did you build with CUDA enabled?

kittechAI commented 5 months ago

> Did you build with CUDA enabled?

No, I need to run it on a CPU instead; the computer that I want to run it on does not have GPU support. Can you please help?
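Not a definitive answer, but on a CPU-only machine the usual levers are an optimized Release (-O3 -DNDEBUG) build as discussed above, a smaller model, and the n_threads parameter. A minimal sketch, where the model file and the thread cap are just examples:

    #include <algorithm>
    #include <thread>
    #include "whisper.h"

    // Smaller models (tiny / base) encode much faster on a CPU than small / medium / large.
    whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // Use up to one thread per hardware thread, capped at 8 (same pattern as the params at the top of this issue).
    wparams.n_threads = std::max(1, std::min(8, (int32_t) std::thread::hardware_concurrency()));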