ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

stream.exe without window #2158

Open ErcinDedeoglu opened 6 months ago

ErcinDedeoglu commented 6 months ago

Hi @ggerganov! Thank you for the amazing work here!

I have an issue with the "stream" example.

Environment:
- OS: Windows 11
- Make version: GNU Make 4.4.1
- SDL2: SDL2-devel-2.28.5-mingw
- Commit: v1.6.0

I cloned the latest commit, which is currently v1.6.0: https://github.com/ggerganov/whisper.cpp/commit/08981d1bacbe494ff1c943af6c577c669a2d9f4d

Build output:

C:\Users\edede\Desktop\git\github\whisper.cpp>make stream
I whisper.cpp build info:
I UNAME_S: Windows_NT
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600
I LDFLAGS:
I CC: cc (GCC) 13.2.0
I CXX: c++ (GCC) 13.2.0

c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 examples/stream/stream.cpp examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp examples/common-sdl.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o stream `sdl2-config --cflags --libs`

1. "stream" [screenshot]

The process exited immediately, as expected. The problem is that it printed nothing to the console.


2. "stream --capture 1 -t 4 --step 2000 --length 2000 --keep 500 -m ggml-base.en.bin" [screenshot]

The behavior looks the same, except that the process keeps running, as expected. It seems to have created a child thread that is not attached to the console, so nothing is printed, although the process is still visible in the screenshot.
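
One way to test the "not attached to the console" hypothesis is to reattach the standard streams explicitly at startup. The sketch below is only an illustration under assumptions: `attach_parent_console` is a hypothetical helper that does not exist in whisper.cpp, and the underlying guess (unverified for this build) is that the MinGW `sdl2-config --libs` output includes `-mwindows`, which links the executable for the GUI subsystem so that it starts without a console. If that guess is right, removing the flag from the link line would be the simpler fix.

```cpp
#include <cstdio>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper (not part of whisper.cpp): try to reconnect stdout and
// stderr to the console of the parent process, e.g. the cmd.exe window that
// launched the program. Returns true if output is expected to reach a console.
bool attach_parent_console() {
#ifdef _WIN32
    // AttachConsole fails if the process already has a console or if the
    // parent has none; in the first case nothing needs to be done.
    if (!AttachConsole(ATTACH_PARENT_PROCESS)) {
        return GetConsoleWindow() != nullptr;
    }
    // Point the C runtime streams at the console output device.
    const bool ok_out = std::freopen("CONOUT$", "w", stdout) != nullptr;
    const bool ok_err = std::freopen("CONOUT$", "w", stderr) != nullptr;
    return ok_out && ok_err;
#else
    // On other platforms the streams are inherited from the launching shell.
    return true;
#endif
}
```

Calling `attach_parent_console()` as the first statement of `main` in a local copy of `examples/stream/stream.cpp` should make the `init:` and `whisper_*` messages visible if the missing console is indeed the cause. If the output is still missing, the cause lies elsewhere.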

ggerganov commented 6 months ago

Hm, not sure. This looks like something Windows-specific.

On Mac it runs ok:

$ ▶ make -j stream

I whisper.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_METAL
I LDFLAGS:   -framework Accelerate -framework Foundation -framework Metal -framework MetalKit
I CC:       Apple clang version 15.0.0 (clang-1500.3.9.4)
I CXX:      Apple clang version 15.0.0 (clang-1500.3.9.4)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml.c -o ggml.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-alloc.c -o ggml-alloc.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-backend.c -o ggml-backend.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL   -c ggml-quants.c -o ggml-quants.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_METAL -c whisper.cpp -o whisper.o
cc -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -c ggml-metal.m -o ggml-metal.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -pthread -DGGML_USE_METAL examples/stream/stream.cpp examples/common.cpp examples/common-ggml.cpp examples/grammar-parser.cpp examples/common-sdl.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o ggml-metal.o -o stream `sdl2-config --cflags --libs`  -framework Accelerate -framework Foundation -framework Metal -framework MetalKit

$ ▶ ./stream

init: found 1 capture devices:
init:    - Capture device #0: 'SRS-XB10'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Ultra
ggml_metal_init: picking default device: Apple M2 Ultra
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/ggerganov/development/github/whisper.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Ultra
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 154618.82 MB
whisper_model_load:    Metal total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =   18.87 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  135.14 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB

main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 2, no_context = 1

[Start speaking]