abdeladim-s / pywhispercpp

Python bindings for whisper.cpp
https://abdeladim-s.github.io/pywhispercpp/
MIT License

[question] how to suppress model initialization logging #82

Open · benniekiss opened this issue 3 days ago

benniekiss commented 3 days ago

When the whisper model is loaded, it prints a lot of initialization information to the console. I'd like to be able to redirect this to a separate log file and silence the console output.

llama-cpp-python does something similar when loading a model, and I am able to redirect the output with something like:

from contextlib import redirect_stderr

with redirect_stderr(my_log_writer):
    ...

but I have not been successful using this with pywhispercpp. I was wondering if anyone had some insight on how to make this work.

The logs I would like to silence:

ggml-small.en-q5_1.bin: 100% 190M/190M [00:02<00:00, 100MB/s]
whisper_init_from_file_with_params_no_state: loading model from './models/models--ggerganov--whisper.cpp/snapshots/5359861c739e955e79d9a303bcbc70fb988958b1/ggml-small.en-q5_1.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 3 (small)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:    Metal total size =   189.49 MB
whisper_model_load: model size    =  189.49 MB
whisper_backend_init_gpu: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8  (1008)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size  =   56.62 MB
whisper_init_state: kv cross size =   56.62 MB
whisper_init_state: kv pad  size  =    4.72 MB
whisper_init_state: compute buffer (conv)   =   22.41 MB
whisper_init_state: compute buffer (encode) =  284.68 MB
whisper_init_state: compute buffer (cross)  =    6.18 MB
whisper_init_state: compute buffer (decode) =   98.65 MB
abdeladim-s commented 2 days ago

@benniekiss, yes, whisper.cpp writes its logs to stderr, so you should be able to redirect stderr to devnull to suppress them, or to a file if you want to keep them. I will try to expose this as a utility function, or perhaps add it as a parameter to the Model class, something like:

model = Model('tiny', redirect_whispercpp_logs_to=...) 

What do you think?
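
In the meantime, an OS-level redirect should do the trick: contextlib.redirect_stderr only rebinds Python's sys.stderr object, while whisper.cpp writes straight to file descriptor 2, which is why it had no effect here. A minimal sketch (redirect_fd2 is a hypothetical helper name, not part of pywhispercpp):

import os
import sys
from contextlib import contextmanager

@contextmanager
def redirect_fd2(target_path):
    sys.stderr.flush()             # flush Python-level buffering first
    saved_fd = os.dup(2)           # keep a copy of the original stderr
    with open(target_path, 'w') as f:
        os.dup2(f.fileno(), 2)     # point file descriptor 2 at the log file
        try:
            yield
        finally:
            os.dup2(saved_fd, 2)   # restore the original stderr
            os.close(saved_fd)

with redirect_fd2('whisper_init.log'):
    model = Model('small.en')      # init logs land in whisper_init.log

Anything whisper.cpp prints while loading the model inside the with block then goes to the file instead of the console.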

abdeladim-s commented 1 day ago

Here you go, please pull the latest commit and give it a try. You can suppress the logs by setting the parameter to None:

model = Model('tiny', redirect_whispercpp_logs_to=None) 
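
A quick usage sketch: None (per the commit above) silences the logs; passing an open file object to capture them instead is an assumption based on the parameter name, so check the latest docs:

from pywhispercpp.model import Model

# suppress the whisper.cpp init logs entirely
model = Model('tiny', redirect_whispercpp_logs_to=None)

# assumption: a writable file object may also be accepted,
# which would send the init logs to a file instead of the console
with open('whisper_init.log', 'w') as log_file:
    model = Model('tiny', redirect_whispercpp_logs_to=log_file)
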
benniekiss commented 20 hours ago

Thanks for implementing this! I will give it a test in the next couple of days.