ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
34.97k stars 3.57k forks

Why can't Chinese be displayed properly on Windows? #1873

Open AimoneAndex opened 7 months ago

AimoneAndex commented 7 months ago

OS: Windows 11 Home
Build: Visual Studio 2022 + CMake
Run: Windows PowerShell

Issue: When I use an audio file whose speaker talks in Chinese, the Chinese speech is recognized and output properly, but the text has been translated into English. If I use "-l zh", the output is garbled, as in the example below. On Ubuntu, however, the Chinese characters are displayed properly.

Example on Windows:

PS C:\Data\AIHub\AI\whisper\whisper.cpp> ./main.exe -m models/ggml-o-m-q8_0.bin ./yjr/1.wav -l zh
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-o-m-q8_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 7
whisper_model_load: qntvr = 2
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 823.13 MB (1 buffers)
whisper_model_load: model size = 822.75 MB
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 28.00 MB
whisper_init_state: compute buffer (encode) = 187.14 MB
whisper_init_state: compute buffer (cross) = 8.46 MB
whisper_init_state: compute buffer (decode) = 107.98 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |

main: processing './yjr/1.wav' (197291 samples, 12.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = zh, task = transcribe, timestamps = 1 ...

[00:00:00.520 --> 00:00:04.600] 鎴戜篃寰堝枩娆㈣窇姝ュ拰MC,鐩墠鍦∕C涓粙缁嶈嚜宸辩殑涓栫晫瑙
[00:00:04.600 --> 00:00:12.200] 濂藉儚鏄涓€涓汉杩欎箞鐪?閱掑箷鑷癁鐢熶互鏉ュ凡缁忔湁涓€骞村崐浜?澶у杩樻病鎬庝箞瑙 佽繃浠?浠ュ悗浼氳澶у鐪嬬殑

whisper_print_timings: load time = 590.09 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 17.49 ms
whisper_print_timings: sample time = 236.03 ms / 300 runs ( 0.79 ms per run)
whisper_print_timings: encode time = 14099.99 ms / 1 runs (14099.99 ms per run)
whisper_print_timings: decode time = 24.99 ms / 1 runs ( 24.99 ms per run)
whisper_print_timings: batchd time = 4376.41 ms / 297 runs ( 14.74 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 19352.71 ms

bobqianic commented 7 months ago

Before running whisper.cpp, enter chcp 65001 in CMD and press Enter.
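Code page 65001 is the Windows identifier for UTF-8, so this switches the console to UTF-8. The garbled output in the report looks like UTF-8 bytes being decoded with a legacy Chinese code page. The following is a minimal Python sketch of that effect, assuming the console was using code page 936 (GBK), which the thread does not state. The sample phrase and the helper names `garble` and `ungarble` are illustrative only.

```python
# Sketch: reproduce the garbling by decoding UTF-8 bytes with the wrong codec.
# Assumption: the console used code page 936, exposed in Python as "gbk".

def garble(text: str, wrong_codec: str = "gbk") -> str:
    """Encode text as UTF-8, then decode the bytes as if they were wrong_codec.

    Raises UnicodeDecodeError if the bytes are not valid in wrong_codec.
    """
    return text.encode("utf-8").decode(wrong_codec)


def ungarble(garbled: str, wrong_codec: str = "gbk") -> str:
    """Reverse garble(); works only if no bytes were lost along the way."""
    return garbled.encode(wrong_codec).decode("utf-8")


# Illustrative phrase (not taken from the thread): eight Chinese characters.
sample = "\u6211\u4e5f\u5f88\u559c\u6b22\u8dd1\u6b65\u548c"

garbled = garble(sample)
print(garbled)            # unreadable characters, similar in style to the report
print(ungarble(garbled))  # the original phrase again
```

The round trip succeeds here because every byte pair happens to be valid in GBK. Real console output often loses bytes (the "?" characters in the report), so the garbled text cannot always be recovered afterwards; fixing the code page before running the program is the more reliable route.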

AimoneAndex commented 7 months ago

> Before running whisper.cpp, enter chcp 65001 in CMD and press Enter.

Thank you so much, from the bottom of my heart! It works properly in CMD. However, the output is still wrong in Windows PowerShell, which I use more often. Is there any way to solve it? Deeply grateful!

tamo commented 7 months ago

Setting OutputEncoding in PowerShell may help you: https://stackoverflow.com/a/57134096
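The underlying idea is that whoever reads a program's output must decode the bytes with the same encoding the program used to write them. The Python sketch below illustrates that principle; it is not the PowerShell setting from the linked answer. The helper `run_and_decode` and the stand-in child command are hypothetical, and the child simply writes two Chinese characters as UTF-8 in place of main.exe.

```python
import subprocess
import sys


def run_and_decode(cmd: list, encoding: str = "utf-8") -> str:
    """Run cmd, capture its raw stdout bytes, and decode them explicitly."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode(encoding)


# Hypothetical stand-in for main.exe: a child process that writes UTF-8 bytes.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\u4e2d\\u6587'.encode('utf-8'))",
]

# Decoding with the encoding the child actually used yields readable text.
print(run_and_decode(child))
```

Passing a different `encoding` value to `run_and_decode` for the same child either fails to decode or yields unreadable text, which is the mismatch a console or shell introduces when its output encoding does not match the program's.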