Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

main.exe language Ukrainian outputs only ???? (this is not the colorize issue) #178

Open emcodem opened 9 months ago

emcodem commented 9 months ago

This should be very simple to reproduce: any audio input works, and the language actually spoken does not matter. Just set the language to Ukrainian: main.exe -m ggml-tiny.bin -l uk test.wav

[00:00:00.000 --> 00:00:02.000] ?, ?? ????????!

While debugging, I noticed that WhisperDesktop handles console output slightly differently from main.exe. For example, the desktop version calls SetConsoleOutputCP(CP_UTF8), while main.exe does not seem to do anything similar, yet it still tries to convert everything to UTF-16. I am not sure how this is intended to work in different language environments. Running chcp 65001 before calling main.exe did not change the behaviour.
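One way to sidestep locale-dependent conversion altogether might be to encode the UTF-16 text as UTF-8 explicitly and write the raw bytes to stdout. The sketch below only illustrates that idea and is not code from the repository: utf16ToUtf8 is a hypothetical helper, written by hand so that it stays portable. On Windows, WideCharToMultiByte with CP_UTF8 would probably be the more natural choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the Whisper code base):
// encodes UTF-16 text as UTF-8 bytes without depending on the current locale.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a valid surrogate pair into a single code point
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (isHigh || isLow)
        {
            cp = 0xFFFD; // lone surrogate: substitute U+FFFD REPLACEMENT CHARACTER
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Assuming the console code page is already UTF-8 (chcp 65001 or SetConsoleOutputCP(CP_UTF8)), the resulting bytes could then be written with fwrite( s.data(), 1, s.size(), stdout ), so Cyrillic text never passes through a narrow code page that cannot represent it.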

However, I believe I was able to fix the issue by adding this line at the start of wmain AND running chcp 65001 before calling main.exe: std::locale::global(std::locale("en_US.UTF-8")); //this needs #include <locale>

It looks like SetConsoleOutputCP(CP_UTF8); can additionally be added after the std::locale... call as a replacement for typing chcp 65001 manually. So the final solution could look like:

#include <locale>
.
.
int wmain( int argc, wchar_t* argv[] )
{
    std::locale::global( std::locale( "en_US.UTF-8" ) ); // force a UTF-8 locale so that Ukrainian output works
    SetConsoleOutputCP( CP_UTF8 ); // replaces running chcp 65001 manually
.
.
.
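A slightly more defensive variant of the same idea is sketched below. std::locale's constructor throws std::runtime_error when the runtime does not recognise the locale name, so an unguarded call at the top of wmain could terminate the program on systems without UTF-8 locale support. The helper name setupUtf8Console and its fallback behaviour are assumptions for illustration, not code from the repository.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the Whisper code base).
// Returns true if the requested locale was installed globally,
// false if the runtime does not know it (the previous locale stays active).
bool setupUtf8Console(const char* localeName = "en_US.UTF-8")
{
#ifdef _WIN32
    // Switch the console output code page to UTF-8
    SetConsoleOutputCP(CP_UTF8);
#endif
    try
    {
        std::locale::global(std::locale(localeName));
        return true;
    }
    catch (const std::runtime_error&)
    {
        // Unknown locale name: leave the global locale untouched
        return false;
    }
}
```

Calling setupUtf8Console() as the first statement of wmain would then match the proposed fix when the locale is available, and a false return value could be used to print a warning instead of crashing.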
emcodem commented 9 months ago

The following issue is not directly related but worth mentioning. It affects WhisperDesktop only, and I think the two are different issues (e.g. that one is related to colorize only): https://github.com/Const-me/Whisper/issues/152

Both issues, the WhisperDesktop one and the main.exe one, are also mentioned in: https://github.com/Const-me/Whisper/issues/23