ggerganov / llama.cpp

LLM inference in C/C++
MIT License
66.25k stars 9.53k forks

On Windows (but not on UNIX) redirecting the stdin of main to a pipe or a file results in wrong decoding of non-ASCII characters #6294

Closed enzomich closed 2 weeks ago

enzomich commented 6 months ago

For a small RAG application I have written a Python wrapper that runs llama.cpp's main as a subprocess using subprocess.Popen() and communicates with it through two pipes (yes, I'm using the --simple-io option). Everything works fine, with one exception: if the line sent to main's stdin contains non-ASCII characters (e.g., Greek, Cyrillic, or even just Latin with accents or other diacritical marks), those characters, and only those, are received as garbled text (and interpreted rather imaginatively by the model). Initially I thought I was doing something wrong, but then I discovered that exactly the same thing happens without my Python wrapper, when launching main at the command line and redirecting its stdin with a "main < file.txt" or "echo input_line | main" command:

C:\Users\enzom\AI\LlamaFeeder>echo Translate "Σήμερον ἐστὶν εὔδια ἡμέρα" | \Users\enzom\AI\llama.cpp\llama-b2391-bin-win-cublas-cu12.2.0-x64\main -m \Users\enzom\AI\llama.cpp\Models\mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --simple-io --instruct --temp 0.1
Log start
main: build = 2391 (7ab7b733)
[...]
 - If you want to submit another line, end your input with '\'.

>  The English translation of "ΣήμεÏον á¼ÏÏὶν εá½Î´Î¹Î± ἡμέÏα" is "The children are playing in the park."

>
>
>  Trans
>
> late
>
>  "
>
> Î
>
> £
> Î

Please also note the garbage in the following lines, which continues until main is killed with Ctrl-C, as if it hadn't noticed that the pipe had been closed at the other end.

On the other hand, if the instruction is entered at the console prompt everything works as expected:

C:\Users\enzom\AI\LlamaFeeder>\Users\enzom\AI\llama.cpp\llama-b2391-bin-win-cublas-cu12.2.0-x64\main -m \Users\enzom\AI\llama.cpp\Models\mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --simple-io --instruct --temp 0.1 
Log start
[...]
 - If you want to submit another line, end your input with '\'.

> Translate "Σήμερον ἐστὶν εὔδια ἡμέρα"
 The translation of "Σήμερον ἐστὶν εὔδια ἡμέρα" is "Today is a fair day."

>

Any idea about how to fix this?
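A minimal sketch of the kind of wrapper setup described above, to make explicit that the Python side puts valid UTF-8 on the pipe. The child process here is a hypothetical stand-in for main (a small Python echo loop), and the names ECHO_CHILD and ask are invented for illustration; the reporter's actual wrapper is not shown in this thread.

```python
import subprocess
import sys

# Hypothetical stand-in for llama.cpp's main: a child that echoes each
# stdin line back on stdout, treating both streams as UTF-8.
ECHO_CHILD = [
    sys.executable, "-c",
    "import sys\n"
    "sys.stdin.reconfigure(encoding='utf-8')\n"
    "sys.stdout.reconfigure(encoding='utf-8')\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n",
]


def ask(proc, text):
    """Send one line to the child's stdin and return one line of its reply."""
    proc.stdin.write(text + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


prompt = 'Translate "Σήμερον ἐστὶν εὔδια ἡμέρα"'

# encoding="utf-8" opens both pipes in text mode and encodes everything
# written to the child's stdin as UTF-8, independent of the system code page.
with subprocess.Popen(
    ECHO_CHILD,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",
) as proc:
    reply = ask(proc, prompt)
    proc.stdin.close()  # signal end of input so the child exits

print(reply == prompt)
```

With a child that decodes its stdin as UTF-8, the line survives the round trip unchanged, which suggests the garbling happens on the reading side of the pipe rather than in the wrapper.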

Green-Sky commented 6 months ago

Are you sure the file encoding is UTF-8?

enzomich commented 6 months ago

Yes, I am. In the example above the prompt came from a pipe, but here is one where the command line redirects main's stdin to a UTF-8 encoded text file, ..\Texts\TranslateGreek.txt, containing Translate "Σήμερον ἐστὶν εὔδια ἡμέρα" (meaning "Today is a fair day"). It was prepared with Notepad, which lets you specify the encoding, and Python agrees that the file is indeed UTF-8 encoded:

C:\Users\enzom\AI\LlamaFeeder>Python
Python 3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> with open(r'..\Texts\TranslateGreek.txt', 'r', encoding='utf-8') as file:
...   print(file.read())
...
Translate "Σήμερον ἐστὶν εὔδια ἡμέρα"

>>>

However, when main's stdin is redirected to that file, the result is garbage:

C:\Users\enzom\AI\LlamaFeeder>\Users\enzom\AI\llama.cpp\llama-b2391-bin-win-cublas-cu12.2.0-x64\main -m \Users\enzom\AI\llama.cpp\Models\mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --simple-io --instruct --temp 0.1 < ..\Texts\TranslateGreek.txt
Log start
main: build = 2391 (7ab7b733)
[...]
 - If you want to submit another line, end your input with '\'.

>  The English translation of "ΣήμεÏον á¼ÏÏὶν εá½Î´Î¹Î± ἡμέÏα" is "The children are playing football."

>
>
>  Trans
> late
>  "
> Î
> £
[...] <-- Killed with Ctrl-C

llama_print_timings:        load time =    2992.67 ms
llama_print_timings:      sample time =      10.92 ms /   109 runs   (    0.10 ms per token,  9978.94 tokens per second)
llama_print_timings: prompt eval time =   11862.09 ms /    77 tokens (  154.05 ms per token,     6.49 tokens per second)
llama_print_timings:        eval time =   23289.80 ms /   109 runs   (  213.67 ms per token,     4.68 tokens per second)
llama_print_timings:       total time =   35433.63 ms /   186 tokens

Instead of "Σήμερον ἐστὶν εὔδια ἡμέρα", main reads "ΣήμεÏον á¼ÏÏὶν εá½Î´Î¹Î± ἡμέÏα". And that's what is read by Python opening the file as if it were ISO-8859-1 encoded:

C:\Users\enzom\AI\LlamaFeeder>Python
Python 3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> with open(r'..\Texts\TranslateGreek.txt', 'r', encoding='iso-8859-1') as file:
...   print(file.read())
...
Translate "ΣήμεÏον á¼ÏÏὶν εá½Î´Î¹Î± ἡμέÏα"

>>>
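The same effect can be reproduced without the file, as a small sketch of what the comparison above implies: encoding the Greek text as UTF-8 and decoding those bytes as ISO-8859-1 yields one character per byte. The variable names are illustrative only.

```python
original = 'Σήμερον ἐστὶν εὔδια ἡμέρα'

# What the file (or pipe) actually contains: UTF-8 bytes.
utf8_bytes = original.encode('utf-8')

# What a reader sees if it treats each byte as one ISO-8859-1 character.
misread = utf8_bytes.decode('iso-8859-1')

# The first letter, Σ (U+03A3), is the UTF-8 byte pair CE A3,
# which ISO-8859-1 maps to the two characters 'Î' and '£'.
first_two = misread[:2]

# The damage is reversible as long as the bytes are preserved:
# re-encode as ISO-8859-1 and decode as UTF-8.
recovered = misread.encode('iso-8859-1').decode('utf-8')

print(first_two)
print(recovered == original)
```

The 'Î' and '£' produced here are the same two characters that main echoes back on separate prompt lines in the transcripts above.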

Green-Sky commented 6 months ago

@enzomich looks like we only set the output codepage to utf8, try adding

SetConsoleCP(CP_UTF8);

next to this line: https://github.com/ggerganov/llama.cpp/blob/076b08649ecc3b0e1c0709c2a086a63eddd1bf32/common/console.cpp#L89

I am not sure how it will affect other parts like special char inputs in non-piped scenarios.

enzomich commented 6 months ago

@Green-Sky that fix didn't work, but this one did: I inserted after line 96 (just before the #else):

        if(simple_io) {
                _setmode(_fileno(stdin), _O_U8TEXT);    
        }

From what I understand (but I may be wrong, not being very familiar with Windows), SetConsoleCP(...) affects only the console, whereas _setmode(_fileno(stdin), ...) affects the stdin file descriptor even when the input is redirected away from the console to a file or a pipe, as in this case.

So, as far as I'm concerned, this issue can be considered closed once the fix is merged into the code.

misureaudio commented 5 months ago

I can confirm that the bug is present when piping to main, and that the code presented by enzomich solves the issue.

enzomich commented 5 months ago

@Green-Sky, as @misureaudio has confirmed both this issue and my fix, is there any chance of raising the status to "confirmed" and, eventually, having the fix accepted and merged?

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

hasaranga commented 2 weeks ago

why is this still not fixed? @ggerganov

ggerganov commented 2 weeks ago

Please open a PR with the proposed fix and we'll merge it. I don't have a Windows environment to test this.

hasaranga commented 2 weeks ago

@enzomich can you do a PR?

enzomich commented 2 weeks ago

@enzomich can you do a PR?

I'm a bit busy these days, but I'll try.

misureaudio commented 2 weeks ago

Hi, Enzo Michelangeli proposed the following correction:

if (simple_io) { _setmode(_fileno(stdin), _O_U8TEXT); }

Simply insert this snippet after line 96 of console.cpp.

It works.

The corrected console.cpp is attached below.

GMP


#include "console.h"
#include <vector>
#include <iostream>

#if defined(_WIN32)
#define WIN32_LEAN_AND_MEAN
#ifndef NOMINMAX
#define NOMINMAX
#endif
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004
#endif
#else
#include <climits>
#include <sys/ioctl.h>
#include <unistd.h>
#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <termios.h>
#endif

#define ANSI_COLOR_RED     "\x1b[31m"
#define ANSI_COLOR_GREEN   "\x1b[32m"
#define ANSI_COLOR_YELLOW  "\x1b[33m"
#define ANSI_COLOR_BLUE    "\x1b[34m"
#define ANSI_COLOR_MAGENTA "\x1b[35m"
#define ANSI_COLOR_CYAN    "\x1b[36m"
#define ANSI_COLOR_RESET   "\x1b[0m"
#define ANSI_BOLD          "\x1b[1m"

namespace console {

//
// Console state
//

static bool      advanced_display = false;
static bool      simple_io        = true;
static display_t current_display  = reset;

static FILE*     out              = stdout;

#if defined (_WIN32)
static void*     hConsole;
#else
static FILE*     tty              = nullptr;
static termios   initial_state;
#endif

//
// Init and cleanup
//

void init(bool use_simple_io, bool use_advanced_display) {
    advanced_display = use_advanced_display;
    simple_io = use_simple_io;

#if defined(_WIN32)

    // Windows-specific console initialization
    DWORD dwMode = 0;
    hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (hConsole == INVALID_HANDLE_VALUE || !GetConsoleMode(hConsole, &dwMode)) {
        hConsole = GetStdHandle(STD_ERROR_HANDLE);
        if (hConsole != INVALID_HANDLE_VALUE && (!GetConsoleMode(hConsole, &dwMode))) {
            hConsole = nullptr;
            simple_io = true;
        }
    }
    if (hConsole) {
        // Check conditions combined to reduce nesting
        if (advanced_display && !(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING) &&
            !SetConsoleMode(hConsole, dwMode | ENABLE_VIRTUAL_TERMINAL_PROCESSING)) {
            advanced_display = false;
        }
        // Set console output codepage to UTF8
        SetConsoleOutputCP(CP_UTF8);
    }
    HANDLE hConIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hConIn != INVALID_HANDLE_VALUE && GetConsoleMode(hConIn, &dwMode)) {
        // Set console input codepage to UTF16
        _setmode(_fileno(stdin), _O_WTEXT);

        // Set ICANON (ENABLE_LINE_INPUT) and ECHO (ENABLE_ECHO_INPUT)
        if (simple_io) {
            dwMode |= ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT;
        } else {
            dwMode &= ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
        }
        if (!SetConsoleMode(hConIn, dwMode)) {
            simple_io = true;
        }
    }
    if (simple_io) {
        _setmode(_fileno(stdin), _O_U8TEXT);
    }

#else

    // POSIX-specific console initialization
    if (!simple_io) {
        struct termios new_termios;
        tcgetattr(STDIN_FILENO, &initial_state);
        new_termios = initial_state;
        new_termios.c_lflag &= ~(ICANON | ECHO);
        new_termios.c_cc[VMIN] = 1;
        new_termios.c_cc[VTIME] = 0;
        tcsetattr(STDIN_FILENO, TCSANOW, &new_termios);

        tty = fopen("/dev/tty", "w+");
        if (tty != nullptr) {
            out = tty;
        }
    }

    setlocale(LC_ALL, "");

#endif

}

void cleanup() {
    // Reset console display
    set_display(reset);

#if !defined(_WIN32)

    // Restore settings on POSIX systems
    if (!simple_io) {
        if (tty != nullptr) {
            out = stdout;
            fclose(tty);
            tty = nullptr;
        }
        tcsetattr(STDIN_FILENO, TCSANOW, &initial_state);
    }

#endif

}

//
// Display and IO
//

// Keep track of current display and only emit ANSI code if it changes
void set_display(display_t display) {
    if (advanced_display && current_display != display) {
        fflush(stdout);
        switch(display) {
            case reset:
                fprintf(out, ANSI_COLOR_RESET);
                break;
            case prompt:
                fprintf(out, ANSI_COLOR_YELLOW);
                break;
            case user_input:
                fprintf(out, ANSI_BOLD ANSI_COLOR_GREEN);
                break;
            case error:
                fprintf(out, ANSI_BOLD ANSI_COLOR_RED);
        }
        current_display = display;
        fflush(out);
    }
}

static char32_t getchar32() {

#if defined(_WIN32)

    HANDLE hConsole = GetStdHandle(STD_INPUT_HANDLE);
    wchar_t high_surrogate = 0;

    while (true) {
        INPUT_RECORD record;
        DWORD count;
        if (!ReadConsoleInputW(hConsole, &record, 1, &count) || count == 0) {
            return WEOF;
        }

        if (record.EventType == KEY_EVENT && record.Event.KeyEvent.bKeyDown) {
            wchar_t wc = record.Event.KeyEvent.uChar.UnicodeChar;
            if (wc == 0) {
                continue;
            }

            if ((wc >= 0xD800) && (wc <= 0xDBFF)) { // Check if wc is a high surrogate
                high_surrogate = wc;
                continue;
            }
            if ((wc >= 0xDC00) && (wc <= 0xDFFF)) { // Check if wc is a low surrogate
                if (high_surrogate != 0) { // Check if we have a high surrogate
                    return ((high_surrogate - 0xD800) << 10) + (wc - 0xDC00) + 0x10000;
                }
            }

            high_surrogate = 0; // Reset the high surrogate
            return static_cast<char32_t>(wc);
        }
    }

#else
    wchar_t wc = getwchar();
    if (static_cast<wint_t>(wc) == WEOF) {
        return WEOF;
    }

#if WCHAR_MAX == 0xFFFF

    if ((wc >= 0xD800) && (wc <= 0xDBFF)) { // Check if wc is a high surrogate
        wchar_t low_surrogate = getwchar();
        if ((low_surrogate >= 0xDC00) && (low_surrogate <= 0xDFFF)) { // Check if the next wchar is a low surrogate
            return (static_cast<char32_t>(wc & 0x03FF) << 10) + (low_surrogate & 0x03FF) + 0x10000;
        }
    }
    if ((wc >= 0xD800) && (wc <= 0xDFFF)) { // Invalid surrogate pair
        return 0xFFFD; // Return the replacement character U+FFFD
    }

#endif

    return static_cast<char32_t>(wc);
#endif

}

static void pop_cursor() {

#if defined(_WIN32)

    if (hConsole != NULL) {
        CONSOLE_SCREEN_BUFFER_INFO bufferInfo;
        GetConsoleScreenBufferInfo(hConsole, &bufferInfo);

        COORD newCursorPosition = bufferInfo.dwCursorPosition;
        if (newCursorPosition.X == 0) {
            newCursorPosition.X = bufferInfo.dwSize.X - 1;
            newCursorPosition.Y -= 1;
        } else {
            newCursorPosition.X -= 1;
        }

        SetConsoleCursorPosition(hConsole, newCursorPosition);
        return;
    }

#endif

    putc('\b', out);
}

static int estimateWidth(char32_t codepoint) {

#if defined(_WIN32)
    (void)codepoint;
    return 1;
#else
    return wcwidth(codepoint);
#endif

}

static int put_codepoint(const char* utf8_codepoint, size_t length, int expectedWidth) {

#if defined(_WIN32)

    CONSOLE_SCREEN_BUFFER_INFO bufferInfo;
    if (!GetConsoleScreenBufferInfo(hConsole, &bufferInfo)) {
        // go with the default
        return expectedWidth;
    }
    COORD initialPosition = bufferInfo.dwCursorPosition;
    DWORD nNumberOfChars = length;
    WriteConsole(hConsole, utf8_codepoint, nNumberOfChars, &nNumberOfChars, NULL);

    CONSOLE_SCREEN_BUFFER_INFO newBufferInfo;
    GetConsoleScreenBufferInfo(hConsole, &newBufferInfo);

    // Figure out our real position if we're in the last column
    if (utf8_codepoint[0] != 0x09 && initialPosition.X == newBufferInfo.dwSize.X - 1) {
        DWORD nNumberOfChars;
        WriteConsole(hConsole, &" \b", 2, &nNumberOfChars, NULL);
        GetConsoleScreenBufferInfo(hConsole, &newBufferInfo);
    }

    int width = newBufferInfo.dwCursorPosition.X - initialPosition.X;
    if (width < 0) {
        width += newBufferInfo.dwSize.X;
    }
    return width;

#else

    // We can trust expectedWidth if we've got one
    if (expectedWidth >= 0 || tty == nullptr) {
        fwrite(utf8_codepoint, length, 1, out);
        return expectedWidth;
    }

    fputs("\033[6n", tty); // Query cursor position
    int x1;
    int y1;
    int x2;
    int y2;
    int results = 0;
    results = fscanf(tty, "\033[%d;%dR", &y1, &x1);

    fwrite(utf8_codepoint, length, 1, tty);

    fputs("\033[6n", tty); // Query cursor position
    results += fscanf(tty, "\033[%d;%dR", &y2, &x2);

    if (results != 4) {
        return expectedWidth;
    }

    int width = x2 - x1;
    if (width < 0) {
        // Calculate the width considering text wrapping
        struct winsize w;
        ioctl(STDOUT_FILENO, TIOCGWINSZ, &w);
        width += w.ws_col;
    }
    return width;

#endif

}

static void replace_last(char ch) {

#if defined(_WIN32)
    pop_cursor();
    put_codepoint(&ch, 1, 1);
#else
    fprintf(out, "\b%c", ch);
#endif

}

static void append_utf8(char32_t ch, std::string & out) {
    if (ch <= 0x7F) {
        out.push_back(static_cast<unsigned char>(ch));
    } else if (ch <= 0x7FF) {
        out.push_back(static_cast<unsigned char>(0xC0 | ((ch >> 6) & 0x1F)));
        out.push_back(static_cast<unsigned char>(0x80 | (ch & 0x3F)));
    } else if (ch <= 0xFFFF) {
        out.push_back(static_cast<unsigned char>(0xE0 | ((ch >> 12) & 0x0F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((ch >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (ch & 0x3F)));
    } else if (ch <= 0x10FFFF) {
        out.push_back(static_cast<unsigned char>(0xF0 | ((ch >> 18) & 0x07)));
        out.push_back(static_cast<unsigned char>(0x80 | ((ch >> 12) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | ((ch >> 6) & 0x3F)));
        out.push_back(static_cast<unsigned char>(0x80 | (ch & 0x3F)));
    } else {
        // Invalid Unicode code point
    }
}

// Helper function to remove the last UTF-8 character from a string
static void pop_back_utf8_char(std::string & line) {
    if (line.empty()) {
        return;
    }

    size_t pos = line.length() - 1;

    // Find the start of the last UTF-8 character (checking up to 4 bytes back)
    for (size_t i = 0; i < 3 && pos > 0; ++i, --pos) {
        if ((line[pos] & 0xC0) != 0x80) {
            break; // Found the start of the character
        }
    }
    line.erase(pos);
}

static bool readline_advanced(std::string & line, bool multiline_input) {
    if (out != stdout) {
        fflush(stdout);
    }

    line.clear();
    std::vector<int> widths;
    bool is_special_char = false;
    bool end_of_stream = false;

    char32_t input_char;
    while (true) {
        fflush(out); // Ensure all output is displayed before waiting for input
        input_char = getchar32();

        if (input_char == '\r' || input_char == '\n') {
            break;
        }

        if (input_char == (char32_t) WEOF || input_char == 0x04 /* Ctrl+D*/) {
            end_of_stream = true;
            break;
        }

        if (is_special_char) {
            set_display(user_input);
            replace_last(line.back());
            is_special_char = false;
        }

        if (input_char == '\033') { // Escape sequence
            char32_t code = getchar32();
            if (code == '[' || code == 0x1B) {
                // Discard the rest of the escape sequence
                while ((code = getchar32()) != (char32_t) WEOF) {
                    if ((code >= 'A' && code <= 'Z') || (code >= 'a' && code <= 'z') || code == '~') {
                        break;
                    }
                }
            }
        } else if (input_char == 0x08 || input_char == 0x7F) { // Backspace
            if (!widths.empty()) {
                int count;
                do {
                    count = widths.back();
                    widths.pop_back();
                    // Move cursor back, print space, and move cursor back again
                    for (int i = 0; i < count; i++) {
                        replace_last(' ');
                        pop_cursor();
                    }
                    pop_back_utf8_char(line);
                } while (count == 0 && !widths.empty());
            }
        } else {
            int offset = line.length();
            append_utf8(input_char, line);
            int width = put_codepoint(line.c_str() + offset, line.length() - offset, estimateWidth(input_char));
            if (width < 0) {
                width = 0;
            }
            widths.push_back(width);
        }

        if (!line.empty() && (line.back() == '\\' || line.back() == '/')) {
            set_display(prompt);
            replace_last(line.back());
            is_special_char = true;
        }
    }

    bool has_more = multiline_input;
    if (is_special_char) {
        replace_last(' ');
        pop_cursor();

        char last = line.back();
        line.pop_back();
        if (last == '\\') {
            line += '\n';
            fputc('\n', out);
            has_more = !has_more;
        } else {
            // llama will just eat the single space, it won't act as a space
            if (line.length() == 1 && line.back() == ' ') {
                line.clear();
                pop_cursor();
            }
            has_more = false;
        }
    } else {
        if (end_of_stream) {
            has_more = false;
        } else {
            line += '\n';
            fputc('\n', out);
        }
    }

    fflush(out);
    return has_more;
}

static bool readline_simple(std::string & line, bool multiline_input) {

#if defined(_WIN32)

    std::wstring wline;
    if (!std::getline(std::wcin, wline)) {
        // Input stream is bad or EOF received
        line.clear();
        GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0);
        return false;
    }

    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wline[0], (int)wline.size(), NULL, 0, NULL, NULL);
    line.resize(size_needed);
    WideCharToMultiByte(CP_UTF8, 0, &wline[0], (int)wline.size(), &line[0], size_needed, NULL, NULL);

#else

    if (!std::getline(std::cin, line)) {
        // Input stream is bad or EOF received
        line.clear();
        return false;
    }

#endif

    if (!line.empty()) {
        char last = line.back();
        if (last == '/') { // Always return control on '/' symbol
            line.pop_back();
            return false;
        }
        if (last == '\\') { // '\\' changes the default action
            line.pop_back();
            multiline_input = !multiline_input;
        }
    }
    line += '\n';

    // By default, continue input if multiline_input is set
    return multiline_input;
}

bool readline(std::string & line, bool multiline_input) {
    set_display(user_input);

    if (simple_io) {
        return readline_simple(line, multiline_input);
    }
    return readline_advanced(line, multiline_input);
}

}

hasaranga commented 2 weeks ago

@ggerganov I made a PR, #9690. Please review it.