ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Encounter the "newline in constant" error while compiling with MSVC #8334

Status: Open. Yan-Xiangjun opened this issue 1 week ago.

Yan-Xiangjun commented 1 week ago

What happened?

I ran cmake -B build to generate a Visual Studio solution. When I then compiled test-grammar-integration.cpp with MSVC, compilation failed with error C2001, "newline in constant". Here is the position of the error: [screenshot "test1"]. A detailed description of the error is available here: https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2001?view=msvc-170&f1url=%3FappId%3DDev16IDEF1%26l%3DZH-CN%26k%3Dk(C2001)%26rd%3Dtrue

Name and Version

The version of llama.cpp is b3325. I use Windows 11, Visual Studio 2022 17.8.2 and MSVC 19.38.33130.0.

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

shibe2 commented 1 week ago

Maybe string literals with non-ASCII characters should have the u8 prefix, for example:

u8"🔵🟠✅abc❌🟠🔵"
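A minimal sketch of the u8-prefix approach in code, with one caveat not raised in the thread: since C++20, u8"..." has type const char8_t[N] instead of const char[N], so it no longer converts to std::string implicitly. The helper from_u8 below is hypothetical (not part of llama.cpp); it copies the literal's bytes explicitly so the same source compiles as C++17 or C++20. The literals use escapes only so that the sketch itself stays pure ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies a u8 literal into a std::string.
// Before C++20, u8"..." is const char[N]; since C++20 it is const char8_t[N].
// Deducing the array size N (which counts the terminating NUL) works for both.
template <typename CharT, std::size_t N>
std::string from_u8(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expected a UTF-8 code unit type");
    // Reading the bytes through const char* is allowed: char may alias any object.
    return std::string(reinterpret_cast<const char *>(lit), N - 1);
}

// Example: U+2705 (3 UTF-8 bytes) followed by "abc".
void demo_from_u8() {
    const std::string s = from_u8(u8"\u2705abc");
    assert(s.size() == 6);
    // Splitting the literal stops the \x85 escape from swallowing "abc".
    assert(s == "\xE2\x9C\x85" "abc");
}
```

The explicit copy is one possible design; changing the test's container type to std::u8string would be another, but that only exists from C++20 on.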

Also, the build scripts should ensure that the compiler treats these source files as UTF-8 encoded. For GCC, the relevant option is -finput-charset=UTF-8; the MSVC equivalent is /utf-8 (or /source-charset:utf-8).
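Assuming a build where the charset flag might be missing, a compile-time guard like the hypothetical one below can make the failure explicit. It relies on ✅ (U+2705) being 3 bytes in UTF-8: if the compiler decodes the file in another code page, those bytes become different characters and the literal's size usually changes. It cannot help when the mis-decoding already breaks lexing, as a "newline in constant" error would.

```cpp
// Hypothetical compile-time guard (not from llama.cpp). U+2705 occupies 3 bytes in
// UTF-8 and sizeof also counts the terminating NUL, so the literal is 4 bytes only
// if the compiler decoded this source file as UTF-8. If the file is read in a
// legacy code page, the same bytes become other characters and the size usually
// differs, turning a silent mis-encoding into a clear build error.
static_assert(sizeof(u8"✅") == 4,
              "compile this file as UTF-8 (GCC: -finput-charset=UTF-8, MSVC: /utf-8)");
```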

Alternatively, non-ASCII characters can be represented with \u and \U escapes, for example:

u8"\U0001F535\U0001F7E0\u2705abc\u274C\U0001F7E0\U0001F535"
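To make the \u and \U escapes concrete, here is a sketch of the UTF-8 encoding the compiler applies to each code point in a u8 literal, checked at compile time against escaped literals from the example. encode_utf8 and encodes are hypothetical helpers; the encoder handles a single code point and does no validation (surrogates or values above U+10FFFF are not rejected).

```cpp
#include <cstddef>

// UTF-8 encoding of one code point: up to 4 bytes plus the number used.
struct Utf8 {
    unsigned char bytes[4];
    std::size_t len;
};

// Minimal encoder (no validation of surrogates or out-of-range values).
constexpr Utf8 encode_utf8(char32_t cp) {
    if (cp < 0x80)
        return {{static_cast<unsigned char>(cp), 0, 0, 0}, 1};
    if (cp < 0x800)
        return {{static_cast<unsigned char>(0xC0 | (cp >> 6)),
                 static_cast<unsigned char>(0x80 | (cp & 0x3F)), 0, 0}, 2};
    if (cp < 0x10000)
        return {{static_cast<unsigned char>(0xE0 | (cp >> 12)),
                 static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),
                 static_cast<unsigned char>(0x80 | (cp & 0x3F)), 0}, 3};
    return {{static_cast<unsigned char>(0xF0 | (cp >> 18)),
             static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)),
             static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),
             static_cast<unsigned char>(0x80 | (cp & 0x3F))}, 4};
}

// True if the literal holds exactly the UTF-8 encoding of one code point.
template <typename CharT, std::size_t N>
constexpr bool encodes(const CharT (&lit)[N], char32_t cp) {
    const Utf8 e = encode_utf8(cp);
    if (N - 1 != e.len) return false;  // N counts the terminating NUL
    for (std::size_t i = 0; i < e.len; ++i)
        if (static_cast<unsigned char>(lit[i]) != e.bytes[i]) return false;
    return true;
}

static_assert(encodes(u8"\U0001F535", 0x1F535), "U+1F535 should be F0 9F 94 B5");
static_assert(encodes(u8"\u2705", 0x2705), "U+2705 should be E2 9C 85");
```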

Yet another alternative is to use \x escapes instead of the u8 prefix, for example:

"\xF0\x9F\x94\xB5\xF0\x9F\x9F\xA0\xE2\x9C\205abc\xE2\x9D\x8C\xF0\x9F\x9F\xA0\xF0\x9F\x94\xB5"

This is even less readable, though.

Edit: I changed \x85abc to \205abc in the last example because a \x escape consumes every hexadecimal digit that follows it, so "abc" would be parsed as part of the hexadecimal number 85abc.
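As a compile-time check of the spellings above, the sketch below compares the \U-escape form with the \x/octal form byte by byte; same_bytes is a hypothetical helper. It also shows the other usual way to end a hex escape: splitting the literal, since an escape sequence never extends into the next adjacent literal.

```cpp
#include <cstddef>

// Hypothetical helper: compares two literals byte by byte at compile time.
// Works for char and char8_t arrays alike; N and M include the terminating NUL.
template <typename A, typename B, std::size_t N, std::size_t M>
constexpr bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        // Compare as unsigned char so a signed plain char still matches char8_t.
        if (static_cast<unsigned char>(a[i]) != static_cast<unsigned char>(b[i])) return false;
    }
    return true;
}

// Universal-character-name spelling from the thread.
constexpr auto &ucn = u8"\U0001F535\U0001F7E0\u2705abc\u274C\U0001F7E0\U0001F535";

// Raw-byte spelling. "\x85abc" would be one hex escape with value 0x85ABC,
// so the byte 0x85 is written as the octal escape \205 (octal takes at most 3 digits).
constexpr auto &raw =
    "\xF0\x9F\x94\xB5\xF0\x9F\x9F\xA0\xE2\x9C\205abc\xE2\x9D\x8C\xF0\x9F\x9F\xA0\xF0\x9F\x94\xB5";

// Alternative: end the hex escape by starting a new literal; the two are concatenated.
constexpr auto &split = "\x85" "abc";

static_assert(same_bytes(ucn, raw), "both spellings should encode the same UTF-8 bytes");
static_assert(same_bytes(split, "\205abc"), "octal and split spellings should match");
```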

fairydreaming commented 1 week ago

Likely not related, but I see a missing comma in the screenshot, near the end of the Failing strings.

shibe2 commented 1 week ago


Nice catch. Without the comma, the last two literals are concatenated into a single string, which also does not match the grammar, so the test still passes, but those two strings are never checked individually.
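The effect of the missing comma can be reproduced with a hypothetical list in the same shape as the test's failing inputs; the names and strings here are made up for illustration. Adjacent string literals are merged into one, so the array silently ends up with fewer elements than intended.

```cpp
#include <cstddef>

// Hypothetical list of inputs that should fail to parse. The comma after "bar"
// is missing, so the adjacent literals "bar" and "baz" merge into "barbaz".
constexpr const char *failing_strings[] = {
    "foo",
    "bar"  // <- missing comma
    "baz",
};

// Only two entries survive, so "bar" and "baz" are never tested on their own.
constexpr std::size_t failing_count = sizeof(failing_strings) / sizeof(failing_strings[0]);
static_assert(failing_count == 2, "the missing comma merged two entries");
```

Because "barbaz" is itself a non-matching input, a failing-strings check over this list still passes, which is why the typo went unnoticed.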