Open Yan-Xiangjun opened 1 week ago
Maybe string literals with non-ASCII characters should have the u8 prefix, for example:
u8"🔵🟠✅abc❌🟠🔵"
Also, build scripts should ensure that the compiler treats these source files as UTF-8 encoded. For example, the relevant GCC option is -finput-charset=UTF-8.
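One way to notice a wrong source encoding early is a compile-time check on a known literal. The sketch below is not taken from llama.cpp; it assumes the file is saved as UTF-8, and the helper name check_mark_utf8_length is made up for illustration. With MSVC, the option analogous to the GCC one is /utf-8 (or /source-charset:utf-8); without it, a file that has no BOM is decoded with the current code page, and the literal would typically come out with a different length or fail to compile at all.

```cpp
#include <cstddef>

// Assumption: this file is saved as UTF-8. U+2705 is then the three bytes
// E2 9C 85, so the u8 literal holds 3 code units plus the terminating NUL.
// Each element is one byte whether its type is char or char8_t (C++20).
static_assert(sizeof(u8"✅") == 4,
              "source file was not decoded as UTF-8; check -finput-charset or /utf-8");

// Hypothetical helper so the same value can also be inspected at run time.
inline std::size_t check_mark_utf8_length() {
    return sizeof(u8"✅") - 1;  // UTF-8 code units, NUL excluded
}
```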
Alternatively, non-ASCII characters can be represented with \u and \U escapes, for example:
u8"\U0001F535\U0001F7E0\u2705abc\u274C\U0001F7E0\U0001F535"
Yet another alternative is to use \x escapes instead of the u8 prefix, for example:
"\xF0\x9F\x94\xB5\xF0\x9F\x9F\xA0\xE2\x9C\205abc\xE2\x9D\x8C\xF0\x9F\x9F\xA0\xF0\x9F\x94\xB5"
Though this is even less readable.
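The two escaped spellings denote the same bytes, which a small check can confirm. This is a sketch, not code from the test file; the names same_bytes and escapes_agree are made up for illustration. Neither literal contains a non-ASCII character, so the result does not depend on how the compiler decodes the source file.

```cpp
#include <cassert>
#include <cstddef>

// Byte-wise comparison of two string literals, terminating NUL included.
// Templated on the element type because u8"" yields char before C++20
// and char8_t from C++20 on.
template <typename A, typename B, std::size_t N, std::size_t M>
bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (static_cast<unsigned char>(a[i]) != static_cast<unsigned char>(b[i])) {
            return false;
        }
    }
    return true;
}

// Compares the \u/\U form (with u8 prefix) against the \x/octal form.
inline bool escapes_agree() {
    return same_bytes(
        u8"\U0001F535\U0001F7E0\u2705abc\u274C\U0001F7E0\U0001F535",
        "\xF0\x9F\x94\xB5\xF0\x9F\x9F\xA0\xE2\x9C\205abc\xE2\x9D\x8C\xF0\x9F\x9F\xA0\xF0\x9F\x94\xB5");
}
```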
Edit: I changed \x85abc to \205abc in the last example because "abc" may be interpreted as part of the hexadecimal number 85abc.
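The underlying rule is that a \x escape has no length limit: it consumes every hexadecimal digit that follows, and a, b and c are all hex digits. Two common ways around this are an octal escape, which stops after at most three digits, or closing the literal right after the escape and letting the compiler concatenate the adjacent pieces. A minimal sketch (the function names are illustrative only):

```cpp
#include <cstddef>
#include <cstring>

// "\xE2\x9C\x85abc" would be parsed as \xE2 \x9C \x85abc, and 0x85abc does
// not fit in a char, so compilers reject it or warn about it.

// Octal escape: at most three digits, so "abc" stays ordinary text.
inline const char * check_mark_octal() { return "\xE2\x9C\205abc"; }

// Split literal: escapes are resolved in each piece before concatenation.
inline const char * check_mark_split() { return "\xE2\x9C\x85" "abc"; }

// Both spellings should give the same six bytes: E2 9C 85 'a' 'b' 'c'.
inline bool check_mark_spellings_match() {
    return std::strlen(check_mark_octal()) == 6 &&
           std::strcmp(check_mark_octal(), check_mark_split()) == 0;
}
```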
Likely not related, but I see a missing comma on the screenshot near the end of Failing strings.
Nice catch. Without the comma, the last 2 literals form a single string, which will not match the grammar, so the test still passes.
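This is the usual adjacent-literal pitfall: two string literals separated only by whitespace are concatenated by the compiler, so a missing comma silently shrinks the list instead of causing an error. A minimal sketch with made-up strings (not the actual test data; kWithComma, kMissingComma and count_of are hypothetical names):

```cpp
#include <cstddef>
#include <cstring>

// Intended list: three entries.
static const char * const kWithComma[] = { "foo", "bar", "baz" };

// Comma missing after "bar": the compiler merges "bar" "baz" into "barbaz".
static const char * const kMissingComma[] = { "foo", "bar" "baz" };

// Number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t count_of(const T (&)[N]) { return N; }
```

Here count_of(kWithComma) is 3 while count_of(kMissingComma) is 2, and the last entry of the second array is "barbaz".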
What happened?
I used cmake -B build to generate a Visual Studio solution. After that, when compiling test-grammar-integration.cpp with MSVC, the error "newline in constant" (C2001) occurred. Here is the position of the error: (screenshot)
This is a detailed description of the error: https://learn.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2001?view=msvc-170&f1url=%3FappId%3DDev16IDEF1%26l%3DZH-CN%26k%3Dk(C2001)%26rd%3Dtrue
Name and Version
The version of llama.cpp is b3325. I use Windows 11, Visual Studio 2022 17.8.2 and MSVC 19.38.33130.0.
What operating system are you seeing the problem on?
Windows
Relevant log output
No response