handle C++ raw string in a single token

asmwarrior commented 7 months ago

Hi, from the page: https://en.cppreference.com/w/cpp/language/string_literal

There are many kinds of C++ raw strings, and I think the lexer/preprocessor should handle each of them as a single token. Currently they are handled as separate tokens, for example:

int U = 3;
const char32_t* s7 = U"GHIJKL";

In both lines above, the "U" is parsed as a token of its own, with the same token id in each case.

Thanks.

asmwarrior commented 7 months ago

It looks like gcc handles raw strings in the preprocessor; see here for reference: 55971 – Preprocessor macros with C++11 raw string literals fail to compile

GrieferAtWork commented 7 months ago

Yup, you're right. C++ raw strings (like R"(foo)") are a feature TPP doesn't support right now.
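For readers unfamiliar with why raw strings need lexer support, here is a minimal sketch of scanning one as a single token. It is not TPP code: the function name scan_raw_string and its interface are hypothetical, and it assumes the caller has already seen the R prefix and knows where the opening double-quote is. The delimiter rules (at most 16 characters; no spaces, parentheses, backslashes, or tab/newline-style control characters) follow the C++11 grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of TPP.
// `quote_pos` is the offset of the opening '"' that follows the R prefix.
// Returns the offset one past the closing '"', or npos if the literal is
// malformed or unterminated.
std::size_t scan_raw_string(std::string_view src, std::size_t quote_pos) {
    constexpr std::size_t npos = std::string_view::npos;
    assert(quote_pos < src.size() && src[quote_pos] == '"');

    // The delimiter is everything between the opening quote and the first '('.
    std::size_t open = src.find('(', quote_pos + 1);
    if (open == npos) return npos;
    std::string_view delim = src.substr(quote_pos + 1, open - quote_pos - 1);

    // Reject delimiters the grammar does not allow.
    if (delim.size() > 16) return npos;
    for (char c : delim) {
        if (c == ' ' || c == ')' || c == '\\' ||
            c == '\t' || c == '\v' || c == '\f' || c == '\n') {
            return npos;
        }
    }

    // The literal ends at the first occurrence of: ')' + delimiter + '"'.
    // Nothing in between is treated as an escape sequence.
    std::string close = ")";
    close.append(delim.data(), delim.size());
    close.push_back('"');

    std::size_t end = src.find(std::string_view(close), open + 1);
    if (end == npos) return npos;
    return end + close.size();
}
```

The key point is that an embedded `)"` does not end the literal unless the delimiter also matches, which is why an ordinary string scanner that stops at the first unescaped double-quote splits such a literal in the wrong place.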

The only thing that comes close is TPP_CONFIG_RAW_STRING_LITERALS, but those aren't C++ raw strings (and I don't recommend you use those instead, as they're deemon raw strings, which work a bit differently; don't forget that tpp is for C and "C-like" languages).

If it's any consolation to you, I've also been planning to add """ block string """ support (like you have in Java or Python) for some time now, so I guess I'll just put all those C++11-style string literals onto my mental TODO list as well.

So: will be implemented eventually (but no promises as to when).

But: u"foo" isn't a "raw" string; that's a unicode string. If that's all you want, you should define a keyword DEF_K(u) and then handle it in your programming language's token processor as case KWD_u: if (*TPPLexer_Current->l_token.t_end == '"') { /* unicode string */ }, which essentially checks for a u keyword that is immediately followed by a double-quote. I do something similar to implement template strings (local x = 10; local y = f"value of x is {x}";) in deemon.
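The same "identifier immediately followed by a double-quote" check can be sketched outside of TPP. The function below is a hypothetical illustration, not TPP's API: it assumes the lexer has already produced an identifier token spanning [begin, end) in the source text, and it decides whether that identifier is really a C++ string-literal prefix (u8, u, U, L, optionally followed by R, or R alone). Character literals such as u'x' are deliberately left out.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of TPP.
// Returns true if the identifier src[begin, end) is a string-literal prefix,
// i.e. a known prefix spelling with a '"' directly after it (no whitespace).
bool is_string_prefix(std::string_view src, std::size_t begin, std::size_t end) {
    if (begin >= end || end >= src.size()) return false;
    if (src[end] != '"') return false;  // must be immediately followed by a quote

    std::string_view id = src.substr(begin, end - begin);
    if (id.back() == 'R') id.remove_suffix(1);  // optional raw-string marker

    // What remains is either nothing (plain R"...") or an encoding prefix.
    return id.empty() || id == "u8" || id == "u" || id == "U" || id == "L";
}
```

Applied to the example from the first comment, the U in `int U = 3;` is followed by a space and stays an ordinary identifier, while the U in `U"GHIJKL"` is followed by a double-quote and can be merged with the string into one token.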

asmwarrior commented 7 months ago

Hi, thanks for the detailed explanation.

My reason for studying C preprocessor code is to improve the embedded parser inside Code::Blocks (it fetches symbols from the source files). There are not many free/open-source C preprocessors in the world. Clang can parse this, but its code base is too big and it is also slow. Some similar tools:

universal-ctags/ctags: A maintained ctags implementation

danmar/cppcheck: static analysis of C/C++ code with its preprocessor danmar/simplecpp: C++ preprocessor

robertoraggi/cplusplus

A preprocessor is a very low-level tool that supplies a token stream to the higher-level parsers.

GrieferAtWork / tpp

handle C++ raw string in a single token #7