danmar / simplecpp

C++ preprocessor
BSD Zero Clause License

simplecpp should support Unicode in identifiers #348

Open rubenvb opened 7 months ago

rubenvb commented 7 months ago

C++ allows (most) Unicode characters in identifier names: https://en.cppreference.com/w/cpp/language/identifiers

The Big Three compilers all support this, and two of them do so out of the box: Clang and MSVC just happily compile code using such identifiers. I'm not sure whether GCC currently needs any flags.

Currently, using them results in a syntax error in cppcheck, originating from https://github.com/danmar/cppcheck/blob/main/externals/simplecpp/simplecpp.cpp#L635. Our current setup has the option to treat cppcheck issues as build stoppers (so that developers actually look at and fix issues when they are introduced). The current workaround is to add --suppress=syntaxError, as described here: https://sourceforge.net/p/cppcheck/discussion/general/thread/d4463c60/#dc95. But that of course makes cppcheck bail out without checking anything in the affected file.

I'm not sure what would be needed to make this happen in the simplecpp code. Frankly, I would be happy with cppcheck just assuming the source code encoding is UTF-8. Storing identifiers in std::string will then "work" as far as storage and lookup (e.g. in maps) are concerned, at least in most cases where there aren't two Unicode encoding variants of the same string involved, with a bit of (optional) glue code when actually printing them to the output. I understand this would be a limited solution, but it would go a long way for the large majority of people who use UTF-8 encoded source files and want to run cppcheck on code with Unicode identifiers.
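To make the suggestion concrete, here is a minimal sketch of the "just assume UTF-8" approach. It is not simplecpp's actual code: `isIdentStart`, `isIdentContinue` and `extractIdentifiers` are hypothetical names made up for illustration. The only idea is that every byte with the high bit set (which in UTF-8 always belongs to a multi-byte sequence) is accepted as an identifier character and stored as-is in std::string. It does not validate the UTF-8, does not check which code points the standard actually allows, and ignores string literals and comments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; not part of simplecpp.

// A byte may start an identifier if it is an ASCII letter, '_', or any
// non-ASCII byte (>= 0x80), i.e. part of a UTF-8 multi-byte sequence.
inline bool isIdentStart(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           c == '_' || c >= 0x80;
}

// Identifier continuation additionally allows ASCII digits.
inline bool isIdentContinue(unsigned char c) {
    return isIdentStart(c) || (c >= '0' && c <= '9');
}

// Collects identifier-like tokens from UTF-8 encoded text.
// Assumption: the input is valid UTF-8; no validation is performed.
inline std::vector<std::string> extractIdentifiers(const std::string &src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (isIdentStart(c)) {
            const std::size_t start = i;
            while (i < src.size() &&
                   isIdentContinue(static_cast<unsigned char>(src[i])))
                ++i;
            // The raw bytes are stored unchanged, so lookup is byte-wise.
            out.push_back(src.substr(start, i - start));
        } else if (c >= '0' && c <= '9') {
            // Skip a number and its suffix (e.g. "10u") as one unit so the
            // suffix is not reported as an identifier.
            while (i < src.size() &&
                   isIdentContinue(static_cast<unsigned char>(src[i])))
                ++i;
        } else {
            ++i; // whitespace, punctuation, operators
        }
    }
    return out;
}
```

For example, `extractIdentifiers("int \xCF\x80 = 3;")` returns two entries: `int` and the two-byte UTF-8 encoding of π. The "two encoding variants" caveat shows up directly with this byte-wise storage: `std::string("caf\xC3\xA9")` (precomposed é) and `std::string("cafe\xCC\x81")` (e followed by a combining acute accent) look identical when printed but compare unequal, so they would be treated as different identifiers unless some normalization step is added.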

The "correct" implementation would mean adding full Unicode/encoding support, which is probably a stretch and maybe even unwanted (?) in its entirety, as it would pull in some form of dependency to do it right.