Open NickHackman opened 4 years ago
Thanks for reporting. I think it should be just a matter of adding them to the verbatim string logic, but the test cases make it much easier to verify.
Man that c++ file is screwy...
# bb100123 @ VCOANSYD256197 in ~ [11:35:34]
$ gcc test.cpp
test.cpp:31:10: error: use of undeclared identifier 'R'
printf(R"(Before sorting:\n\n" )");
^
test.cpp:31:35: warning: missing terminating '"' character [-Winvalid-pp-token]
printf(R"(Before sorting:\n\n" )");
^
1 warning and 1 error generated.
# bb100123 @ VCOANSYD256197 in ~ [11:35:45] C:1
$ clang test.cpp
test.cpp:31:10: error: use of undeclared identifier 'R'
printf(R"(Before sorting:\n\n" )");
^
test.cpp:31:35: warning: missing terminating '"' character [-Winvalid-pp-token]
printf(R"(Before sorting:\n\n" )");
^
1 warning and 1 error generated.
Is there some trick to get that to work? The issue I see with it is,
printf(R"(Before sorting:\n\n" )");
Where it looks like the string ends after the \n\n as its not escaped?
@boyter Huh, the code didn't have any sort of warnings or compile time errors when I checked them
g++ (GCC) 10.1.0
clang version 10.0.0
Raw strings were added in C++11
, if that helps in anyway.
In Tokei, since we can't really cover the prefix being anything without bringing in the entire regex crate, only handles the base case in the above example. Instead of checking for a closing "
, the new closing quote needs to be )"
.
I think adding
"C++": {
"complexitychecks": [
"for ",
"for(",
"if ",
"if(",
"switch ",
"while ",
"else ",
"|| ",
"&& ",
"!= ",
"== "
],
"extensions": [
"cc",
"cpp",
"cxx",
"c++",
"pcc"
],
"line_comment": [
"//"
],
"multi_line": [
[
"/*",
"*/"
]
],
"quotes": [
{
"end": "\"",
"start": "\""
},
{
"end": ")\"",
"start": "\"(",
"ignoreEscape": true,
}
]
},
Ideally Scc
instead of interpreting the end and closing as simply as Tokei does, to use regular expressions to be able to handle the more abstract and crazy cases of C++ Raw Strings (small performance hit, but the regex is simple so most likely minimal), then storing the prefix in the state machine as the closing syntax of the string literal. This would also allow Scc
to handle the (iirc infinite) possible #
s for the Rust raw strings as well.
Maybe an additional field in quotes
stating whether the end
and start
field are regular expressions to optimize that a bit. :smile:
Yeah that would explain it. Hmmm makes it a bit harder to deal with then.
I may have to go back and redo the whole matching engine actually since its rather complex and fragile at this point. That might tie into some performance gains to be gotten from a specific state machine for the more common languages staying out of that logic where possible.
There's a similar issue with literal characters. A contrived trivial example:
#include <iostream>
int main(int argc, char *argv[])
{
std::cout << '"';
/* block comment
inside code */
}
//----------------------------------------------------------------------------
//!
//----------------------------------------------------------------------------
/*
*
*
*
*/
───────────────────────────────────────────────────────────────────────────────
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C++ 1 17 1 0 16 0
───────────────────────────────────────────────────────────────────────────────
Total 1 17 1 0 16 0
───────────────────────────────────────────────────────────────────────────────
The example is contrived, but it's fairly common to see this sort of thing in C++ code.
Yeah there are a heap of edge cases with this. I think that as part of the performance tweaks I am considering having bespoke parser's to speedup and catch these edge cases might be the best solution.
Describe the bug Incorrect behavior when dealing with verbatim strings in languages such as C++, Python, and Rust
To Reproduce
Python:
Output:
C++:
This is the example Tokei uses. Output:
Rust:
This is the example Tokei uses. Output:
Expected behavior Python: 2 lines 1 comment 1 code 0 blanks C++: 46 lines 3 comment 37 code 6 blanks Rust: 41 lines 33 code 3 comments 5 blanks
This is the default case, as far as I know for C++ verbatim strings, because C++ verbatim strings can have prefixes.
Rust supports the pattern of
r(#+)"..."(#+)
where the capture groups must have equal length.Desktop (please complete the following information):
Love the project :heart: