ouuan closed this issue 4 years ago
You can solve this issue by proposing a PR. Just google a new regular expression that matches any number, including hex and the other forms proposed above. You need to change this regular expression to one that matches them all.
The reason it doesn't work now is that the old regex doesn't match the numbers mentioned above, so replacing it with a new regular expression will solve the problem. Be cautious: anything that matches this regex will be treated as a number literal, so make sure it doesn't produce any false matches.
I tried Google but found nothing useful, so I wrote one:
\b(((([0-9]+|[0-9][0-9']*[0-9])[eE][+-]?([0-9]+|[0-9][0-9']*[0-9])|([0-9]+|[0-9][0-9']*[0-9])\.([eE][+-]?([0-9]+|[0-9][0-9']*[0-9]))?|([0-9]+|[0-9][0-9']*[0-9])?\.([0-9]+|[0-9][0-9']*[0-9])([eE][+-]?([0-9]+|[0-9][0-9']*[0-9]))?|0[xX](([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F])\.?|([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F])?\.([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F]))[pP][+-]?([0-9]+|[0-9][0-9']*[0-9]))[fFlL]?)|(0[xX][0-9a-fA-F]+|0[xX][0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F]|[1-9][0-9]*|[1-9][0-9']*[0-9]|0[0-7]*|0[0-7']*[0-7]|0[bB][01]+|0[bB][01][01']*[01])([uU]?[lL]{0,2}|[lL]{0,2}[uU]?))\b
I think this should cover everything in https://en.cppreference.com/w/cpp/language/integer_literal and https://en.cppreference.com/w/cpp/language/floating_literal.
But there is a problem: the floating-point `.` and the separator `'` won't be considered part of the number because of the `\b` at the beginning and the end. Not using `\b` is a bad idea. How can this problem be solved?
Also, multiple separators in a row are not handled.
UPD: even if `\b` is removed, each part separated by `.` or `'` will be matched separately.
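To see why `\b` interferes, it can help to list where a word boundary actually falls. The snippet below is a small sketch that uses Python's `re` module as a stand-in for the editor's regex engine (an assumption); `boundary_positions` is a hypothetical helper name. Because `.` and `'` are not word characters, `\b` holds on both sides of them inside a number, but not before a leading `.` or after a trailing one.

```python
import re

def boundary_positions(text):
    """Return every index in `text` at which the zero-width assertion \\b holds."""
    return [m.start() for m in re.finditer(r"\b", text)]

# No boundary at index 0: the literal starts with '.', which is not a word character.
print(boundary_positions(".1f"))    # [1, 3]
# No boundary at index 2: the literal ends with '.'.
print(boundary_positions("1."))     # [0, 1]
# The separator splits the digits into two separate "words".
print(boundary_positions("1'000"))  # [0, 1, 2, 5]
```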
I decided not to support `'` as a separator:
\b((([0-9]+[eE][+-]?[0-9]+|[0-9]+\.([eE][+-]?[0-9]+)?|[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?|0[xX]([0-9a-fA-F]+\.?|[0-9a-fA-F]*\.[0-9a-fA-F]+)[pP][+-]?[0-9]+)[fFlL]?)|(0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*|0[bB][01]+)([uU]?[lL]{0,2}|[lL]{0,2}[uU]?))\b
But `0x1.1` will be matched although it's not a numeric literal. I don't know how to solve this problem.
`.1f` can be matched if `\b` is removed.
`1.1f` will be matched as `1.` if `\b` is present, and as `1.1` if `\b` is removed.
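The observations above can be reproduced with a short script. This is only a sketch: Python's `re` module stands in for the editor's regex engine (an assumption), `CORE` is the separator-free expression from this comment without its surrounding `\b`, and `matches` is a hypothetical helper. Each returned string would be highlighted as one number.

```python
import re

# The separator-free expression quoted above, without the leading and trailing \b.
CORE = (
    r"((([0-9]+[eE][+-]?[0-9]+|[0-9]+\.([eE][+-]?[0-9]+)?"
    r"|[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?"
    r"|0[xX]([0-9a-fA-F]+\.?|[0-9a-fA-F]*\.[0-9a-fA-F]+)[pP][+-]?[0-9]+)[fFlL]?)"
    r"|(0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*|0[bB][01]+)"
    r"([uU]?[lL]{0,2}|[lL]{0,2}[uU]?))"
)
WITH_B = re.compile(r"\b" + CORE + r"\b")
WITHOUT_B = re.compile(CORE)

def matches(pattern, text):
    """Return the full text of every non-overlapping match, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

print(matches(WITH_B, "0x1.1"))     # ['0x1', '.1']  both halves get highlighted
print(matches(WITH_B, ".1f"))       # []             no \b before the leading '.'
print(matches(WITHOUT_B, ".1f"))    # ['.1f']
print(matches(WITH_B, "1.1f"))      # ['1.']
print(matches(WITHOUT_B, "1.1f"))   # ['1.', '1']    together they cover '1.1'
```

Note that without `\b` the text `1.1` is covered by two adjacent matches rather than one, and the trailing `f` is left out in both variants.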
`(?i)` is used:
\b(?i)((([0-9]+e[+-]?[0-9]+|[0-9]+\.(e[+-]?[0-9]+)?|[0-9]*\.[0-9]+(e[+-]?[0-9]+)?|0x([0-9a-f]+\.?|[0-9a-f]*\.[0-9a-f]+)p[+-]?[0-9]+)[fl]?)|(0x[0-9a-f]+|[1-9][0-9]*|0[0-7]*|0b[01]+)(u?l{0,2}|l{0,2}u?))\b
digit-sequence: (?:\d+(?:'\d+)*)
exponent: (?:e[+-]?(?:\d+(?:'\d+)*))
hexds: (?:[0-9a-f]+(?:'[0-9a-f]+)*)
hexexp: (?:p[+-]?(?:\d+(?:'\d+)*))
floating:
1. (?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))
2. (?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)
3. (?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)
4. (?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*)))
5. (?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))
suffix: [lf]?
all: (?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)
integer:
1. (?:[1-9]\d*(?:'\d+)*)
2. (?:0[0-7]*(?:'[0-7]+)*)
3. (?:0x[0-9a-f]+(?:'[0-9a-f]+)*)
4. (?:0b[01]+(?:'[01]+)*)
suffix: (?:u?l{0,2}|l{0,2}u?)
all: (?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?))
all: (?<=\b| |^)(?i)(?:(?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)|(?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?)))(?=\b| |$)
I think this should work, and all the problems above are solved.
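For anyone who wants to experiment with the draft, the named parts can be assembled programmatically instead of by hand. The following is a sketch under a few assumptions: Python's `re` module stands in for the editor's regex engine, case-insensitivity is passed as a flag rather than as `(?i)`, and `fullmatch` is used in place of the boundary assertions, so boundary handling is deliberately left out. The names `NUMBER` and `is_number_literal` are hypothetical.

```python
import re

# Named parts from the draft; case is ignored via a flag instead of (?i).
DIGITS = r"(?:\d+(?:'\d+)*)"                    # digit-sequence
EXPONENT = r"(?:e[+-]?" + DIGITS + r")"         # exponent
HEX_DIGITS = r"(?:[0-9a-f]+(?:'[0-9a-f]+)*)"    # hexds
HEX_EXPONENT = r"(?:p[+-]?" + DIGITS + r")"     # hexexp

# Floating forms, in the same order as the draft's combined "all" line.
FLOATING = "(?:(?:" + "|".join([
    DIGITS + "?" + r"\." + DIGITS + EXPONENT + "?",                # form 3
    DIGITS + r"\." + EXPONENT + "?",                               # form 2
    DIGITS + EXPONENT,                                             # form 1
    "0x" + HEX_DIGITS + "?" + r"\." + HEX_DIGITS + HEX_EXPONENT,   # form 5
    "0x" + HEX_DIGITS + r"\.?" + HEX_EXPONENT,                     # form 4
]) + ")[lf]?)"

# Integer forms: decimal, octal, hexadecimal, binary, then the suffix.
INTEGER = "(?:(?:" + "|".join([
    r"[1-9]\d*(?:'\d+)*",
    r"0[0-7]*(?:'[0-7]+)*",
    r"0x[0-9a-f]+(?:'[0-9a-f]+)*",
    r"0b[01]+(?:'[01]+)*",
]) + r")(?:u?l{0,2}|l{0,2}u?))"

NUMBER = re.compile("(?:" + FLOATING + "|" + INTEGER + ")", re.IGNORECASE | re.ASCII)

def is_number_literal(text):
    """True if the whole of `text` matches the assembled expression."""
    return NUMBER.fullmatch(text) is not None

for sample in ["1'000'000", "0x1.8p3", "1.5e-3f", "0b1010", "1''000", "0x1.1"]:
    print(sample, is_number_literal(sample))
```

With this construction, doubled separators such as `1''000` and a hex value with a fraction but no `p` exponent such as `0x1.1` are rejected as whole literals.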
Yes, you need to use raw string literals to preserve spaces and newlines, like this: `R"(..regex here..)";`
Using `(?<=\b| |^)` and `(?=\b| |$)` instead of `\b` is a workaround.
Are there any better solutions?
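A small comparison may make the effect of the workaround concrete. This is a sketch with two assumptions: `SIMPLE_CORE` is a deliberately tiny stand-in for the full expression, and Python's `re` module is used in place of the editor's regex engine. Python only accepts fixed-width lookbehind, so `(?<=\b| |^)` is written here as the equivalent `(?:\b|(?<= )|^)`; the lookahead is unchanged.

```python
import re

# Hypothetical, much-reduced core: ".5", "1.5", "1." or "1".
SIMPLE_CORE = r"(?:\d*\.\d+|\d+\.?)"

BOUNDED = re.compile(r"\b" + SIMPLE_CORE + r"\b")
# (?<=\b| |^) rewritten for Python, which requires fixed-width lookbehind.
LOOKAROUND = re.compile(r"(?:\b|(?<= )|^)" + SIMPLE_CORE + r"(?=\b| |$)")

def matches(pattern, text):
    """Return the full text of every non-overlapping match, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

text = ".5 + 1."
print(matches(BOUNDED, text))     # ['5', '1']    the dots are cut off
print(matches(LOOKAROUND, text))  # ['.5', '1.']  the dots are kept
```

The extra alternatives only help when the literal is preceded or followed by a space or the edge of the text; a leading `.` directly after another symbol such as `(` would still not get a left anchor.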
I don't have much knowledge of regex, but this seems like too big an expression! If there is a better solution, @Megaxela should confirm. If there isn't, we should accept this as the only solution.
The expression is \b(?i)(?:(?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)|(?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?)))\b
The others are my drafts.
Isn't there a way to reduce this expression to something shorter?
Ignoring C++14 and C++17 would make it much shorter.
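As a rough illustration of how much shorter it could get: digit separators and binary literals arrived in C++14, and hexadecimal floating literals in C++17, so a C++11-only expression can drop all three. The pattern below is a sketch under that reading of the comment, not a proposal from this thread. `CPP11_NUMBER` and `is_cpp11_number` are hypothetical names, Python's `re` module stands in for the editor's regex engine, and boundary handling is again left out by using `fullmatch`.

```python
import re

# Assumption: "ignoring C++14 and C++17" means no digit separators, no binary
# literals (both C++14) and no hexadecimal floating literals (C++17).
CPP11_NUMBER = re.compile(
    r"(?:"
    # floating literals: .5 / 1.5e3, 1. / 1.e3, 1e3, each with optional f or l
    r"(?:[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?|[0-9]+\.(?:e[+-]?[0-9]+)?|[0-9]+e[+-]?[0-9]+)[fl]?"
    r"|"
    # integer literals: hexadecimal, decimal, octal, with optional u/l/ll suffix
    r"(?:0x[0-9a-f]+|[1-9][0-9]*|0[0-7]*)(?:u?l{0,2}|l{0,2}u?)"
    r")",
    re.IGNORECASE,
)

def is_cpp11_number(text):
    """True if the whole of `text` matches the reduced expression."""
    return CPP11_NUMBER.fullmatch(text) is not None

for sample in ["1ull", "0x3F", "1e5", ".1", "1.", "0b101", "1'000", "0x1p3"]:
    print(sample, is_cpp11_number(sample))
```

The trade-off is that `0b101`, `1'000` and `0x1p3` are no longer recognised as single literals.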
I think we should support C++17 as well, regardless of the length of this regex. Let @Megaxela decide whether he prefers to put in such a big regex for C++17 language highlighting.
I guess it's fine to use such big expressions. Due to the complex C++ syntax, it's pretty hard to write nice-looking code for most expressions.
For example, `1ll`, `1l`, `1u`, `1ull`, `1e5` are not highlighted, but they should be. The same goes for `.1` and `1.` (the other part omitted), and for hex numbers with letters: `0x3f`, `0x3F`.