Megaxela / QCodeEditor

Qt Code Editor widget.
MIT License

Highlighting for numbers with types in C++ #26

Closed by ouuan 4 years ago

ouuan commented 4 years ago

For example, 1ll, 1l, 1u, 1ull, 1e5 are not highlighted, but they should be.

The same goes for .1 and 1. (with the integer or fractional part omitted).

Hex numbers containing letters are affected too: 0x3f, 0x3F.
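
For reference, every form listed in this comment is a valid C++ literal. The sketch below (the constant names are made up for illustration) compiles as C++11 and notes the type each literal gets:

```cpp
#include <type_traits>

// Integer literals with suffixes.
constexpr auto kLongLong = 1ll;           // long long
constexpr auto kLong = 1l;                // long
constexpr auto kUnsigned = 1u;            // unsigned int
constexpr auto kUnsignedLongLong = 1ull;  // unsigned long long

// Floating-point literals: exponent form and forms with one part omitted.
constexpr auto kExponent = 1e5;           // double, 100000.0
constexpr auto kNoIntPart = .1;           // double, 0.1
constexpr auto kNoFraction = 1.;          // double, 1.0

// Hex literals with lower-case and upper-case digits.
constexpr auto kHexLower = 0x3f;          // int, 63
constexpr auto kHexUpper = 0x3F;          // int, 63

static_assert(std::is_same<decltype(kLongLong), const long long>::value, "1ll is long long");
static_assert(std::is_same<decltype(kUnsigned), const unsigned int>::value, "1u is unsigned int");
static_assert(std::is_same<decltype(kExponent), const double>::value, "1e5 is double");
static_assert(kHexLower == kHexUpper, "hex digits are case-insensitive");
```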

coder3101 commented 4 years ago

You can solve this issue by proposing a PR. Search for a regular expression that matches every numeric literal, including hex and the other forms listed above, and replace the current regular expression with it.

The reason it doesn't work now is that the old regex doesn't match the numbers mentioned above, so switching to the new regular expression will solve it. Be careful: anything that matches this regex will be treated as a number literal, so make sure it produces no false positives.

ouuan commented 4 years ago

I tried Google but found nothing useful, so I wrote one:

\b(((([0-9]+|[0-9][0-9']*[0-9])[eE][+-]?([0-9]+|[0-9][0-9']*[0-9])|([0-9]+|[0-9][0-9']*[0-9])\.([eE][+-]?([0-9]+|[0-9][0-9']*[0-9]))?|([0-9]+|[0-9][0-9']*[0-9])?\.([0-9]+|[0-9][0-9']*[0-9])([eE][+-]?([0-9]+|[0-9][0-9']*[0-9]))?|0[xX](([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F])\.?|([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F])?\.([0-9a-fA-F]+|[0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F]))[pP][+-]?([0-9]+|[0-9][0-9']*[0-9]))[fFlL]?)|(0[xX][0-9a-fA-F]+|0[xX][0-9a-fA-F][0-9a-fA-F']*[0-9a-fA-F]|[1-9][0-9]*|[1-9][0-9']*[0-9]|0[0-7]*|0[0-7']*[0-7]|0[bB][01]+|0[bB][01][01']*[01])([uU]?[lL]{0,2}|[lL]{0,2}[uU]?))\b

I think this should cover everything in https://en.cppreference.com/w/cpp/language/integer_literal and https://en.cppreference.com/w/cpp/language/floating_literal.

But there is a problem: the floating-point . and the separator ' won't be treated as part of the number because of the \b at the beginning and the end, yet dropping \b is a bad idea. How can this be solved?

Also, multiple separators in a row are not handled.
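
The \b problem can be reproduced with much smaller patterns. The sketch below uses std::regex only to stay self-contained (the project itself uses Qt's regex classes, which may differ in details), and first_match is a hypothetical helper, not part of QCodeEditor:

```cpp
#include <regex>
#include <string>

// Returns the first substring of `text` matched by `pattern`,
// or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

With this helper, first_match("x = .1;", R"(\b\.[0-9]+\b)") finds nothing, because there is no word boundary between the space and the dot, while R"(\.[0-9]+\b)" finds .1. The same happens at the other end: R"(\b[0-9]+\.\b)" finds nothing in "y = 1.;" since both . and ; are non-word characters.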


UPD: even if \b is removed, each part separated by . or ' will be matched separately.

I decided to not support ' as a separator:

\b((([0-9]+[eE][+-]?[0-9]+|[0-9]+\.([eE][+-]?[0-9]+)?|[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?|0[xX]([0-9a-fA-F]+\.?|[0-9a-fA-F]*\.[0-9a-fA-F]+)[pP][+-]?[0-9]+)[fFlL]?)|(0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*|0[bB][01]+)([uU]?[lL]{0,2}|[lL]{0,2}[uU]?))\b

But 0x1.1 will be matched although it's not a numeric literal. I don't know how to solve this problem.

.1f can be matched if \b is removed.

1.1f is matched as 1. when \b is present, and as 1.1 when \b is removed.
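
A rough way to check these observations is to list every match the pattern produces. This sketch copies the pattern without ' support from above and runs it through std::regex, as an assumption-laden stand-in for the regex engine the editor actually uses; all_matches and the constant names are hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Body of the pattern without ' support, without the surrounding \b.
const std::string kBody =
    R"(((([0-9]+[eE][+-]?[0-9]+|[0-9]+\.([eE][+-]?[0-9]+)?|[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?|0[xX]([0-9a-fA-F]+\.?|[0-9a-fA-F]*\.[0-9a-fA-F]+)[pP][+-]?[0-9]+)[fFlL]?)|(0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*|0[bB][01]+)([uU]?[lL]{0,2}|[lL]{0,2}[uU]?)))";

const std::string kWithBoundaries = "\\b" + kBody + "\\b";
const std::string kWithoutBoundaries = kBody;

// Collects every non-overlapping match of `pattern` in `text`, left to right.
std::vector<std::string> all_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Under std::regex, all_matches("1.1f", kWithBoundaries) yields only 1., while kWithoutBoundaries yields 1. followed by 1, so the highlighted text reads 1.1 and the f suffix is left out. For .1f, the version with \b matches nothing and the version without matches the whole .1f.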


With (?i) for case-insensitive matching:

\b(?i)((([0-9]+e[+-]?[0-9]+|[0-9]+\.(e[+-]?[0-9]+)?|[0-9]*\.[0-9]+(e[+-]?[0-9]+)?|0x([0-9a-f]+\.?|[0-9a-f]*\.[0-9a-f]+)p[+-]?[0-9]+)[fl]?)|(0x[0-9a-f]+|[1-9][0-9]*|0[0-7]*|0b[01]+)(u?l{0,2}|l{0,2}u?))\b

ouuan commented 4 years ago

digit-sequence: (?:\d+(?:'\d+)*)
exponent: (?:e[+-]?(?:\d+(?:'\d+)*))
hexds: (?:[0-9a-f]+(?:'[0-9a-f]+)*)
hexexp: (?:p[+-]?(?:\d+(?:'\d+)*))

floating:
    1. (?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))
    2. (?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)
    3. (?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)
    4. (?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*)))
    5. (?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))
    suffix: [lf]?

    all: (?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)

integer:
    1. (?:[1-9]\d*(?:'\d+)*)
    2. (?:0[0-7]*(?:'[0-7]+)*)
    3. (?:0x[0-9a-f]+(?:'[0-9a-f]+)*)
    4. (?:0b[01]+(?:'[01]+)*)
    suffix: (?:u?l{0,2}|l{0,2}u?)

    all: (?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?))

all: (?<=\b| |^)(?i)(?:(?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)|(?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?)))(?=\b| |$)

I think this should work, and all the problems above are solved.
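
The draft above builds the big expression from a few named fragments, and the same composition can be written in code so that each piece stays readable. This is only a sketch under assumptions: it uses std::regex instead of the editor's Qt regex classes, all C++ names are illustrative, and because std::regex has no inline (?i) flag and no lookbehind, it lower-cases a single token and requires a full match instead of using boundary assertions:

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Building blocks, named after the draft above.
const std::string kDigitSeq = R"((?:\d+(?:'\d+)*))";
const std::string kExponent = "(?:e[+-]?" + kDigitSeq + ")";
const std::string kHexSeq = R"((?:[0-9a-f]+(?:'[0-9a-f]+)*))";
const std::string kHexExp = "(?:p[+-]?" + kDigitSeq + ")";
const std::string kIntSuffix = "(?:u?l{0,2}|l{0,2}u?)";

std::string floating_pattern() {
    const std::string form1 = kDigitSeq + kExponent;
    const std::string form2 = kDigitSeq + R"(\.)" + kExponent + "?";
    const std::string form3 = kDigitSeq + "?" + R"(\.)" + kDigitSeq + kExponent + "?";
    const std::string form4 = "0x" + kHexSeq + R"(\.?)" + kHexExp;
    const std::string form5 = "0x" + kHexSeq + "?" + R"(\.)" + kHexSeq + kHexExp;
    // Same alternative order as the combined draft expression.
    return "(?:(?:" + form3 + "|" + form2 + "|" + form1 + "|" + form5 + "|" + form4 + ")[lf]?)";
}

std::string integer_pattern() {
    const std::string decimal = R"([1-9]\d*(?:'\d+)*)";
    const std::string octal = R"(0[0-7]*(?:'[0-7]+)*)";
    const std::string hex = R"(0x[0-9a-f]+(?:'[0-9a-f]+)*)";
    const std::string binary = R"(0b[01]+(?:'[01]+)*)";
    return "(?:(?:" + decimal + "|" + octal + "|" + hex + "|" + binary + ")" + kIntSuffix + ")";
}

// True if the whole token is a numeric literal according to the draft pattern.
bool is_numeric_literal(std::string token) {
    // Lower-casing stands in for the (?i) flag used in the draft.
    std::transform(token.begin(), token.end(), token.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const std::regex re("(?:" + floating_pattern() + "|" + integer_pattern() + ")");
    return std::regex_match(token, re);
}
```

Checking whole tokens this way gives a quick test for false positives: is_numeric_literal("0x1.8p3") and is_numeric_literal("1'000'000") are accepted, while is_numeric_literal("0x1.1") and is_numeric_literal("08") are rejected.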

coder3101 commented 4 years ago

Yes. You need to use a raw string literal, which keeps the pattern verbatim (backslashes, spaces, and newlines included), like this: R"(..regex here..)";
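
As a small illustration of why the raw form helps, here is the same short pattern written both ways (the pattern and constant names are chosen only for this example):

```cpp
#include <string>

// Ordinary string literal: every backslash in the regex has to be doubled.
const std::string kEscaped = "\\d+\\.\\d*";

// Raw string literal: the text between R"( and )" is taken verbatim.
const std::string kRaw = R"(\d+\.\d*)";
```

Both constants hold the same eight characters. The raw form works here because the pattern never contains the sequence )" that would end the literal early.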

ouuan commented 4 years ago

Using (?<=\b| |^) and (?=\b| |$) instead of \b is a workaround.

Are there any better solutions?
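
One possible alternative, which is not something proposed elsewhere in this thread, is to leave the boundary assertions out of the pattern and check the neighbouring characters in code. A minimal sketch with std::regex, where is_word_char and standalone_matches are hypothetical helpers:

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// True for characters that can be part of an identifier or a number.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the matches of `re` in `line` that are not directly preceded
// or followed by a word character.
std::vector<std::string> standalone_matches(const std::string& line, const std::regex& re) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        const std::size_t begin = static_cast<std::size_t>(it->position());
        const std::size_t stop = begin + static_cast<std::size_t>(it->length());
        const bool left_ok = begin == 0 || !is_word_char(line[begin - 1]);
        const bool right_ok = stop == line.size() || !is_word_char(line[stop]);
        if (left_ok && right_ok) {
            out.push_back(it->str());
        }
    }
    return out;
}
```

With the toy pattern R"(\d*\.\d+f?|\d+\.?)", the line "x = .1f + a1 + 1.;" yields .1f and 1., while the 1 inside a1 is dropped. The trade-off is that a rejected match is simply skipped, so the pattern itself still has to be greedy enough to consume the whole literal.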

coder3101 commented 4 years ago

I don't know much about regex, but this expression seems too big! If there is a better solution, @Megaxela should confirm it. If there is none, we should accept this as the only solution.

ouuan commented 4 years ago

The expression is \b(?i)(?:(?:(?:(?:(?:\d+(?:'\d+)*)?\.(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)\.(?:e[+-]?(?:\d+(?:'\d+)*))?)|(?:(?:\d+(?:'\d+)*)(?:e[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)?\.(?:[0-9a-f]+(?:'[0-9a-f]+)*)(?:p[+-]?(?:\d+(?:'\d+)*)))|(?:0x(?:[0-9a-f]+(?:'[0-9a-f]+)*)\.?(?:p[+-]?(?:\d+(?:'\d+)*))))[lf]?)|(?:(?:(?:[1-9]\d*(?:'\d+)*)|(?:0[0-7]*(?:'[0-7]+)*)|(?:0x[0-9a-f]+(?:'[0-9a-f]+)*)|(?:0b[01]+(?:'[01]+)*))(?:u?l{0,2}|l{0,2}u?)))\b, others are my drafts.

coder3101 commented 4 years ago

Isn't there a way to reduce this expression to something shorter?

ouuan commented 4 years ago

> Isn't there a way to reduce this expression to something shorter?

Ignoring C++14 and C++17 would make it much shorter.

coder3101 commented 4 years ago

> > Isn't there a way to reduce this expression to something shorter?
>
> Ignoring C++14 and C++17 would make it much shorter.

I think we should support C++17 as well, regardless of the length of this regex. Let @Megaxela decide whether he is fine with putting such a big regex into the C++17 language highlighting.

Megaxela commented 4 years ago

I guess it's fine to use such big expressions. Because C++ syntax is complex, it's pretty hard to write nice-looking code for most of these expressions.