PavelTorgashov / FastColoredTextBox

Fast Colored TextBox for Syntax Highlighting. The text editor component for .NET.

Fixed and improved C# numeric literals highlighting #150

Open HSyr opened 5 years ago

HSyr commented 5 years ago

Proposed regex:

CSharpNumberRegex = new Regex( @"\b[\d_]+[\.]?[\d_]*([eE]\-?[\d_]+)?[lLdDfF]?\b|\b0[xX][_a-fA-F\d]+\b|\b0[bB][_01]+\b",
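A quick way to see what this first pattern accepts is to run it over a couple of C# statements. The sketch below uses Python's `re` module only as a convenient test bed, assuming it treats `\b`, `\d` and these character classes the same way .NET's `Regex` does for ASCII input. The helper `highlighted` is hypothetical. The second call shows a side effect of `[\d_]+`: a token made only of underscores and digits also counts as a number.

```python
import re

# First proposal from this issue, copied verbatim. Python's re module stands in
# for .NET Regex here (assumption: same behavior for these ASCII constructs).
FIRST_PROPOSAL = re.compile(
    r"\b[\d_]+[\.]?[\d_]*([eE]\-?[\d_]+)?[lLdDfF]?\b"
    r"|\b0[xX][_a-fA-F\d]+\b"
    r"|\b0[bB][_01]+\b"
)


def highlighted(pattern, code):
    """Hypothetical helper: the substrings of `code` that `pattern` would color."""
    return [m.group(0) for m in pattern.finditer(code)]


# Digit separators, hex and binary literals are picked up as intended:
print(highlighted(FIRST_PROPOSAL, "int x = 1_000_000; int h = 0x44aa_abcd; int b = 0b1011_0011;"))
# -> ['1_000_000', '0x44aa_abcd', '0b1011_0011']

# ...but [\d_]+ also accepts tokens made only of '_' and digits, such as the
# C# discard '_' or an identifier like '_1':
print(highlighted(FIRST_PROPOSAL, "_ = Compute(_1);"))
# -> ['_', '_1']
```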

Highlights: [image]

HSyr commented 5 years ago

Here is a revised proposal:

\b((\d+|\d[_\d]*\d)(\.(\d+|\d[_\d]*\d))?([eE][\+\-]?(\d+|\d[_\d]*\d))?[lLdDfF]?|0[xX][_a-fA-F\d]*[a-fA-F\d]|0[bB][_01]*[01])\b
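The revised pattern requires every decimal, hexadecimal or binary run to start and end with a digit, so `_` can only appear between digits. The sketch below checks that on a few statements, again with Python's `re` standing in for .NET `Regex` (an assumption) and a hypothetical `highlighted` helper. The last call shows how a real literal written with a leading dot, such as `.123`, comes out: only the digits are colored, because `\b` cannot match before the `.`. This is consistent with the `.123` case mentioned later in the thread.

```python
import re

# Second proposal from this issue, copied verbatim (split over two raw strings).
SECOND_PROPOSAL = re.compile(
    r"\b((\d+|\d[_\d]*\d)(\.(\d+|\d[_\d]*\d))?([eE][\+\-]?(\d+|\d[_\d]*\d))?[lLdDfF]?"
    r"|0[xX][_a-fA-F\d]*[a-fA-F\d]|0[bB][_01]*[01])\b"
)


def highlighted(code):
    """Hypothetical helper: the substrings the pattern would color as numbers."""
    return [m.group(0) for m in SECOND_PROPOSAL.finditer(code)]


print(highlighted("_ = Compute(_1);"))                  # -> []
print(highlighted("x = 1_000 + 1e-5 + 2.5f + 0xFF;"))   # -> ['1_000', '1e-5', '2.5f', '0xFF']
print(highlighted("double d = .123;"))                  # -> ['123'] (the dot is not colored)
```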

[image]

Hexman768 commented 5 years ago

This would be an excellent addition, but it is against best practice to introduce new code knowing that it has a defect. If you could resolve the defect and update this issue with the repaired code, I believe you would be a prime candidate for a PR.

HSyr commented 5 years ago

I understand. For me this small glitch is acceptable, as the current regex has much more severe issues without this fix. Besides, with C# string interpolation, a C# string cannot be matched properly anyway.

Hexman768 commented 5 years ago

What makes you say that? There has to be at least some way to accomplish this.

Hexman768 commented 5 years ago

On the off-chance that it really isn't possible, I would honestly say that this would be a great improvement to the existing regex. If we are going to be better off with this enhancement (i.e. fewer bugs) and we are guaranteed not to introduce more than just this one, then I say why not? Currently this looks like a viable, if perhaps temporary, solution.

HSyr commented 5 years ago

I say that because I am not a regular expression expert and do not see how a single regex could highlight string interpolation the way VS does:

[image: Visual Studio highlighting of an interpolated C# string]

Do you have any idea?

Hexman768 commented 5 years ago

You may have to use two regex patterns: one for the purpose we've already covered (which will have the issue) and one for handling the issue (i.e. a special case for the problem that is occurring). So you may have to create a regex pattern that recognizes strings like ".123", as you stated above.
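One way to read the two-pattern suggestion is sketched below, under stated assumptions. A hypothetical `LEADING_DOT` pattern covers C# real literals that begin with a dot (C# accepts forms such as `.5` and `.5e3f`), and it is tried before the revised proposal in a single alternation, so `.123` is consumed whole. The lookbehind `(?<![\w.])` is our own choice: it skips a dot that follows a word character (member access like `obj.Length`) or another dot. Python's `re` again stands in for .NET `Regex`; both support this kind of fixed-width lookbehind. This has not been tried inside FastColoredTextBox.

```python
import re

# Second proposal from this issue (verbatim).
SECOND_PROPOSAL = r"\b((\d+|\d[_\d]*\d)(\.(\d+|\d[_\d]*\d))?([eE][\+\-]?(\d+|\d[_\d]*\d))?[lLdDfF]?|0[xX][_a-fA-F\d]*[a-fA-F\d]|0[bB][_01]*[01])\b"

# Hypothetical special case for reals that start with a dot, e.g. .123 or .5e3f.
# The lookbehind rejects a dot preceded by a word character or by another dot.
LEADING_DOT = r"(?<![\w.])\.(\d+|\d[_\d]*\d)([eE][\+\-]?(\d+|\d[_\d]*\d))?[fFdDmM]?\b"

# The special case is listed first, so it wins at the position of the dot.
NUMBER = re.compile(LEADING_DOT + "|" + SECOND_PROPOSAL)


def highlighted(code):
    """Hypothetical helper: the substrings the combined pattern would color."""
    return [m.group(0) for m in NUMBER.finditer(code)]


print(highlighted("double d = .123;"))      # -> ['.123']
print(highlighted("var y = .5e3f + 2;"))    # -> ['.5e3f', '2']
print(highlighted("int n = obj.Length;"))   # -> []
```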

HSyr commented 5 years ago

I am afraid that such recursively nested language elements are best handled by a parser running on top of the grammar, although I am aware that regular expressions, grammars and automata have a lot in common. Maybe someone can do it with a regex from scratch. Not me. I would have to open my old university books :-)
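The nesting HSyr describes is what makes interpolated strings hard: each `{...}` hole holds an arbitrary expression that may contain its own braces. Classic regular expressions cannot count unbounded nesting depth (.NET's balancing groups are an extension that can, in principle), so a small hand-written scanner with a depth counter is a common alternative. The sketch below, in Python like the other examples, only extracts the holes from the body of a `$"..."` string. The function name `interpolation_holes` is hypothetical, and the sketch deliberately ignores string and character literals inside a hole, which a real highlighter would also have to track.

```python
def interpolation_holes(body):
    """Return the text inside each {...} hole of an interpolated string body
    (the part between $" and the closing quote).

    Sketch only: '{{' and '}}' outside a hole are escaped literal braces,
    braces inside a hole are counted so nested {...} stay in the same hole,
    and quotes inside a hole are NOT handled.
    """
    holes = []
    depth = 0
    start = 0
    i = 0
    while i < len(body):
        ch = body[i]
        if depth == 0:
            if body.startswith("{{", i) or body.startswith("}}", i):
                i += 2  # escaped brace in the literal text
                continue
            if ch == "{":
                depth = 1  # a hole starts here
                start = i + 1
        else:
            if ch == "{":
                depth += 1  # nested brace inside the expression
            elif ch == "}":
                depth -= 1
                if depth == 0:  # the hole is closed
                    holes.append(body[start:i])
        i += 1
    return holes


print(interpolation_holes("Sum: {new[] {1, 2}.Sum()} {{literal}} {x,5:N2}"))
# -> ['new[] {1, 2}.Sum()', 'x,5:N2']
```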

Hexman768 commented 5 years ago

@PavelTorgashov Do you think that this would be a viable alternative to our current C# highlighting regex?

WrongBit commented 5 years ago

> I do not see the way how to use a single regex for highlighting string interpolation like VS does:

This thread is about highlighting numbers, so why are you talking about strings? And I'm not sure a regex can parse interpolated strings at all; there are theoretical limitations.

codingdave commented 5 years ago

I agree with @WrongBit; the topic is about numeric literals and CSharpNumberRegex. I think it would be nice to have some C# 7 literals, including binary literals, that we can compare the regex with. I have found https://csharp.christiannagel.com/2016/10/06/literals/ as a resource:

binary literals: byte b1 = 0b11101010; ushort b2 = 0b1111100011110000;

digit separators: ushort s1 = 0b1011_1100_1011_0011; int x1 = 0x44aa_abcd;

boolean literals: bool b1 = true; bool b2 = false;

integer literals: int i1 = 1; int i2 = 0xA; uint i3 = 3; long i4 = 4; ulong i5 = 5; uint i6 = 6u; ulong i7 = 7u; long i8 = 8L; ulong i9 = 9L; ulong i10 = 10ul;

real literals: float r1 = 1.1F; double r2 = 2.2; decimal r3 = 3.30M;

character literals:
char c1 = 'c';
char c2 = '\t'; // Tab
char c3 = '\x05c'; // \
char c4_unicode = '\u0066'; //
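To compare the revised proposal with the list above, the sketch below checks which of the numeric literals it highlights in full. The boolean and character literals are left out because they are not numbers. Python's `re` stands in for .NET `Regex` (an assumption), and `fully_highlighted` is a hypothetical helper.

```python
import re

# Second proposal from this issue (verbatim).
SECOND_PROPOSAL = re.compile(
    r"\b((\d+|\d[_\d]*\d)(\.(\d+|\d[_\d]*\d))?([eE][\+\-]?(\d+|\d[_\d]*\d))?[lLdDfF]?"
    r"|0[xX][_a-fA-F\d]*[a-fA-F\d]|0[bB][_01]*[01])\b"
)

# Numeric literals taken from the C# 7 examples listed above.
LITERALS = [
    "0b11101010", "0b1111100011110000",              # binary
    "0b1011_1100_1011_0011", "0x44aa_abcd",          # digit separators
    "1", "0xA", "3", "4", "5", "6u", "7u", "8L", "9L", "10ul",  # integer
    "1.1F", "2.2", "3.30M",                          # real
]


def fully_highlighted(literal):
    """True if the whole literal would be colored as one number."""
    return SECOND_PROPOSAL.fullmatch(literal) is not None


missed = [lit for lit in LITERALS if not fully_highlighted(lit)]
print(missed)  # -> ['6u', '7u', '10ul', '3.30M']
```

All misses come from the suffix class `[lLdDfF]`, which has no `u`/`U` (unsigned) or `m`/`M` (decimal) suffix. Widening it, for example to `([uU][lL]?|[lL][uU]?|[fFdDmM])?`, is one possible fix, not yet tested against FastColoredTextBox.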

The regex proposal may work for some literals and not for others. It would be nice if you, @HSyr, could explain which part of the regex is meant to match which literal.
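One way to answer "which part matches what" without a picture is to write the second proposal in verbose form, with a comment on each branch. The sketch below does that with Python's `re.VERBOSE`, where whitespace is ignored and `#` starts a comment; .NET offers `RegexOptions.IgnorePatternWhitespace` for the same purpose. The branch labels and examples are our own reading of the pattern, and the final check confirms that the verbose form matches exactly what the compact one does on a sample line.

```python
import re

# The second proposal, spelled out branch by branch (our annotation).
NUMBER_VERBOSE = re.compile(r"""
    \b
    (
        # Branch 1: decimal integer or real, e.g. 1, 1_000, 2.2, 1e-5, 1.1F, 8L
        (\d+ | \d[_\d]*\d)                  # integer part; '_' only between digits
        (\.(\d+ | \d[_\d]*\d))?             # optional fraction
        ([eE][\+\-]?(\d+ | \d[_\d]*\d))?    # optional exponent
        [lLdDfF]?                           # optional suffix (no u/U, no m/M)
      |
        # Branch 2: hexadecimal, e.g. 0xA, 0x44aa_abcd
        0[xX][_a-fA-F\d]*[a-fA-F\d]         # must end in a hex digit
      |
        # Branch 3: binary (C# 7), e.g. 0b1011_1100_1011_0011
        0[bB][_01]*[01]                     # must end in 0 or 1
    )
    \b
""", re.VERBOSE)

# The same pattern in the compact form posted in this issue.
COMPACT = re.compile(r"\b((\d+|\d[_\d]*\d)(\.(\d+|\d[_\d]*\d))?([eE][\+\-]?(\d+|\d[_\d]*\d))?[lLdDfF]?|0[xX][_a-fA-F\d]*[a-fA-F\d]|0[bB][_01]*[01])\b")

sample = "int a = 1_000; double b = 1e-5; var c = 0x44aa_abcd + 0b1011_0011; _ = 3.30M;"
verbose_hits = [m.group(0) for m in NUMBER_VERBOSE.finditer(sample)]
compact_hits = [m.group(0) for m in COMPACT.finditer(sample)]
assert verbose_hits == compact_hits
print(verbose_hits)  # -> ['1_000', '1e-5', '0x44aa_abcd', '0b1011_0011', '3']
```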

Hexman768 commented 5 years ago

The strings I was speaking of are the regex pattern strings, NOT the targeted numeric literals.

HSyr commented 5 years ago

You are right. Sorry for bringing strings into this topic. I just wanted to point out that regex is probably not powerful enough to highlight the language elements of the latest C# versions.

I believe the breakdown of my regex proposal for matching C# numeric literals is obvious from the picture attached to my message from 23 May. If not, I recommend http://www.ultrapico.com/Expresso.htm.