firasdib / Regex101

This repository is currently only used for issue tracking for www.regex101.com

.NET Hex Character Length Incorrect #2040

Closed: mwagnerEE closed this issue 1 year ago.

mwagnerEE commented 1 year ago

Bug Description

When using Regex101's implementation of the .NET flavor, \x consumes exactly the two hex digits that follow it. I think this is because Microsoft's own "Regular Expression Language - Quick Reference" says:

Escaped character: \x nn
Description: Uses hexadecimal representation to specify a character (nn consists of exactly two digits).

However, in practice, when using the Regex engine in .NET, \x accepts 1 to 4 hex digits.

Reproduction steps

Enter the patterns:

  1. [\x0600-\x9900]
  2. [\x060-\x9900]
  3. [\x06-\x9900]
  4. [\x0-\x9900]

with the test string "hi".

Expected Outcome

  1. No match
  2. Match
  3. Match
  4. Match

See here: https://dotnetfiddle.net/JNlsRR

Actual Outcome

  1. Match
  2. Match
  3. Match
  4. Compiler Error
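The Actual Outcome list is what a strict two-hex-digit reading of \x produces: [\x0600-\x9900] is read as \x06, 0, the range 0-\x99, 0, 0, and that range contains "h". The sketch below illustrates this using Python's re module as a stand-in. This is an assumption for illustration only: re is not the .NET engine, but its \x escape likewise takes exactly two hex digits. Passing the four patterns to it unchanged gives the same four results.

```python
import re

# The reproduction patterns, handed to the regex engine unchanged
# (in C#, a verbatim string such as @"[\x0600-\x9900]" does this).
PATTERNS = [
    r"[\x0600-\x9900]",
    r"[\x060-\x9900]",
    r"[\x06-\x9900]",
    r"[\x0-\x9900]",
]


def outcome(pattern: str, text: str = "hi") -> str:
    """Compile with Python's re, whose \\x escape takes exactly two hex digits."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # [\x0-...] has only one hex digit before "-", so compilation fails.
        return "Compiler Error"
    return "Match" if compiled.search(text) else "No match"


# [\x0600-\x9900] is read as \x06, "0", the range "0"-\x99, "0", "0";
# the range 0x30-0x99 contains "h" (0x68), hence a match.
results = [outcome(p) for p in PATTERNS]

for p, r in zip(PATTERNS, results):
    print(p, "->", r)
```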

Browser

Chrome

OS

Windows 10 x64

mwagnerEE commented 1 year ago

I apologize, I was wrong. My mistake was not noticing that the C# compiler converts the \x escapes in a regular string literal before the string ever reaches the Regex engine. The C# compiler accepts 1 to 4 hex digits after \x in string literals. If I use a verbatim string literal (e.g. @"...") instead, so the escapes are left for the Regex engine, the results match Regex101's implementation.
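The difference can be sketched by doing the C# compiler's step by hand before handing the pattern to a regex engine. Below, csharp_unescape_x is a hypothetical helper that only mimics the \x rule for regular C# string literals (1 to 4 hex digits, consumed greedily) and ignores every other C# escape; Python's re again stands in for the .NET engine as an assumption. Running the converted patterns against "hi" reproduces the Expected Outcome list from the original report.

```python
import re

HEX_DIGITS = "0123456789abcdefABCDEF"


def csharp_unescape_x(source: str) -> str:
    """Hypothetical helper: expand \\x escapes roughly the way the C# compiler does
    in a regular (non-verbatim) string literal, where \\x takes 1 to 4 hex digits.
    Other C# escapes (\\n, \\u, \\\\ and so on) are not handled in this sketch."""
    out = []
    i = 0
    while i < len(source):
        if source.startswith("\\x", i):
            j = i + 2
            # Greedily consume up to four hex digits after \x.
            while j < len(source) and j < i + 6 and source[j] in HEX_DIGITS:
                j += 1
            if j == i + 2:
                raise ValueError("\\x must be followed by at least one hex digit")
            out.append(chr(int(source[i + 2:j], 16)))
            i = j
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


# The reproduction patterns, written as they appear inside a regular C# string.
PATTERNS = [
    r"[\x0600-\x9900]",
    r"[\x060-\x9900]",
    r"[\x06-\x9900]",
    r"[\x0-\x9900]",
]

# The regex engine only sees the characters the compiler produced, e.g.
# [\x0600-\x9900] becomes the class [U+0600-U+9900], which excludes "h" and "i".
results = [
    "Match" if re.search(csharp_unescape_x(p), "hi") else "No match"
    for p in PATTERNS
]

for p, r in zip(PATTERNS, results):
    print(p, "->", r)
```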