dynamicexpresso / DynamicExpresso

C# expressions interpreter
http://dynamic-expresso.azurewebsites.net/
MIT License

Interpreter.DetectIdentifiers cannot find non-ASCII identifiers in an expression #269

Closed littepointR closed 1 year ago

littepointR commented 1 year ago

My test code is:

using System;
using System.Linq;
using DynamicExpresso;

var interpreter = new Interpreter();

// Each sample pairs a label with an expression made only of non-ASCII identifiers.
var samples = new[]
{
    (Label: "Chinese",  Expression: "中文"),
    (Label: "Japanese", Expression: "日本語"),
    (Label: "Russian",  Expression: "русский язык"),
};

foreach (var (label, expression) in samples)
{
    // Run detection once per expression and materialize each result set.
    var identifiersInfo    = interpreter.DetectIdentifiers(expression);
    var identifiers        = identifiersInfo.Identifiers.ToArray();
    var unknownIdentifiers = identifiersInfo.UnknownIdentifiers.ToArray();
    var referenceTypes     = identifiersInfo.Types.ToArray();

    Console.WriteLine($"{label} --> ");
    Console.WriteLine($"identifiers.Count = {identifiers.Length}");
    Console.WriteLine($"unknownIdentifiers.Count = {unknownIdentifiers.Length}");
    Console.WriteLine($"referenceTypes.Count = {referenceTypes.Length}");
}

and the output:

Chinese -->
identifiers.Count = 0
unknownIdentifiers.Count = 0
referenceTypes.Count = 0
Japanese -->
identifiers.Count = 0
unknownIdentifiers.Count = 0
referenceTypes.Count = 0
Russian -->
identifiers.Count = 0
unknownIdentifiers.Count = 0
referenceTypes.Count = 0
littepointR commented 1 year ago

This seems to be because the regex Detector.IdentifiersDetectionRegex only matches identifiers that start with an ASCII character.

davideicardi commented 1 year ago

Yes, the regex could probably be extended. As usual, any help is appreciated.

gaoqiangz commented 1 year ago

Any progress?

KaivnD commented 1 year ago

I need the identifier detector to support this too. ChatGPT suggested the following change:

- private static readonly Regex IdentifiersDetectionRegex = new Regex(@"([^\.]|^)\b(?<id>[a-zA-Z_]\w*)\b", RegexOptions.Compiled);
+ private static readonly Regex IdentifiersDetectionRegex = new Regex(@"([^\.]|^)\b(?<id>[\p{L}_]\w*)\b", RegexOptions.Compiled);

This gives the regex ([^\.]|^)\b(?<id>[\p{L}_]\w*)\b, which matches both ASCII and non-ASCII identifiers. A test on regex101 works for me.
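
For anyone who wants to check this without regex101, here is a minimal sketch that runs both patterns from the diff against the expressions in the original report. It only uses System.Text.RegularExpressions, not DynamicExpresso itself, and the `Detect` helper is a hypothetical name introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the "-" line of the diff: the first character must be an ASCII letter or underscore.
var asciiOnly = new Regex(@"([^\.]|^)\b(?<id>[a-zA-Z_]\w*)\b", RegexOptions.Compiled);
// Pattern from the "+" line of the diff: the first character may be any Unicode letter or underscore.
var unicodeAware = new Regex(@"([^\.]|^)\b(?<id>[\p{L}_]\w*)\b", RegexOptions.Compiled);

// Hypothetical helper: returns the "id" group of every match in the expression.
string[] Detect(Regex regex, string expression) =>
    regex.Matches(expression).Cast<Match>().Select(m => m.Groups["id"].Value).ToArray();

foreach (var expression in new[] { "中文", "日本語", "русский язык" })
{
    Console.WriteLine($"{expression}: old = {Detect(asciiOnly, expression).Length}, new = {Detect(unicodeAware, expression).Length}");
}
```

With these inputs the old pattern finds nothing, while the new one finds 中文, 日本語, and the two words русский and язык. The Russian sample contains a space, so it is detected as two identifiers rather than one.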

metoule commented 1 year ago

Relevant section of the C# specifications: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/lexical-structure#643-identifiers
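
As I read that section, an identifier may start with a letter (Unicode classes Lu, Ll, Lt, Lm, Lo), a letter number (Nl), or an underscore, and may continue with those plus combining marks (Mn, Mc), decimal digits (Nd), connector punctuation (Pc), and formatting characters (Cf). A rough sketch of a pattern transcribed from those classes is below. It is my own transcription, not code from DynamicExpresso, and it ignores keywords, the `@` prefix, and Unicode escape sequences; `IsIdentifier` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: character classes transcribed by hand from the spec's identifier grammar.
// Start character: any letter (\p{L}), letter number (\p{Nl}), or underscore.
// Following characters: the start classes plus Mn, Mc, Nd, Pc, and Cf.
const string SpecIdentifier = @"[\p{L}\p{Nl}_][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}]*";

// Anchored so the whole input must be a single identifier.
var specIdentifier = new Regex(@"\A" + SpecIdentifier + @"\z", RegexOptions.Compiled);

// Hypothetical helper: true when the text is one identifier under the pattern above.
bool IsIdentifier(string text) => specIdentifier.IsMatch(text);

Console.WriteLine(IsIdentifier("中文"));
Console.WriteLine(IsIdentifier("_x1"));
Console.WriteLine(IsIdentifier("1abc"));
```

Compared with [\p{L}_]\w*, this also accepts letter numbers as a first character and the Mc and Cf classes in later positions, which .NET's \w does not cover.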