Open benibela opened 8 years ago
All I was trying to do is to check if a single character is in some Unicode class.
TFLRE.Create('^[\p{Nd}\p{Nl}\p{No}\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}]$',[rfUTF8]);
would work as well, with the check negated at the calling site. However, it is just as slow.
Next I tried '^(\p{Nd}|\p{Nl}|\p{No}|\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo})$'
, but surprisingly that is even slower and takes 500ms here
Hm, it looks that I must optimize TFLREUnicodeCharClass.Invert or something TFLREUnicodeCharClass-related in that direction.
It's queued on my ToDo list after "Implement thread safe dynamic array class into PasMP" (which's nearly done) and "Port the Vulkan headers to Pascal" (a one-night-task for the next night)
But queue it after #10 and #22
StopWatch := TStopWatch.StartNew;
for I := 1 to 10 do begin
FLREInstance:=TFLRE.Create('^(\p{Nd}|\p{Nl}|\p{No}|\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo})$',[rfUTF8]);
FLREInstance.Destroy;
end;
WriteLn(StopWatch.Elapsed.TotalSeconds:10:2);
Delphi 64 bit Fast I7 processor, 2 secs. The generated Regular Expression (by DumpRegularExpression) is 8747 characters long! It contains 221 times \x80 28 times \x85 etc. With rfONLYFASTOPTIMIZATIONS the Regular Expressions is 31024 long (!), but the time drops to 0.6 seconds. A good example of the optimization shrinking the regular expression down by 75%.
with `TFLRE.Create('^[\p{Nd}\p{Nl}\p{No}\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}]$',[rfUTF8]); 0.6 secs.
Still a bit of an issue.
I think the first expression is equivalent to:
FLREInstance:=TFLRE.Create('^(\p{N}|\p{L})$',[rfUTF8]);
which brings time down to 0.6 secs (for 10 iterations) (same size), which I think is OK,
And matching is still quite fast.
The following expression to check a character for some unicode classes
takes 200ms on my computer.
That does not seem normal.