Closed User4martin closed 10 months ago
Added some more commits.
Depending on the outcome of Delphi tests (and/or review, if there is a matter) I can re-arrange the list of commits according to requests, before they get merged.
(?i:..)
quantifiable => This is a bugfix. In the version with the colon, the group actually matches something. So it can be quantified (just like (?:...)
No OP_OPEN for (?i:...)
improves speed... Well if there is heavy use of such groups. Usually they don't happen often. But it was also needed for the GrpIndexes removal.
Separate counters: Less mem alloc, since each type of group needs different data, and previously room for both types of data was allocated for both types of groups
Remove GrpIndexes: General clean up. Maybe (not tested) tiny speed improvement for compilation (not execution)
I dont have last Delphi. also I am on Linux. I tag users which were having Delphi, please test Martin's diff, ie download his repo and his new branch 'work2'. this is full URL to .pas file: https://raw.githubusercontent.com/User4martin/TRegExpr/work2/src/regexpr.pas
@dkounal @AmigoJack @dummzeuch @HPWi
Tested with Delpi 11.3 test\test_delphi: needs the following in tests.pas to run: const LineEnding = #10; type SizeInt = LongInt;
but it stops with the following error in line 8432: 'TRegExpr compile: unknown meta-character: \p (pos 13)'
Screen results:
DUnitX - [test_delphi.exe] - Starting Tests.
...................................E.................................................................F.................................................E.E..............................................
Tests Found : 100 Tests Ignored : 0 Tests Passed : 96 Tests Leaked : 0 Tests Failed : 1 Tests Errored : 3
Failing Tests
tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1> Tests With Errors tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)
tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)
tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)
I was some commits behind, so I updated and running tests before the above commit is the following:
DUnitX - [test_delphi.exe] - Starting Tests.
.....................................E.................................................................F.................................................E.E.....................................F........
Tests Found : 101 Tests Ignored : 0 Tests Passed : 96 Tests Leaked : 0 Tests Failed : 2 Tests Errored : 3
Failing Tests
tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1>
tests.TTestRegexpr.RunTest70
Message: Replace failed, expected:
Tests With Errors
tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)
tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)
tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)
After the above commit:
DUnitX - [test_delphi.exe] - Starting Tests.
.....................................E.................................................................F.................................................E.E..............................................
Tests Found : 101 Tests Ignored : 0 Tests Passed : 97 Tests Leaked : 0 Tests Failed : 1 Tests Errored : 3
Failing Tests
tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1>
Tests With Errors
tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)
tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)
tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)
'TRegExpr compile: unknown meta-character: \p (pos 13)'
occurs if RegEx engine did not have the define UnicodeRE
. it is On by default in regexpr.pas.
// ======== Define options for TRegExpr engine
{$DEFINE UnicodeRE} // Use WideChar for characters and UnicodeString/WideString for strings
it is defined. I did not change anything in source
Actually I just looked, \p
depends on
{$IFDEF FPC}
{$DEFINE FastUnicodeData} // Use arrays for UpperCase/LowerCase/IsWordChar, they take 320K more memory
{$ENDIF}
Not sure why this is FPC only?
But it was made so in Revision: 85b20bde027d0759ad93a6a745b0fc5c8c7a1b58 Author: Michael Van Canneyt Date: 19/02/2021 11:31:45
Though before that, it was not enabled by default.
It was made non-default in Revision: 861691f354ea032e90e0026a312111db4f964be4 Author: Michael Van Canneyt Date: 19/02/2021 11:21:36
@User4martin
Not sure why this is FPC only?
Because Delphi has TCharacter which does the job and it is better than our array-method.
So, i can apply the change? no errors related to it?
Some more changes:
So, i can apply the change? no errors related to it?
Yes.
The only question was if the PointerMath works.
It does, otherwise it would likely not have compiled, but if it had compiled other tests would have failed (recursion tests)
SetInputSubString
it may not end with #0. So FirstCharSet does not work.
The first commit is ready to go. Just skipping unnecessary mem alloc.
The 2nd commit 6438a0 needs some testing with Delphi.
It replaces
GrpBounds[regRecursion].GrpStart
with direct pointe access. Skipping the need to calculate the element forregRecursion
(which in 99% of cases is zero anyway).On my PC, the benchmark gains 2% // Real live apps gain less, but still.
I know Delphi should have "PointerMath". I don't know if it is the same compiler directive. And I don't know if it works 100% identical. So that needs testing.
I tried to just use an
array of PRegExprChar
instead of^PRegExprChar
. But that slowed things down. It adds refcounts, and then, if one array is resized, the other isn't...The pointers are safe. They are only used in MatchPrim. Nothing can change the underlying arrays during the run. And
ClearInternalExecData
ensures the value are valid at start. Data is still stored in the array, and any access outside MatchPrim reads it from the arrays, and not via the pointers.