andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License
174 stars 63 forks source link

Small optimization / Needs testing with Delphi #378

Closed User4martin closed 10 months ago

User4martin commented 10 months ago

The first commit is ready to go. Just skipping unnecessary mem alloc.


The 2nd commit 6438a0 needs some testing with Delphi.

It replaces GrpBounds[regRecursion].GrpStart with direct pointe access. Skipping the need to calculate the element for regRecursion (which in 99% of cases is zero anyway).

On my PC, the benchmark gains 2% // Real live apps gain less, but still.

I know Delphi should have "PointerMath". I don't know if it is the same compiler directive. And I don't know if it works 100% identical. So that needs testing.

I tried to just use an array of PRegExprChar instead of ^PRegExprChar. But that slowed things down. It adds refcounts, and then, if one array is resized, the other isn't...

The pointers are safe. They are only used in MatchPrim. Nothing can change the underlying arrays during the run. And ClearInternalExecData ensures the value are valid at start. Data is still stored in the array, and any access outside MatchPrim reads it from the arrays, and not via the pointers.

User4martin commented 10 months ago

Added some more commits.

Depending on the outcome of Delphi tests (and/or review, if there is a matter) I can re-arrange the list of commits according to requests, before they get merged.

User4martin commented 10 months ago
Alexey-T commented 10 months ago

I dont have last Delphi. also I am on Linux. I tag users which were having Delphi, please test Martin's diff, ie download his repo and his new branch 'work2'. this is full URL to .pas file: https://raw.githubusercontent.com/User4martin/TRegExpr/work2/src/regexpr.pas

@dkounal @AmigoJack @dummzeuch @HPWi

dkounal commented 10 months ago

Tested with Delpi 11.3 test\test_delphi: needs the following in tests.pas to run: const LineEnding = #10; type SizeInt = LongInt;

but it stops with the following error in line 8432: 'TRegExpr compile: unknown meta-character: \p (pos 13)'

Screen results:


DUnitX - [test_delphi.exe] - Starting Tests.

...................................E.................................................................F.................................................E.E..............................................

Tests Found : 100 Tests Ignored : 0 Tests Passed : 96 Tests Leaked : 0 Tests Failed : 1 Tests Errored : 3

Failing Tests

tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1> Tests With Errors tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)

tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)

tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)

dkounal commented 10 months ago

I was some commits behind, so I updated and running tests before the above commit is the following:


DUnitX - [test_delphi.exe] - Starting Tests.

.....................................E.................................................................F.................................................E.E.....................................F........

Tests Found : 101 Tests Ignored : 0 Tests Passed : 96 Tests Leaked : 0 Tests Failed : 2 Tests Errored : 3

Failing Tests

tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1>

tests.TTestRegexpr.RunTest70 Message: Replace failed, expected: but was: <abr#0c>

Tests With Errors

tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)

tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)

tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)

After the above commit:


DUnitX - [test_delphi.exe] - Starting Tests.

.....................................E.................................................................F.................................................E.E..............................................

Tests Found : 101 Tests Ignored : 0 Tests Passed : 97 Tests Leaked : 0 Tests Failed : 1 Tests Errored : 3

Failing Tests

tests.TTestRegexpr.RunTest27 Message: Search position, expected: <1> but was: <-1>

Tests With Errors

tests.TTestRegexpr.TestIsFixedLength Message: TRegExpr compile: unknown meta-character: \p (pos 13)

tests.TTestRegexpr.RunTest51unicode Message: TRegExpr compile: unknown meta-character: \p (pos 1)

tests.TTestRegexpr.RunTest52unicode Message: TRegExpr compile: unknown meta-character: \p (pos 2)

Alexey-T commented 10 months ago

'TRegExpr compile: unknown meta-character: \p (pos 13)' occurs if RegEx engine did not have the define UnicodeRE. it is On by default in regexpr.pas.

// ======== Define options for TRegExpr engine
{$DEFINE UnicodeRE} // Use WideChar for characters and UnicodeString/WideString for strings   
dkounal commented 10 months ago

it is defined. I did not change anything in source

User4martin commented 10 months ago

Actually I just looked, \p depends on

{$IFDEF FPC}
  {$DEFINE FastUnicodeData} // Use arrays for UpperCase/LowerCase/IsWordChar, they take 320K more memory
{$ENDIF}

Not sure why this is FPC only?

But it was made so in Revision: 85b20bde027d0759ad93a6a745b0fc5c8c7a1b58 Author: Michael Van Canneyt Date: 19/02/2021 11:31:45

Though before that, it was not enabled by default.

It was made non-default in Revision: 861691f354ea032e90e0026a312111db4f964be4 Author: Michael Van Canneyt Date: 19/02/2021 11:21:36

Alexey-T commented 10 months ago

@User4martin

Not sure why this is FPC only?

Because Delphi has TCharacter which does the job and it is better than our array-method.

So, i can apply the change? no errors related to it?

User4martin commented 10 months ago

Some more changes:

User4martin commented 10 months ago

So, i can apply the change? no errors related to it?

Yes.

The only question was if the PointerMath works.

It does, otherwise it would likely not have compiled, but if it had compiled other tests would have failed (recursion tests)

User4martin commented 10 months ago