Allow ()* / fix Dump() / Limit Stack

User4martin commented 1 year ago

Many (if not most) other reg ex engines allow patterns to be quantified even if they could be empty or match 0-len.

(a|\b)* may match zero len, but should be valid.

For {} they even eval the given amount of times: https://regex101.com/r/BvQROU/1 (change the "4" to "5" in the {})

Dump() had a potential uninitialised result.

  RegEx := TRegExpr.Create('((a){2000,4000}){2000,4000}');
//  RegEx.MaxEvalStackDepth := 100000;
  RegEx.InputString := StringOfChar('a', 100000);
  RegEx.Exec();

The above would cause a stack overflow.

I added a limit, to avoid the exception.

For my tests on Windows:

Win 10 - 64 bit fails at stack depth 45000
Win 10 - 32 bit fails at stack depth 59000

Obviously stack size can be configured by the user, So numbers can change.

Moving more of the local vars to TRegExprMatchPrimLocals = record and have them share mem, could improve the max depth.

I thought about wrapping MatchPrim in an outer function (which would do the inc/dec) and avoid the try/finally. But that depends on {$IFDEF InlineFuncs}, and even then inlining isn't guaranteed, and if it is not inlined it is increasing stack usage.

Alexey-T commented 1 year ago

I will read your patch in the nearest days.

User4martin commented 1 year ago

Just done some tests...

The try finally causes a big slowdown. So that needs to be skipped for now / or replaced by a better solution

User4martin commented 1 year ago

The try finally causes a big slowdown. So that needs to be skipped for now / or replaced by a better solution

try finally made my app (which does mostly, but not only regex matching) go from 64 secs to 101 secs. +57%
Changing each exit to `begin dec(matchStackDepth); exit; end; went from 64 to 74. +15%
Comparing if InitialFrame - get_frame > fMaxEvalStackDepth then begin went from 64 to 68. +6%

The last one is not to bad, but I don't know if it works for Delphi. And it depends on the CPU. It needs to know if the stackpointer goes up or down (on intel it goes down, so the current get_frame is smaller.) fMaxEvalStackDepth must then be the size of the stack avail, eg 1 MB or whatever.

Alexey-T commented 1 year ago

even 15% slowdown is bad thing. we add small-needed feature of regex, and perfomance goes down? bad.

your patch is hard to understand for me (1st of them). so i gueess i just test it and then apply. if @andgineer can review it, i welcome it.

User4martin commented 1 year ago

even 15% slowdown is bad thing. we add small-needed feature of regex, and perfomance goes down? bad.

True, 15% is a lot.

But "small-need", well if an editor allows the user to enter search terms, then it can easily end up a stack overflow. For an editor it would not be a surprise to have a longer text (even 100k), and nested loops (just 2 or 3) can quickly reach the limit.

It may be possible to check for loops only, assuming that other recursions are limited already (IIRC recursions to named captures have a limit, Open-"("-groups too, branches probably too ... Not looked at all others where MatchPrim is recursed.

However many of the above can re-occur in each iteration of a loop => meaning that for 1 loop iteration the stack grows by 10 or more frames.

Actually just seen, there is an older attempt if regNestedCalls > MaxRegexBackTracking then => it is commented though.

your patch is hard to understand for me (1st of them). so i gueess i just test it and then apply. if @andgineer can review it, i welcome it.

1) The parser checks for FLAG_HAS_WIDTH, if it may be zero it uses EmitComplexBraces. Because this is the only loop that deals with that case

2) Because absence of FLAG_HAS_WIDTH no longer gives an error, I added FLAG_NOT_QUANTIFIABLE => this is set by ^ or \b so ^ or \b can be rejected. It is not forwarded when it is in () so (\b)* is allowed.

3) 'CurrentRegInput' is used to detect if the input moved (if not it is zero width). OP_LOOPENTRY sets it to nil, because it always goes directly to OP_LOOP without processing any input (so it would never advance). OP_LOOP then sets it, and detects it. (For all else it is kept with the loop data, ensuring that for nested loops the correct record is used)

The way nested loops are evaled is tricky (but exists before the patch).
I.e. if an inner loop tests that the pattern after the loop matches at the point of the current iteration, then that test will run into the OP_LOOP of the outer loop, and therefore also check if the outer loop can get to a valid iteration count (and validating the outer loop may run further instances of the inner loop for each iteration that the outer loop is still taking...)

Result := (BracesMax = MaxBracesArg) and => detects {n,m} instead of * or +. As explained {n,m} should run the expected count, even if zero width => because it can changed captures contained in look around. This is the same on he regexp101 site. However * or + can not run the 4 million times....

User4martin commented 1 year ago

I've tried a diff approach. Handing the depth in via param https://github.com/User4martin/TRegExpr/commit/0215f19e36bf3e442c20e60d1d9ce8c082550d10

And only checking for the max inside loops (I have to check, but I think the other OP_* can only add limited amount before they must go through a loop. => though OP_BACK may have to be checked, but it probably is never used with captures....

Below are the benchmarks (my own project increased by 5 %)

                                                        Time     | Time with stack limit
==============================================================================
                               /Twain/ :          31 ms    31 ms |       16 ms    31 ms
                           /(?i)Twain/ :          47 ms    47 ms |       79 ms    63 ms
                          /[a-z]shing/ :         250 ms   250 ms |      250 ms   234 ms
          /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :          31 ms    31 ms |       31 ms    31 ms
                           /\b\w+nn\b/ :         250 ms   250 ms |      234 ms   235 ms
                    /[a-q][^u-z]{13}x/ :         641 ms   640 ms |      672 ms   656 ms
         /Tom|Sawyer|Huckleberry|Finn/ :          47 ms    47 ms |       31 ms    32 ms
     /(?i)Tom|Sawyer|Huckleberry|Finn/ :         187 ms   203 ms |      203 ms   203 ms
 /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :        2656 ms  2657 ms |     2688 ms  2703 ms     1.5%
 /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :        2828 ms  2859 ms |     2906 ms  2875 ms     1.2%
   /Tom.{10,25}river|river.{10,25}Tom/ :          47 ms    62 ms |       47 ms    62 ms
                        /[a-zA-Z]+ing/ :         625 ms   625 ms |      578 ms   579 ms
               /\s[a-zA-Z]{0,12}ing\s/ :         265 ms   282 ms |      266 ms   265 ms
       /([A-Za-z]awyer|[A-Za-z]inn)\s/ :         641 ms   656 ms |      657 ms   641 ms
           /["'][^"']{0,30}[?!\.]["']/ :          78 ms    78 ms |       78 ms    62 ms

Though none of them runs very long loops.

Alexey-T commented 1 year ago

Handing the depth in via param

It looks OK. But commit is out of branch.

User4martin commented 1 year ago

Actually there is a better way. WE can just catch on E: EStackOverflow do and set/throw a reg-ex error.

That way usercode does not need to deal with it, but can just do the normal error checking.

Alexey-T commented 1 year ago

OK too for me.

User4martin commented 1 year ago

Just wondering...

If a regex exceeds the stack => should that

call Error and raise an exception
Exec/Match return false

At first it seems obvious => raise an exception. But currently all the errors that can happen during eval are internal. The user code will work even if it does not wrap Exec/Match into a try except.

If "too complex" throws an error, then all user code must add exception handling (and that is time consuming)

Alexey-T commented 1 year ago

If "too complex" throws an error, then all user code must add exception handling (and that is time consuming)

yes, I agree with you, better to avoid changing user code. and return False.

Alexey-T commented 1 year ago

strange:

Result := False;
Result := MatchPrim(regCodeWork);

except
on E: EStackOverflow do begin
  Result := False;
  //Error(reeLoopStackExceeded);
end;
end;

"else raise" is missed?

Alexey-T commented 1 year ago

speed is slower now, still?

User4martin commented 1 year ago

speed is slower now, still?

My project has no noticeable change (less than half a second for over a minute runtime => that is normal fluctuation)

Same for the benchmark (here the difference between 2 runs are bigger, than between with/without).

It's one try/except per run. So a single run will not have noticeable diff, however if a user does 100k runs of very simple patterns (like ^1 which would be really fast to run) then the 100k except blocks could add...
I tried though, on Windows 40m runs of that pattern => no diff within GetTickCount (1.6 secs) .

I haven't tried on Linux, different exception handling could mean different outcome.

"else raise" is missed?

Ups, but maybe...

It would be good to have an option. raise or just set errorcode.
property RaiseForRuntimeError (default true)

then this could be for all exceptions.

Alexey-T commented 1 year ago

property RaiseForRuntimeError with default false (so I won't change the main app) - is OK to me.

User4martin commented 1 year ago

property RaiseForRuntimeError with default false (so I won't change the main app) - is OK to me.

Well, errors like reeMatchPrimMemoryCorruption, reeMatchPrimCorruptedPointers, reeCorruptedProgram, ... did get raised until now. (Same for the stack-overflow, that has always been possible) However, they aren't expected by the user, so likely not caught.

Errors like reeOffsetMustBePositive, reeExecNextWithoutExec, reeNoExpression also got raised. If user code would check for those exceptions depends on how much un-checked userdata goes through.

Would the code ever set an empty expression?
Would the app let the user choose "continue search" if it never did start a search before...

Maybe, maybe not.

That is why I thought default = True.

Alexey-T commented 1 year ago

That is why I thought default = True.

you are right, sorry, 'true' is better.

User4martin commented 1 year ago

Changed to True

Alexey-T commented 1 year ago

So speed-test-app don't show slowdown (other than fluctuation)?

Alexey-T commented 1 year ago

I did simple test run of 'test' project. downloaded your git branch. crash! Linux x64, Free Pascal Compiler version 3.2.3-772-gd520814adb [2023/09/05] for x86_64

Screenshot from 2023-11-12 21-20-34

User4martin commented 1 year ago

Strange. Ive been running it plenty of times. On Windows though.

Could you comment out line 959 in "TestBads"?

  IsNotMatching('Too Complex',   '^((b|a){1,5000}){1,5000}',   StringOfChar('a', 991*503)); // both factors are prime

Maybe Linux does not throw an EStackOverflow.

The above test expression is/was valid before my changes. And it would have always triggered a stack overflow.

Alexey-T commented 1 year ago

yes, that line is the reason, w/o it - all ok. add IFDEF WINDOWS for it, pls.

Alexey-T commented 1 year ago

what call-stack in IDE shows: it is recursion of MatchPrim.

User4martin commented 1 year ago

Just tested

On WIndows you get an EStackOverflow, even with O4 enabled and -Ct (stack checks) disabled
(Could be that my fpc is compiled differently...)
On Linux EStackOverflow only happens with -Ct

So without -Ct the given regex crashes the app. I have not tested, but the regex has nothing that requires recent features. ^((b|a){1,5000}){1,5000} has been valid for a long time. So that crash has always been there.

Alexey-T commented 1 year ago

okay. maybe add $S to the code? https://www.freepascal.org/docs-html/prog/progsu72.html

User4martin commented 1 year ago

That would be {$PUSH}{$S+} and {$POP} And I guess MatchPrim is enough => I think the stack check has some margin (at least on Win I saw that in the fpc source). So if MatchPrim makes other calls it probably is in that margin.

I haven't done full tests, but on Linux that has decreased speed a lot. (Since I only tested a subset of my app, I can't compare with the other/earlier checks).

Alexey-T commented 1 year ago

Let's add ifdef windows to the crashing test?

User4martin commented 1 year ago

The question is which way to go.

StackCheck has the advantage that it needs no user settings, but always uses as much stack as available. Limit the recursion with a counter (my earlier attempts) will often limit too much, and yet does not hold a guarantee to limit enough (Well the count could be calculated from the stacksize, but that is CPU/Platform dependent)

{$S+ means stackcheck is done by a non-inlined call to the procedure doing the check. And also that does check alignment too, so spends extra time. Hence it is expensive.

I have to double check, but on Windows it appears that the OS triggers a special stack overflow error (probably based on a paging violation or similar) => so I guess fpc can catch that. On Linux this just gives an "Access violation" and is not translated. And then there are other platforms....

Alexey-T commented 1 year ago

yes, I understood. current way (no $S and no counter to check for max) is OK to me. i just wanted to avoid the crash on my PC. and to disable that 1 line in tests.

User4martin commented 1 year ago

I commented the test out for now. It also always breaks in the debugger (the debugger can not currently be told to ignore it). So it is annoying even on Windows.

Alexey-T commented 1 year ago

Ok. let's wait for few days until you find all issues, and then I will apply it

User4martin commented 1 year ago

OK some more testing (for now only tested Windows, but expect similar on Linux)

  //if sptr  < StackLimit then begin
  if get_frame < StackLimit then begin
    error(reeBadRecursion);
    exit;
  end;

sptr is not inlined, and therefore 1 second (per minute) slower (not much, but...)

get_frame is inlined and does the same job (as long as no one adds nostackframe, because it uses the frame...). In any case as the method has plenty of vars, it does need the frame anyway.

NOTE, INTEL only => This is for Intel/AMD. For other CPU it needs to be checked if the stack pointer goes up or down. Intel starts the stack at the high address and then goes down. Hence the <. For other CPU it may have to be a >. So this code can only be added for each known and tested Cpu.

Overall the above did not add to the time my project used (running 67 secs either way).

However it does add a bit in the benchmark

                                                  Time         | with check
========================================================================
                                  /Twain/ :     47 ms     47 ms |     47 ms   46 ms
                              /(?i)Twain/ :     78 ms     78 ms |     78 ms   94 ms
                             /[a-z]shing/ :    281 ms    375 ms |    359 ms  360 ms
             /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :     47 ms     47 ms |     47 ms   47 ms
                              /\b\w+nn\b/ :    282 ms    359 ms |    359 ms  375 ms
                       /[a-q][^u-z]{13}x/ :    703 ms    781 ms |    797 ms  797 ms
            /Tom|Sawyer|Huckleberry|Finn/ :     62 ms     47 ms |     62 ms   62 ms
        /(?i)Tom|Sawyer|Huckleberry|Finn/ :    219 ms    250 ms |    250 ms  250 ms
    /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :   2766 ms   2843 ms |   2922 ms 2875 ms
    /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :   3047 ms   3125 ms |   3156 ms 3094 ms
      /Tom.{10,25}river|river.{10,25}Tom/ :     63 ms     79 ms |     78 ms   78 ms
                           /[a-zA-Z]+ing/ :    610 ms    734 ms |    750 ms  735 ms
                  /\s[a-zA-Z]{0,12}ing\s/ :    281 ms    312 ms |    328 ms  328 ms
          /([A-Za-z]awyer|[A-Za-z]inn)\s/ :    672 ms    797 ms |    813 ms  797 ms
              /["'][^"']{0,30}[?!\.]["']/ :     93 ms     94 ms |     94 ms   78 ms

2 runs with original code, then 2 runs with check added. Note that /.{2,4}(.. even had one of the "with check" runs faster than without. But otherwise the "with check" is a bit slower.

That seems to be the fastest I can find.

Alexey-T commented 1 year ago

So we won't have check for stack overflow, it';s ok for me.

User4martin commented 1 year ago

So we won't have check for stack overflow, it';s ok for me.

Actually I think we could have {$DEFINE WITH_STACKOVERFLOW_PROTECTION}. Which could be off by default.

Then it would work for selected platform only.

I.e. currently I would add it for

  {$IF defined(Linux) or defined(Windows)}
    {$IF defined(CPU386) or defined(CPUX86_64)}

Then if someone can test for other platform/cpu those can be added.

Independent of this, if the code throws a EStackOverflow on its own, then that will be caught too. (as it currently is)

Alexey-T commented 1 year ago

we could have {$DEFINE WITH_STACKOVERFLOW_PROTECTION}. Which could be off by default.

yes, off by default. If you can add it, okay.

User4martin commented 1 year ago

Still working ... And benchmarks coming

Alexey-T commented 1 year ago

okay. you can 'concatenate' git commits (I dunno how to do it in git) so diff will be smaller.

User4martin commented 1 year ago

you can 'concatenate' git commits (I dunno how to do it in git) so diff will be smaller.

git REBASE... done. => I had the below already written, so it refers to the not-yet-merged. "Current" no longer exists. "NEW" is what the top of the branch is now.

Before the rebase: https://github.com/User4martin/TRegExpr/commits/e34aedb6949a2ea5c9a8f9a2b85539fdf260139b

"Current" is before e34aedb6949a2ea5c9a8f9a2b85539fdf260139b "NEW" is at e34aedb6949a2ea5c9a8f9a2b85539fdf260139b

The NEW code reduces the speed impact of "turning exception to errors", IF (and only IF) it is turned off (property RaiseForRuntimeError := True). => That is the original behaviour, in which exceptions are going to the user.
===> Compare "New raise OFF" (behave as original, but has new property) and "No Raise" (original, before any change)

I run the benchmark with -c 20 (time is avg of 20 runs for each row). For some I repeated the run (again with -c 20), and even at that many runs some rows have noticeable fluctutation.

I added new benchmarks, that have a high match count, this means the setup (containing the try/except) is run more often. While other benchmarks may have few matches, but will have to do deep recursions for the loop (hitting code that is at the start of MatchPrim).

full-check: Current code with RegExpWithStackOverflowCheck defined
current: Current code WITHOUT RegExpWithStackOverflowCheck defined
NO raise: "simulate original" => raise/except commented out - exception ALWAYS handed to user (porperty RaiseForRuntimeErro removed)
=> stack overflow will not be handled
NEW raise2Err OFF (true): NEW CODE: - exception ALWAYS handed to user (property RaiseForRuntimeError := True)
=> stack overflow will not be handled (behave as original)
NEW raise2Err ON (false): NEW CODE: - exception NOT handed to user (property RaiseForRuntimeError := False)
NEW raise2Err OFF (true) - full: NEW CODE: - exception ALWAYS handed to user / with RegExpWithStackOverflowCheck defined
This is like only RegExpWithStackOverflowCheck
FOR FUN: same as "NEW raise OFF (true)" but with 1-byte char. (and only -c 5)

Windows 10 - 64 bit
                                             Match count  # full-check           # current              # NO raise             # NEW raise2Err OFF  # NEW raise2Err ON # NEW raise2Err OFF - full ## FOR FUN
                                                                                                          (Original- ish)                                                 (check stack)
=======================================================================================================================================================================================================
                                     /Twain/ :       2388 #     48 ms |    49 ms #     49 ms |    48 ms #    39 ms |    39 ms  #    35 ms |    31 ms  #    31 ms       #     39 ms            ##     19 ms 
                                 /(?i)Twain/ :       2657 #     85 ms |    85 ms #     78 ms |    74 ms #    58 ms |    58 ms  #    56 ms |    56 ms  #    57 ms       #     68 ms            ##     59 ms 
                                /[a-z]shing/ :       1877 #    373 ms |   374 ms #    343 ms |   276 ms #   239 ms |   239 ms  #   238 ms |   241 ms  #   238 ms       #    375 ms            ##    219 ms 
                /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :        396 #     49 ms |    50 ms #     49 ms |    49 ms #    40 ms |    39 ms  #    35 ms |    32 ms  #    32 ms       #     39 ms            ##     20 ms 
                                         /./ :   18905427 #    682 ms |   682 ms #    459 ms |   531 ms #   468 ms |   470 ms  #   461 ms |   462 ms  #   461 ms       #    666 ms            ##    406 ms 
                                       /(.)/ :   18905427 #    925 ms |  1061 ms #    712 ms |   771 ms #   743 ms |   685 ms  #   728 ms |   741 ms  #   735 ms       #    918 ms            ##    664 ms 
                                         /e/ :    1781425 #    121 ms |   122 ms #    105 ms |   106 ms #    82 ms |    82 ms  #    80 ms |    82 ms  #    82 ms       #    101 ms            ##     67 ms 
                                       /(e)/ :    1781425 #    144 ms |   194 ms #    125 ms |   125 ms #   102 ms |   101 ms  #   103 ms |   103 ms  #   103 ms       #    124 ms            ##     91 ms 
                              /(?s).{1,45} / :     475715 #     41 ms |    65 ms #     38 ms |    37 ms #    37 ms |    38 ms  #    38 ms |    37 ms  #    37 ms       #     43 ms            ##     22 ms 
                            /(?s)\G.{1,45} / :      10616 #    356 ms |   388 ms #    185 ms |   186 ms #   340 ms |   198 ms  #   169 ms |   160 ms  #   164 ms       #    320 ms            ##    167 ms 
                            /\G(?s).{1,45} / :      10616 #     16 ms |    16 ms #     15 ms |    15 ms #    16 ms |    16 ms  #    16 ms |    16 ms  #    17 ms       #     15 ms            ##      0 ms 
                             /(?s).{1,45}? / :    3241534 #    192 ms |   194 ms #    169 ms |   169 ms #   167 ms |   171 ms  #   167 ms |   173 ms  #   182 ms       #    200 ms            ##    152 ms 
                           /(?s)\G.{1,45}? / :      69431 #    360 ms |   359 ms #    189 ms |   189 ms #   176 ms |   177 ms  #   172 ms |   168 ms  #   164 ms       #    321 ms            ##    170 ms 
                           /\G(?s).{1,45}? / :      69431 #     19 ms |    19 ms #     18 ms |    19 ms #    18 ms |    18 ms  #    18 ms |    19 ms  #    19 ms       #     20 ms            ##      3 ms 
                                 /\b\w+nn\b/ :        359 #    390 ms |   390 ms #    277 ms |   277 ms #   250 ms |   275 ms  #   243 ms |   249 ms  #   246 ms       #    371 ms            ##    290 ms 
                          /[a-q][^u-z]{13}x/ :       4929 #    793 ms |   985 ms #    674 ms |   695 ms #   640 ms |   640 ms  #   685 ms |   711 ms  #   675 ms       #    785 ms            ##    648 ms 
               /Tom|Sawyer|Huckleberry|Finn/ :       3015 #     54 ms |    53 ms #     53 ms |    53 ms #    43 ms |    43 ms  #    38 ms |    35 ms  #    35 ms       #     44 ms            ##     24 ms 
           /(?i)Tom|Sawyer|Huckleberry|Finn/ :       4820 #    263 ms |   257 ms #    230 ms |   230 ms #   200 ms |   214 ms  #   194 ms |   198 ms  #   197 ms       #    219 ms            ##    352 ms 
       /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :       3015 #   2992 ms |  3196 ms #   2792 ms |  2839 ms #  2697 ms |  2685 ms  #  2713 ms |  2722 ms  #  2719 ms       #   2878 ms            ##   2551 ms 
       /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :       2220 #   3405 ms |  3299 ms #   3071 ms |  3191 ms #  2937 ms |  2917 ms  #  2996 ms |  2991 ms  #  3046 ms       #   3122 ms            ##   2745 ms 
         /Tom.{10,25}river|river.{10,25}Tom/ :          2 #     82 ms |    82 ms #     73 ms |    92 ms #    60 ms |    60 ms  #    54 ms |    57 ms  #    70 ms       #     66 ms            ##     43 ms 
                              /[a-zA-Z]+ing/ :      95863 #    750 ms |   748 ms #    609 ms |   676 ms #   625 ms |   628 ms  #   591 ms |   628 ms  #   657 ms       #    793 ms            ##    600 ms 
                     /\s[a-zA-Z]{0,12}ing\s/ :      67810 #    318 ms |   332 ms #    287 ms |   286 ms #   278 ms |   278 ms  #   268 ms |   271 ms  #   271 ms       #    348 ms            ##    261 ms 
             /([A-Za-z]awyer|[A-Za-z]inn)\s/ :        313 #    825 ms |   825 ms #    696 ms |   694 ms #   658 ms |   642 ms  #   646 ms |   657 ms  #   659 ms       #    876 ms            ##    614 ms 
                 /["'][^"']{0,30}[?!\.]["']/ :       9857 #     95 ms |    94 ms #     88 ms |    86 ms #    85 ms |    83 ms  #    79 ms |    73 ms  #    74 ms       #     83 ms            ##     60 ms 
                   /Tom(.{3,3}|.{5,5})*Finn/ :         11 #   3683 ms |  3562 ms #   3146 ms |  3403 ms #  3127 ms |  3293 ms  #  3140 ms |  3167 ms  #  3242 ms       #   3387 ms            ##   3132 ms 
                       /Tom(...|.....)*Finn/ :         11 #   1891 ms |  2124 ms #   1708 ms |  1802 ms #  1739 ms |  1671 ms  #  1785 ms |  1874 ms  #  1979 ms       #   1754 ms            ##   1564 ms 
                      /Tom(...|.....)*?Finn/ :         11 #   1927 ms |  1763 ms #   1651 ms |  1714 ms #  1633 ms |  1754 ms  #  1641 ms |  1646 ms  #  1673 ms       #   1721 ms            ##   1537 ms 
         /Tom((...|.....){2,9}\s){1,5}?Finn/ :         11 #   4865 ms |  4573 ms #   4392 ms |  4355 ms #  4305 ms |  4596 ms  #  4391 ms |  4467 ms  #  4314 ms       #   4517 ms            ##   4228 ms 
         /Tom((...|.....){2,9}?\s){1,5}Finn/ :         11 #   5181 ms |  4576 ms #   4309 ms |  4289 ms #  4245 ms |  4282 ms  #  4310 ms |  4321 ms  #  4283 ms       #   4541 ms            ##   4172 ms 
                           /\G(?is).(?=.*$)/ :   20045118 #   1212 ms |  1186 ms #    986 ms |   982 ms #   943 ms |   921 ms  #   968 ms |   935 ms  #   932 ms       #   1203 ms            ##   1014 ms 
                   /\G(?is).(?=(.){1,5}?$)?/ :   20045118 #   4123 ms |  4063 ms #   3541 ms |  3532 ms #  3479 ms |  3605 ms  #  3869 ms |  3568 ms  #  3728 ms       #   4028 ms            ##   3592 ms 
                          /\G(?is).(?=.*+$)/ :   20045118 #   1182 ms |  1178 ms #   1096 ms |   975 ms #   927 ms |   889 ms  #   963 ms |   953 ms  #   952 ms       #   1283 ms            ##    995 ms 
   /\G(?is).{10,10}(?=(e|y|on|fin|.){0,20})/ :    2004511 #   2557 ms |  2560 ms #   2395 ms |  2433 ms #  2427 ms |  2411 ms  #  2392 ms |  2417 ms  #  2421 ms       #   2546 ms            ##   3174 ms 
 /\G(?is).{10,10}(?=(?>e|y|on|fin|.){0,20})/ :    2004511 #   2573 ms |  2554 ms #   2395 ms |  2460 ms #  2432 ms |  2392 ms  #  2396 ms |  2415 ms  #  2461 ms       #   2541 ms            ##   3160 ms 
                             /Tom(?!.*Finn)/ :       2441 #     50 ms |    50 ms #     49 ms |    49 ms #    43 ms |    41 ms  #    35 ms |    32 ms  #    34 ms       #     40 ms            ##     26 ms 
Total                                                        42622 ms | 42108 ms #  37052 ms | 37708 ms # 36298 ms | 36651 ms  # 36713 ms | 36708 ms  # 37004 ms       #  40412 ms            ##  36856 ms

Linux benchmares with -c 5

Linux 64 bit (virtualbox)

                                              | Match count # ORIGINAL    # NEW raise2Err OFF # NEW raise2Err ON # NEW raise2Err OFF - full
                                                          (before any change)                                       (check stack)
============================================================================================================================================================
regexpr.pas:
                                        /Twain/ :      2388 #       41 ms #      42 ms        #      41 ms       #      46 ms
                                    /(?i)Twain/ :      2657 #       94 ms #      97 ms        #      99 ms       #      98 ms
                                   /[a-z]shing/ :      1877 #      944 ms #     958 ms        #     976 ms       #     948 ms
                   /Huck[a-zA-Z]+|Saw[a-zA-Z]+/ :       396 #       42 ms #      44 ms        #      46 ms       #      48 ms
                                            /./ :  18905425 #     1376 ms #    1432 ms        #    1731 ms       #    1411 ms
                                          /(.)/ :  18905425 #     2203 ms #    2313 ms        #    2600 ms       #    2328 ms
                                            /e/ :   1781425 #      177 ms #     187 ms        #     212 ms       #     183 ms
                                          /(e)/ :   1781425 #      255 ms #     264 ms        #     293 ms       #     264 ms
                                 /(?s).{1,45} / :    475715 #       83 ms #      86 ms        #      93 ms       #      86 ms
                               /(?s)\G.{1,45} / :     10616 #      767 ms #     786 ms        #     800 ms       #     814 ms
                               /\G(?s).{1,45} / :     10616 #       26 ms #      25 ms        #      28 ms       #      26 ms
                                /(?s).{1,45}? / :   3241534 #      413 ms #     437 ms        #     490 ms       #     447 ms
                              /(?s)\G.{1,45}? / :     69431 #      765 ms #     810 ms        #     813 ms       #     810 ms
                              /\G(?s).{1,45}? / :     69431 #       32 ms #      34 ms        #      34 ms       #      33 ms
                                    /\b\w+nn\b/ :       359 #      551 ms #     584 ms        #     589 ms       #     568 ms
                             /[a-q][^u-z]{13}x/ :      4929 #     1284 ms #    1327 ms        #    1299 ms       #    1292 ms
                  /Tom|Sawyer|Huckleberry|Finn/ :      3015 #       54 ms #      55 ms        #      55 ms       #      59 ms
              /(?i)Tom|Sawyer|Huckleberry|Finn/ :      4820 #      460 ms #     475 ms        #     468 ms       #     466 ms
          /.{0,2}(Tom|Sawyer|Huckleberry|Finn)/ :      3015 #     8704 ms #    8974 ms        #    8718 ms       #    8808 ms
          /.{2,4}(Tom|Sawyer|Huckleberry|Finn)/ :      2220 #     9448 ms #    9702 ms        #    9478 ms       #    9627 ms
            /Tom.{10,25}river|river.{10,25}Tom/ :         2 #      105 ms #     109 ms        #     106 ms       #     106 ms
                                 /[a-zA-Z]+ing/ :     95863 #     2501 ms #    2580 ms        #    2492 ms       #    2544 ms
                        /\s[a-zA-Z]{0,12}ing\s/ :     67810 #      919 ms #     953 ms        #     932 ms       #     938 ms
                /([A-Za-z]awyer|[A-Za-z]inn)\s/ :       313 #     2893 ms #    2998 ms        #    2916 ms       #    2952 ms
                    /["'][^"']{0,30}[?!\.]["']/ :      9857 #      126 ms #     125 ms        #     125 ms       #     132 ms
                      /Tom(.{3,3}|.{5,5})*Finn/ :        11 #    10526 ms #   11186 ms        #   11149 ms       #   10902 ms
                          /Tom(...|.....)*Finn/ :        11 #     5651 ms #    6038 ms        #    6130 ms       #    5952 ms
                         /Tom(...|.....)*?Finn/ :        11 #     4934 ms #    5162 ms        #    5285 ms       #    5053 ms
            /Tom((...|.....){2,9}\s){1,5}?Finn/ :        11 #    13297 ms #   14304 ms        #   14221 ms       #   13414 ms
            /Tom((...|.....){2,9}?\s){1,5}Finn/ :        11 #    13365 ms #   14266 ms        #   14142 ms       #   13553 ms
                              /\G(?is).(?=.*$)/ :  20045118 #     3413 ms #    3465 ms        #    3792 ms       #    3492 ms
                      /\G(?is).(?=(.){1,5}?$)?/ :  20045118 #    12193 ms #   12457 ms        #   12945 ms       #   12432 ms
                             /\G(?is).(?=.*+$)/ :  20045118 #     3405 ms #    3518 ms        #    3757 ms       #    3470 ms
      /\G(?is).{10,10}(?=(e|y|on|fin|.){0,20})/ :   2004511 #     7094 ms #    7081 ms        #    7156 ms       #    6958 ms
    /\G(?is).{10,10}(?=(?>e|y|on|fin|.){0,20})/ :   2004511 #     7123 ms #    7146 ms        #    7159 ms       #    6989 ms
                                /Tom(?!.*Finn)/ :      2441 #       45 ms #      47 ms        #      50 ms       #      53 ms
Total:                                                          115323 ms #  120079 ms        #  121237 ms       #  117316 ms

So much for fluctuations, the last column is faster than the 2nd column => yet it does definetily more work.

But that can actually happen. It is possible that the inserted code moved the other code inside the function, and the same code, but moved a few bytes can be faster (some dependency on 32byte boundaries).

User4martin commented 1 year ago

Not sure why some times are (and have been) so much worse on Linux... Maybe because its a VM???

Though it could (one of many possibilities) be that the WS code for linux changes the layout in the exe, and functions are at diff addresses. Why that matters? https://www.youtube.com/watch?v=r-TLSBdHe1A

andgineer / TRegExpr

Allow ()* / fix Dump() / Limit Stack #337