BeRo1985 / flre

FLRE - Fast Light Regular Expressions - A fast light regular expression library
GNU Lesser General Public License v2.1
94 stars 23 forks source link

MatchAll speed #15

Open benibela opened 8 years ago

benibela commented 8 years ago

MatchAll with limit=1 is 10 times slower than Sorokins' regexp:

uses FLRE,RegExpr;

procedure TForm1.Button1Click(Sender: TObject);
var f: TFLRE;
  tc: types.DWORD;
  i: Integer;
  r: TRegExpr;
  foo: TFLREMultiCaptures;
begin
  f := tflre.Create('^\s*Nächste\s*.*$', [rfUTF8]); // '^\s*Nächste' is just as slow
  tc := GetTickCount;
  for i := 1 to 10000000 do
    f.UTF8MatchAll('xyz',foo,1,1);
  caption := IntToStr(GetTickCount - tc);

  f := tflre.Create('\s*Nächste\s*.*', [rfUTF8]);
  tc := GetTickCount;
  for i := 1 to 10000000 do
    f.utf8Test('xyz'); 
  caption := caption  + ' ' + IntToStr(GetTickCount - tc);

  r := TRegExpr.Create();
  r.Expression:='^\s*Nächste\s*.*$';
  tc := GetTickCount;
  for i := 1 to 10000000 do
    r.exec('xyz');
  caption :=  caption + ' vs. ' + IntToStr(GetTickCount - tc);
end;

=> 27s vs 2.5s

benibela commented 8 years ago

Or this one:

procedure TForm1.Button1Click(Sender: TObject);
var f: TFLRE;
  tc: types.DWORD;
  i: Integer;
  r: TRegExpr;
  foo: TFLREMultiCaptures;
  foo2: TFLRECaptures;
  RE,input: String;
begin
  RE := '\s*^ISBN:\s*'; //I do not know why the ^ is in the middle
  input := #13#10'                 Zf2304'#13#10'        ';

  f := tflre.Create(RE, [rfUTF8]);
  tc := GetTickCount;
  for i := 1 to 10000000 do
    f.UTF8MatchAll(input,foo,1,1); 
  caption := IntToStr(GetTickCount - tc);

  f := tflre.Create(RE, [rfUTF8]);
  tc := GetTickCount;
  for i := 1 to 10000000 do
    f.UTF8Find(input);
  caption := caption  + ' ' + IntToStr(GetTickCount - tc);

  r := TRegExpr.Create();
  r.Expression:=RE;
  tc := GetTickCount;
  for i := 1 to 10000000 do
    r.exec(input);
  caption :=  caption + ' vs. ' + IntToStr(GetTickCount - tc);
end;                                 

27s vs. 7s vs. 3s

BeRo1985 commented 8 years ago

Hm, at me is 1375ms (ca. 1s) (FLRE) vs. 3968s (ca. 4s) (TRegExpr) with Delphi 7 with the FastMM4 memory manager on a Intel Xeon E3-1345v2 (= circa Intel i7-3770S just together with ECC-RAM support and so on). Which compiler are you using? And which memory manager (I guess the compiler own default)?

benibela commented 8 years ago

All FPC's default

 -MObjFPC -Scgi -Cg -O2 -g -gl -l -vewnhibq -dLCL -dLCLgtk2

on an Intel P8600

BeRo1985 commented 8 years ago

Oh ok, then I know what the reason probably is: The slow exception handling of FPC. Because FPC (under Windows) does still use the c-style slow SetJmp exception handling instead of the fast SEH exception handling.

Because FLRE do (for example in TFLRE.PtrMatchAll):

 ThreadLocalStorageInstance:=AcquireThreadLocalStorageInstance;
 try
  ThreadLocalStorageInstance.Input:=Input;
  ThreadLocalStorageInstance.InputLength:=InputLength;
  ....
 finally
  ReleaseThreadLocalStorageInstance(ThreadLocalStorageInstance);
 end;

and so similar try/finally stuff.

Such code is fast with Delphi (because it uses the Windows native SEH exceptions and not the SetJmp-style fallback solution), but not (yet) with FPC.

A solution would be that I will ifdef'ed the try/finally stuff for FPC. Hmm, I'll have to think about that, what would be best here.

benibela commented 8 years ago

Actually the SetLength call a big problem Without it, UTF8MatchAll and UTF8Test have around the same speed. Still 6s vs Sorokin's 3s

Most time is spend there: http://i.imgur.com/HkeCVMy.png http://i.imgur.com/xEzWHr1.png

benibela commented 8 years ago

Actually, if I compile everything myself, FLRE is twice as fast. Perhaps the version of Sorokin's in fpc's RTL was compiled with different compiler flags