andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License
175 stars 62 forks

Doesn't work in anon thread #390

Open Uefi1 opened 1 week ago

Uefi1 commented 1 week ago

Hi, TRegExpr does not work in conjunction with TStreamReader inside a thread started with TThread.CreateAnonymousThread(procedure begin .....). I get a buffer overflow error: Exception class ERangeError with message 'Input buffer exceeded for DestinationIndex = 0, Count = 18432'

Alexey-T commented 1 week ago

I guess you are doing something wrong. Show a minimal demo that reproduces the error.

I don't have Delphi, I have FPC.

Uefi1 commented 1 week ago

> I guess you are doing something wrong. Show a minimal demo that reproduces the error.
>
> I don't have Delphi, I have FPC.

Hello, I didn't even expect you to answer so quickly! Here's a simple program that demonstrates the problem:

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, VCL.Dialogs, Regexpr;

var
  cs:TCriticalSection;

function pars(const Reg: TRegExpr; const text: string):string;
begin
  Result:='';
  if Reg.Exec(text) then
    repeat
      cs.Enter;
      writeln(Reg.Substitute('$1'));
      cs.Leave;
    until not Reg.ExecNext;
end;

procedure FRead(const F:string);
const
  BUFSIZE=1024*8;
var
  Regex: TRegExpr;
  sr:TStreamReader;
  Buf: array [0..BUFSIZE] of Char;
begin
  Regex:=TRegExpr.Create('([a-z]+)');
  Regex.Compile;
  sr:=TStreamReader.Create(F, TEncoding.Default, True);
  while not sr.EndOfStream do begin
    sr.ReadBlock(@Buf, 0, BUFSIZE);
    pars(Regex, Buf+sr.readline);
  end;
  sr.Close;
  sr.Free;
end;

var
  opendialog:TOpendialog;
  path:string;
begin
  cs:=TCriticalSection.Create;
  opendialog:=TOpendialog.Create(nil);
  if opendialog.Execute then
    path:=opendialog.FileName;
  opendialog.Free;
  TThread.CreateAnonymousThread(
    procedure
    begin
      FRead(path);
    end).Start;
end.
```

If you remove the TThread.CreateAnonymousThread wrapper, everything works!
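One difference the wrapper introduces in a console program: TThread.CreateAnonymousThread(...).Start returns immediately, so the main block reaches "end." and the process (including unit finalization) can shut down while FRead is still running in the worker. Without the wrapper, FRead runs to completion before the main block ends. Below is a minimal sketch of the same entry point that waits for the worker and prints any exception that escaped it. It uses only standard TThread members (FreeOnTerminate, Start, WaitFor, FatalException); the FRead here is a placeholder for the one in the repro. It is not a confirmed fix for the ERangeError, only a way to take program shutdown out of the picture while testing.

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, VCL.Dialogs;

// Placeholder: paste pars and FRead from the repro here.
procedure FRead(const F: string);
begin
  Writeln('reading ', F);
end;

var
  opendialog: TOpenDialog;
  path: string;
  worker: TThread;
begin
  opendialog := TOpenDialog.Create(nil);
  try
    if not opendialog.Execute then
      Exit; // nothing selected, nothing to read
    path := opendialog.FileName;
  finally
    opendialog.Free;
  end;

  worker := TThread.CreateAnonymousThread(
    procedure
    begin
      FRead(path);
    end);
  // Anonymous threads free themselves on termination by default; keep the
  // object alive so WaitFor and FatalException can be used safely.
  worker.FreeOnTerminate := False;
  try
    worker.Start;
    // Block the main thread until FRead has finished, so "end." is not
    // reached while the worker is still reading.
    worker.WaitFor;
    // An exception that escaped Execute is stored here instead of being shown.
    if worker.FatalException is Exception then
      Writeln('Thread failed: ', Exception(worker.FatalException).ClassName,
        ': ', Exception(worker.FatalException).Message);
  finally
    worker.Free;
  end;
end.
```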

Uefi1 commented 1 week ago

I also tested with OmniThreadLibrary, wrapping the call in CreateTask instead of TThread.CreateAnonymousThread; the problem is there too.

Alexey-T commented 1 week ago

Anonymous functions require FPC 3.3; I don't have it. Maybe I will install it.

Please use triple backticks for code blocks, i.e. Markdown fenced blocks.

Uefi1 commented 1 week ago

> Anonymous functions require FPC 3.3; I don't have it. Maybe I will install it.
>
> Please use triple backticks for code blocks, i.e. Markdown fenced blocks.

Try OmniThreadLibrary's CreateTask; the problem exists there too!

Uefi1 commented 1 week ago

By the way, for some reason this error also occurs with the standard System.RegularExpressions unit:

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, VCL.Dialogs, System.RegularExpressions;

var
  cs:TCriticalSection;

function pars(const Reg: TRegEx; const text: string):string;
var
  Match:TMatch;
begin
  Result:='';
  Match:=Reg.Match(text);
  if Match.Success then
    repeat
      writeln(Match.Result('$1'));
    until not Match.NextMatch.Success;
end;

procedure FRead(const F:string);
const
  BUFSIZE=1024*4;
var
  Regex: TRegEx;
  Options: TRegexOptions;
  sr:TStreamReader;
  Buf: array [0..BUFSIZE] of Char;
begin
  Options:=[romultiline];
  Regex:=TRegEx.Create('([a-z]+)', Options);
  sr:=TStreamReader.Create(F, TEncoding.Default, True);
  while not sr.EndOfStream do begin
    sr.ReadBlock(@Buf, 0, BUFSIZE);
    pars(Regex, Buf+sr.readline);
  end;
  sr.Close;
  sr.Free;
end;

var
  opendialog:TOpendialog;
  path:string;
begin
  cs:=TCriticalSection.Create;
  opendialog:=TOpendialog.Create(nil);
  if opendialog.Execute then
    path:=opendialog.FileName;
  opendialog.Free;
  TThread.CreateAnonymousThread(
    procedure
    begin
      FRead(path);
    end).Start;
end.
```

Same thing here: if you remove the TThread.CreateAnonymousThread wrapper, everything works!

Alexey-T commented 1 week ago

@User4martin, can you reproduce it under FPC, please?

Uefi1 commented 1 week ago

> @User4martin, can you reproduce it under FPC, please?

Sorry, I don't have FPC =(

Uefi1 commented 1 week ago

Hi, it worked like this:

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, VCL.Dialogs, Regexpr;

function pars(const Reg: TRegExpr; const text: string):string;
begin
  Result:='';
  if Reg.Exec(text) then
    repeat
      writeln(Reg.Substitute('$1'));
    until not Reg.ExecNext;
end;

procedure FRead(const F:string);
const
  BUFSIZE=1024*8;
var
  Regex: TRegExpr;
  sr:TStreamReader;
  Buf: array [0..BUFSIZE] of Char;
begin
  Regex:=TRegExpr.Create('([a-z]+)');
  Regex.Compile;
  sr:=TStreamReader.Create(F, TEncoding.Default, True);
  while not sr.EndOfStream do begin
    TThread.Synchronize(nil,
      procedure begin
        sr.ReadBlock(@Buf, 0, BUFSIZE);
      end);
    pars(Regex, Buf+sr.readline);
  end;
  sr.Close;
  sr.Free;
end;

var
  opendialog:TOpendialog;
  path:string;
begin
  opendialog:=TOpendialog.Create(nil);
  if opendialog.Execute then
    path:=opendialog.FileName;
  opendialog.Free;
  TThread.Queue(nil,
    procedure
    begin
      FRead(path);
    end);
end.
```

The problem is that the synchronization freezes the form, so there is no point in such code.

User4martin commented 1 week ago

> @User4martin, can you reproduce it under FPC, please?

Will check in a couple of days...

User4martin commented 1 week ago

> I get a buffer overflow error: Exception class ERangeError with message 'Input buffer exceeded for DestinationIndex = 0, Count = 18432'

If you get a range check error, then you should be able to get a stack trace?

Also, the error sounds (I have not checked) like it is thrown by the thread.

You could single-step the code and check whether the issue happens while reading the stream or while running the regex.

The Synchronize may or may not be a coincidence. It might just be hiding the issue (or not)...

I also have to check what happens when you pass an "array of char" to a string param, or rather when you "+"-join it with the result of ReadLine; especially if the preceding read did not fill the entire array.

Have you checked in the debugger whether the content of the string "text" in "pars" is what you expect?

Uefi1 commented 1 week ago

> I get a buffer overflow error: Exception class ERangeError with message 'Input buffer exceeded for DestinationIndex = 0, Count = 18432'
>
> If you get a range check error, then you should be able to get a stack trace?
>
> Also, the error sounds (I have not checked) like it is thrown by the thread.
>
> You could single-step the code and check whether the issue happens while reading the stream or while running the regex.
>
> The Synchronize may or may not be a coincidence. It might just be hiding the issue (or not)...
>
> I also have to check what happens when you pass an "array of char" to a string param, or rather when you "+"-join it with the result of ReadLine; especially if the preceding read did not fill the entire array.
>
> Have you checked in the debugger whether the content of the string "text" in "pars" is what you expect?

Yes: if I read line by line with TStreamReader.ReadLine, no error occurs; the error occurs only when reading into a buffer with TStreamReader.ReadBlock or TStreamReader.Read.

Uefi1 commented 1 week ago

It almost works like this: it displays the first lines, but still closes immediately:

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, VCL.Dialogs, Regexpr;

const
  BUFSIZE=1024*16;

var
  cs:TCriticalSection;
  sr:TStreamReader;
  Buf: array [0..BUFSIZE] of Char;

function pars(const Reg: TRegExpr; const text: string):string;
begin
  Result:='';
  if Reg.Exec(text) then
    repeat
      cs.Enter;
      writeln(Reg.Substitute('$1'));
      cs.Leave;
    until not Reg.ExecNext;
end;

procedure FRead(const F:string);
var
  Regex: TRegExpr;
begin
  Regex:=TRegExpr.Create('([a-z]+)');
  Regex.Compile;
  while not sr.EndOfStream do begin
    sr.ReadBlock(@Buf, 0, BUFSIZE);
    pars(Regex, Buf+sr.readline);
  end;
  //sr.Close;
  //sr.Free;
end;

var
  opendialog:TOpendialog;
  path:string;
begin
  cs:=TCriticalSection.create;
  opendialog:=TOpendialog.Create(nil);
  if opendialog.Execute then
    path:=opendialog.FileName;
  opendialog.Free;
  sr:=TStreamReader.Create(TFileStream.Create(path, fmsharedenynone), TEncoding.Default, True);
  TThread.CreateAnonymousThread(
    procedure
    begin
      FRead(path);
    end).Start;
end.
```

User4martin commented 1 week ago

I don't have Delphi either, and I can't compile the code.

But I searched TRegExpr for the error text "Input buffer exceeded" => there is no such text in it, at least not in the latest version/commit. (And I could not find any such error in past versions.)

So you either have an older version (and it is unknown if the error actually still exists), or the error happens somewhere else.

User4martin commented 1 week ago

> Yes: if I read line by line with TStreamReader.ReadLine, no error occurs; the error occurs only when reading into a buffer with TStreamReader.ReadBlock or TStreamReader.Read.

Sounds like you ran modified code to test that => that does not help.

What happens when you

Can you step into the line "if Reg.Exec(text) then"?

If yes, then what is the stack trace in TRegExpr?


Also: Are you testing with the latest version of TRegExpr? Have you downloaded it from this GitHub repo? (In or after July 2024?)

Uefi1 commented 1 week ago

> I don't have Delphi either, and I can't compile the code.
>
> But I searched TRegExpr for the error text "Input buffer exceeded" => there is no such text in it, at least not in the latest version/commit. (And I could not find any such error in past versions.)
>
> So you either have an older version (and it is unknown if the error actually still exists), or the error happens somewhere else.

I moved the buffer to a global variable and the program worked!

```pascal
const
  BUFSIZE=1024*16;

var
  cs:TCriticalSection;
  Buf: array [0..BUFSIZE] of Char;

......
```

The only problem is that the buffer will probably need to be cleared on every loop iteration =) Although in theory it should work as a local variable too!

User4martin commented 1 week ago

> I moved the buffer to a global variable and the program worked!
>
> const BUFSIZE=1024*16;

None of the "workarounds" you described should affect the TRegExpr code, as you are calling it with text from the param "const text: string". I am pretty sure at this point that the error is caused outside TRegExpr (but the only way to know is if you provide the stack trace).

Also, seeing how your seemingly random changes hide the issue, I am fairly sure it is most likely in TStreamReader (or the way you call it). Or maybe (though I find it unlikely) it is related to the fact that your param is "const" (I don't know exactly how the nitty-gritty of this is implemented in Delphi...)

Btw... You have "while not sr.EndOfStream do begin", and that covers the ReadBlock. But then you call ReadLine, and there is no check whether ReadBlock has already reached the end...
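Pulling together the two reading concerns raised in this thread (a partially filled buffer, and calling ReadLine after ReadBlock may already have reached the end), here is a minimal sketch of FRead that keeps only the characters ReadBlock reports as read and uses a TCharArray (dynamic array) buffer instead of @Buf. It is not a confirmed fix for the ERangeError, only a way to take the buffer handling out of the picture. Assumptions: ReadBlock accepts a TCharArray and returns the number of chars it stored (check the declaration in System.Classes for the Delphi version in use); BlockToString is a hypothetical helper; the path comes from ParamStr(1) instead of TOpenDialog, and pars writes without the critical section because nothing runs in parallel here. Wrap the FRead call in the thread again to check the threaded case.

```pascal
program Project2;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, Regexpr;

const
  BUFSIZE = 1024 * 8;

// Hypothetical helper: turns the first CharsRead chars of Buf into a string.
// CharsRead is the value returned by ReadBlock, so stale chars left over
// from an earlier, longer block never reach the regex.
function BlockToString(const Buf: TCharArray; CharsRead: Integer): string;
begin
  if (CharsRead < 0) or (CharsRead > Length(Buf)) then
    raise ERangeError.CreateFmt('CharsRead = %d is outside the buffer', [CharsRead]);
  SetString(Result, PChar(Buf), CharsRead);
end;

procedure pars(const Reg: TRegExpr; const text: string);
begin
  if Reg.Exec(text) then
    repeat
      Writeln(Reg.Substitute('$1'));
    until not Reg.ExecNext;
end;

procedure FRead(const F: string);
var
  Regex: TRegExpr;
  sr: TStreamReader;
  Buf: TCharArray; // a real dynamic array instead of @ of a static array
  CharsRead: Integer;
  Chunk: string;
begin
  SetLength(Buf, BUFSIZE);
  Regex := TRegExpr.Create('([a-z]+)');
  try
    sr := TStreamReader.Create(F, TEncoding.Default, True);
    try
      while not sr.EndOfStream do
      begin
        CharsRead := sr.ReadBlock(Buf, 0, BUFSIZE);
        Chunk := BlockToString(Buf, CharsRead);
        // Finish the current line only if ReadBlock did not hit the end.
        if not sr.EndOfStream then
          Chunk := Chunk + sr.ReadLine;
        pars(Regex, Chunk);
      end;
    finally
      sr.Free;
    end;
  finally
    Regex.Free;
  end;
end;

begin
  // Hypothetical simplification: file name from the command line.
  FRead(ParamStr(1));
end.
```

Because the string is built from the returned count, the buffer does not need to be cleared between iterations.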


Anyway, as I said: I do not have Delphi, so I cannot help you further with the TStreamReader-related points above.

If (and 99% likely: only if) you

Only then may I be able to help further.

Unfortunately, otherwise I do not have the means to find the issue.

About your modified versions => all of the code modifications that you made can have any number of unknown side effects (and those may be very specific to the exact version of Delphi that you use). So I have no way to know how or why they "hide" the issue. Therefore they do not in any way tell me where exactly the issue is.