Closed User4martin closed 1 year ago
if #303 has #300 as its part, let's close #300 ?
fInputContinue: PRegExprChar; // pointer to first char of input string
- misleading comment.
Thanks!
Ptr := fInputStart + AOffset - 1;
fInputContinue := Ptr;
if I got it, you set fInputContinue even on failing execs, but we must set it only on successfull exec.
Please check the history, if entries are ok.
Sorry about the comment, copy and paste is still a thing ;)
test_dlg now shows "Not fixed op: CONTINUE_POS"
Pls also comment 2 existing UniqueString calls.
if https://github.com/andgineer/TRegExpr/pull/303 has https://github.com/andgineer/TRegExpr/pull/300 as its part, let's close https://github.com/andgineer/TRegExpr/pull/300 ?
If its fine to merge both in one, yes. (I.e. if you have no rule to merge those features separately)
Its fine to merge both in one. @andgineer is busy man - he won't be happy to dig all this
Also: it's good to add \G to official docs, also in this repo. pls add here: https://regex.sorokin.engineer/en/latest/regular_expressions.html#boundaries
Thanks!
Ptr := fInputStart + AOffset - 1; fInputContinue := Ptr;
if I got it, you set fInputContinue even on failing execs, but we must set it only on successfull exec.
This is set before the regex is run.
It is either the value given by the user re.Exec(MyOffset)
Or it is given by ExecNext
, which already called ExecPrim
with the offset from the last match (I have not changed that / the offset was stored on GrpBounds[0].GrpEnd[0]
and will be stored into fInputContinue
before we start the next match)
Pls also comment 2 existing UniqueString calls.
Done in ad665552d93c00d4bdc8de0ba3843bc47760564d
OP_CONTINUE_POS = Succ(OP_SUBCALL_LAST);
won't work for undefined UnicodeRE because
OP_SUBCALL_LAST =
{$IFDEF UnicodeRE}
TReOp(Ord(OP_SUBCALL) + RegexMaxGroups - 1);
{$ELSE}
High(REChar); // must fit to 0..255 range
{$ENDIF}
so, insert new opcode in the middle, after the OP_RECUR. value 49 is good. seems OP_OPEN (=50) is not needing the change.
Also: it's good to add \G to official docs, also in this repo. pls add here: https://regex.sorokin.engineer/en/latest/regular_expressions.html#boundaries
Done / not familiar with the markup used. Please check.
Also the section is called "Line Boundaries" and starts with".
any char" ? (anyway that is an issue of its own, if considered an issue )
oh, value 26 is good too!
OP_EOL2 = TReOp(25); // like OP_EOL but also matches before final line-break
OP_BSUBEXP = TREOp(28);
`\G`` zero-length match at the end pos of the previous match. For ExecNext and Exec(Offset)
here is docs for usual users of programs, so the ending . For ExecNext and Exec(Offset)
is not OK, remove it.
Also the section is called "Line Boundaries" and starts with". any char" ? (anyway that is an issue of its own, if considered an issue )
right. please remove that line from the topic.
`\G`` zero-length match at the end pos of the previous match. For ExecNext and Exec(Offset)
here is docs for usual users of programs, so the ending
. For ExecNext and Exec(Offset)
is not OK, remove it.
Ah, ok. I this case would this be good?
`\G`` zero-length match at the end pos of the previous match when using the g (global) modifier.
when using the g (global) modifier.
is not OK ending- \G works always, we don't check 'g' modifier.
when using the g (global) modifier.
is not OK ending- \G works always, we don't check 'g' modifier.Right, but when using the feature without writing the pascal code
does code check existing of 'g' modifier? no. so why to write this ending?
when using the g (global) modifier.
is not OK ending- \G works always, we don't check 'g' modifier.Right, but when using the feature without writing the pascal code
does code check existing of 'g' modifier? no. so why to write this ending?
I hit the send button by accident.
parasitic dot is not needed
Also the section is called "Line Boundaries" and starts with". any char" ? (anyway that is an issue of its own, if considered an issue )
right. please remove that line from the topic.
It then needs to go elsewhere.
For now I created a new issue, so it can be addressed on its own.
you did good work! I don't have powers today to make all these testings in the test-project.
Btw, I did test the new StrLPos, but since it is hardly going to change, I did not write a test-case for it.
But for peace of mind, this is what I tested
a := 'abcdefaxdef';
writeln(StrLPos( PRegExprChar(a), PRegExprChar('a'), 0,1 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('a'), 1,1 ) - PRegExprChar(a) ); //ok
writeln(StrLPos( PRegExprChar(a), PRegExprChar('a'), 2,1 ) - PRegExprChar(a) ); // ok
writeln;
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ab'), 0 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ab'), 1 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ab'), 2 ,2 ) - PRegExprChar(a) ); // ok
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ab'), 7 ,2 ) - PRegExprChar(a) ); // ok
writeln;
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 0 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 1 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 2 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 7 ,2 ) - PRegExprChar(a) ); // to short
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 8 ,2 ) - PRegExprChar(a) ); // ok
writeln(StrLPos( PRegExprChar(a), PRegExprChar('ax1'), 9 ,2 ) - PRegExprChar(a) ); //ok
writeln;
// all below do not find anything
writeln(StrLPos( PRegExprChar(a), PRegExprChar('fd'), 0 ,2 ) - PRegExprChar(a) );
writeln(StrLPos( PRegExprChar(a), PRegExprChar('fd'), 1 ,2 ) - PRegExprChar(a) );
writeln(StrLPos( PRegExprChar(a), PRegExprChar('fd'), 6 ,2 ) - PRegExprChar(a) );
writeln(StrLPos( PRegExprChar(a), PRegExprChar('fd'), 7 ,2 ) - PRegExprChar(a) );
writeln;
Can we have 'strLpos' name in good casing? and formatting of StrLScan with indent=2spaces?
Thanks
while (count < len) and (P[count] <> #0) do
StrLScan fails on chr0. But regex engine handles chr0 OK, so if you can improve it..
great!
lines are not needed in StrLPos
if p = nil then
exit;
you replaced
if Pos(regMustString, fInputString) = 0 then Exit;
(check for NOT FOUND) to
if StrLPos(fInputStart, PRegExprChar(regMustString), fInputEnd - fInputStart, length(regMustString)) <> nil then
exit;
(check for FOUND?)
it's OK? I mean <> nil
(check for FOUND?) it's OK? I mean <> nil
I tried to make a test case for it, but I could not make a pattern that would trigger FLAG_SPECSTART
.
\w*abcd
ExecPrim's SlowChecks is False in some wrappers
function TRegExpr.Exec(const AInputString: RegExprString): boolean;
begin
InputString := AInputString;
Result := ExecPrim(1, False, False, False);
end; { of function TRegExpr.Exec
-------------------------------------------------------------- }
{$IFDEF OverMeth}
function TRegExpr.Exec: boolean;
var
SlowChecks: boolean;
begin
SlowChecks := fInputEnd - fInputStart < fSlowChecksSizeMax;
Result := ExecPrim(1, False, SlowChecks, False);
end; { of function TRegExpr.Exec
-------------------------------------------------------------- }
function TRegExpr.Exec(AOffset: integer): boolean;
begin
Result := ExecPrim(AOffset, False, False, False);
end; { of function TRegExpr.Exec
-------------------------------------------------------------- }
{$ENDIF}
Based on this info, can you please add testcase with regex \w*abcd
and SlowChecks=True , testcase to run with regMustString.
Thank you
fInputContinue. when it is custom (not automatic)? ExecPrim has
Ptr := fInputStart + AOffset - 1;
fInputContinue := Ptr;
and all Exec* funcs call ExecPrim. so, in all cases - fInputContinue is set to automatic value? so, even if
procedure TRegExpr.SetInputRange(AStart, AEnd, AContinueAnchor: PRegExprChar);
begin
fInputString := '';
fInputStart := AStart;
fInputEnd := AEnd;
fInputContinue := AContinueAnchor;
end;
is called, ExecPrim overrides custom fInputContinue value?
fInputContinue. when it is custom (not automatic)? ExecPrim has
Ptr := fInputStart + AOffset - 1; fInputContinue := Ptr;
and all Exec* funcs call ExecPrim. so, in all cases - fInputContinue is set to automatic value? so, even if
"Custom" means AOffset
. => AOffset
can be given by the user (the person writing the Pascal code controlling the regex).
The user can give it via:
function Exec(AOffset: integer): boolean; overload;
function ExecPos(AOffset: integer {$IFDEF DefParam} = 1{$ENDIF}): boolean;
And for modifier "g", if you call ExecNext
then GrpBounds[0].GrpEnd[0]
is passed as AOffset
.
\G
is the position from which matching was continued (or started).
procedure TRegExpr.SetInputRange(AStart, AEnd, AContinueAnchor: PRegExprChar); begin fInputString := ''; fInputStart := AStart; fInputEnd := AEnd; fInputContinue := AContinueAnchor; end;
is called, ExecPrim overrides custom fInputContinue value?
SetInputRange
is private, and is only called on fHelper used for look-behind. It follows the settings on the main instance.
procedure TRegExpr.SetInputRange(AStart, AEnd, AContinueAnchor: PRegExprChar);
begin
fInputString := '';
fInputStart := AStart;
fInputEnd := AEnd;
fInputContinue := AContinueAnchor;
end;
if we avoid setting fInputContinue here ^^^, what do we miss?
Normally if I continue to match (ExecNext / Exec(AOffset) the pattern can match at AOffset or anywhere after AOffset.
If you have a pattern, that you use to parse some text \s*(var|const|type
then this can skip none matching content.
If you don't wont it to skip anything, you do \G\s*(var|const|type
( ^
wont do, as it is not the start of line, but the start of the continuation match)
if we avoid setting fInputContinue here ^^^, what do we miss?
(?<!\G(?:var|const|type).*)\s+(.*)
match only if the continuation start pos did not begin with var/const or type.
Or
(?<!(?:var|const|type).*\Gc*)\s+(.*)
match only if not BEFORE the continuation start pos there had been var/const or type.
yes, I got idea of these 2 regexes. but again, you didnot answer:
this line fInputContinue := AContinueAnchor;
is ignored, because the ExecPrim (called in Exec*) will overwrite the fInputContinue, yes?
yes, I got idea of these 2 regexes. but again, you didnot answer:
this line
fInputContinue := AContinueAnchor;
is ignored, because the ExecPrim (called in Exec*) will overwrite the fInputContinue, yes?
Ah, I see. I was wondering if I was missing something. The answer I gave was way to obvious.
So here is the answer.
No it does not overwrite it.
The only code calling it is
function TRegExpr.MatchAtOnePos(APos: PRegExprChar): boolean;
begin
...
fHelper.SetInputRange(APos - fHelperLen, APos, fInputContinue);
if fHelper.MatchAtOnePos(APos - fHelperLen) then
fHelper does not have a further helper, and goes to
function TRegExpr.MatchAtOnePos(APos: PRegExprChar): boolean;
begin
...
Result := MatchPrim(regCodeWork);
MatchPrim
skips ExecPrim
now I see; thanks. I forgot about that fHelper code.
@User4martin You are talent! thanks!
https://learn.microsoft.com/en-us/dotnet/standard/base-types/backtracking-in-regular-expressions Let's rename IDs:
FIsBactracingAtomic -> GrpBacktrackingIsAtomic savedBackracingAtomic -> savedBacktrackingIsAtomic
Increase the version to 1.161, and add History.txt info about this fix, please
Let's rename IDs: FIsBactracingAtomic -> GrpBacktrackingIsAtomic savedBackracingAtomic -> savedBacktrackingIsAtomic
Before I do, I think the new names are misleading at least... But mine aren't better.
This is not about the group being atomic (well by implication it is). But in an atomic group regular backtracking can happen before a first match is found.
What it should say is "The entire group is (to be) backtracked as one entity" => "The group is backtracked as a single atom"
So then it would be IsBacktrackingGroupAsAtom
(with/without F
not sure when that should be prefixed).
(and savedIsBacktrackingGroupAsAtom
)
So then it would be IsBacktrackingGroupAsAtom (with/without F not sure when that should be prefixed). (and savedIsBacktrackingGroupAsAtom)
Okay.
Comment this var
savedIsBacktrackingGroupAsAtom: boolean;
Pls change in test_dlg/unit1.pas this
if reg.IsFixedLength(op, i) then
ListDump.Items.Add('Fixed length: '+IntToStr(i))
else
ListDump.Items.Add('Not fixed op: '+reg.DumpOp(op));
to this
if reg.IsFixedLength(op, i) then
ListDump.Items.Add('Match has fixed len: '+IntToStr(i))
else
ListDump.Items.Add('Match has not fixed len because of opcode '+reg.DumpOp(op));
New test for 'bads' now produces exception, it is OK, but better comment this test to avoid exceptions in test-app.
See #297
This is based on #300 as some changes affect nearby lines and lead to conflicts if individually applied.
Once #300 is merged, only one commit should remain to be merged.