andgineer / TRegExpr

Regular expressions (regex), pascal.
https://regex.sorokin.engineer/en/latest/
MIT License

(FPC > 3.2.3) Access Violation on windows with Unicode #255

Closed: Hunter200165 closed this issue 3 years ago

Hunter200165 commented 3 years ago

I downloaded the most recent version of TRegExpr and used it in place of the old one, and basic code that uses static strings started to break for no reason.

Version of FPC: Free Pascal Compiler version 3.3.1-r44222 [2020/02/22] for x86_64

Problem/Minimal code to reproduce:

program regex_fault;

{$Mode ObjFPC}
{$H+}

uses 
    SysUtils,
    regexpr;

{ Code borrowed from FPC/Lazarus wiki }
procedure DumpExceptionCallStack(E: Exception);
var I: Integer;
    Frames: PPointer;
    Report: string;
begin
    Report := 'Program exception! ' + LineEnding + 'Stacktrace:' + LineEnding + LineEnding;
    if E <> nil then begin
        Report := Report + 'Exception class: ' + E.ClassName + LineEnding + 'Message: ' + E.Message + LineEnding;
    end;
    Report := Report + BackTraceStrFunc(ExceptAddr);
    Frames := ExceptFrames;
    for I := 0 to ExceptFrameCount - 1 do
        Report := Report + LineEnding + BackTraceStrFunc(Frames[I]);
    WriteLn(Report);
end;    

var Reg: TRegExpr;
begin 
    Reg := TRegExpr.Create('h');
    try 
        try 
            Reg.InputString := 'hello';
            WriteLn(Reg.ExecPos(1));
            WriteLn(Reg.MatchPos[0]);
        finally 
            Reg.Free;
        end;
    except on E: Exception do 
        DumpExceptionCallStack(E);
    end;
end.

The call to Reg.ExecPos(1) on the line

WriteLn(Reg.ExecPos(1));

raises an exception (Access Violation) with the following stack trace:

Stacktrace:
Exception class: EAccessViolation
Message: Access violation
  $0000000100021903  REGNEXT,  line 4663 of regexpr.pas
  $000000010001C23B  TAIL,  line 2230 of regexpr.pas
  $000000010001C2BC  OPTAIL,  line 2256 of regexpr.pas
  $000000010001D766  PARSEREG,  line 2996 of regexpr.pas
  $000000010001D3D6  COMPILEREGEXPR,  line 2854 of regexpr.pas
  $000000010001C205  COMPILE,  line 2176 of regexpr.pas
  $00000001000237D2  EXECPRIM,  line 5519 of regexpr.pas
  $00000001000233E5  EXECPOS,  line 5430 of regexpr.pas
  $00000001000019C4  main,  line 33 of regex_fault.pas
  $0000000100001A56  main,  line 41 of regex_fault.pas
  $000000010000FC60
  $00000001000016E0
  $00007FFB38C37034
  $00007FFB39B82651

Maybe I am missing something or doing something illegal, but I cannot see the problem here. Thanks.

Hunter200165 commented 3 years ago

Just tested: the issue does not occur on Delphi (at least 10.3 works).

Alexey-T commented 3 years ago

Just tested: on a not-so-ancient FPC I see no bug.

user@PC:~/_re_bg$ ./re_bug 
TRUE
1
user@PC:~/_re_bg$ 

You are using an FPC build that is 1.5 years old.

Hunter200165 commented 3 years ago

I thought about that and am trying to update the compiler now, thank you!

Hunter200165 commented 3 years ago

I suppose the issue can be closed now if the problem is not reproducible on newer compiler versions?

Alexey-T commented 3 years ago

I suppose yes...

Hunter200165 commented 3 years ago

Hello, sorry for reopening the issue.

I installed a fresh compiler on another machine that never had FPC before (latest trunk from fpcupdeluxe: 3.3.1-master-0-g0d3341b504 [2021/07/23] for x86_64), and the access violation occurs there too.

I tested FPC on Linux (through WSL, installed with apt-get install fpc) and the program did not crash, although the compiler version on Linux is extremely old (3.0.4+dfsg-23 [2019/11/25] for x86_64).

However, I also noticed one thing: when not using Unicode, the unit works under win64. With Unicode enabled, it crashes while compiling the regular expression (any expression, because it gets an invalid address in the RegNext function and tries to dereference it).

It is an extremely weird issue, because I compiled and ran it under Delphi 10.3, where it raised no exceptions and worked perfectly.

It would be great if someone could test it on win64. If I can provide any other information, I will be happy to help.

Alexey-T commented 3 years ago

Can you attach the test project, with the changed regexpr.pas (with additional 2 pascal files) in its folder? With

uses regexpr in './regexpr.pas';

Hunter200165 commented 3 years ago

I am sorry, a changed regexpr.pas? I downloaded it from this repo and only changed the Unicode directive (I just undefined it).

Alexey-T commented 3 years ago

Ok, without regexpr.

Alexey-T commented 3 years ago

Also test on this project: I see it works OK

program project1;

uses sysutils, regexpr in './regexpr.pas';

var
  re: TRegExpr;

begin
  re:= TRegExpr.Create;
  re.Expression:= 'h';
  re.InputString:= 'hello';
  re.Compile;
  if re.Exec() then
    writeln('found '+Inttostr(re.MatchPos[0]));
  re.Free;
end.

Lazarus 2.3.0 r65368M FPC 3.2.1 x86_64-linux-gtk2

Hunter200165 commented 3 years ago

regex_fault-unicode.zip regex_fault-ansi.zip

The only difference in regexpr.pas is line 120: {$UNDEF Unicode} (it is commented out in the unicode archive and uncommented in the ansi one).

I know that it compiles and runs without problems on Linux (I also tested it with an ancient FPC under WSL), but two win64 machines with a brand new trunk compiler fail in the same way, with exactly the same tracebacks.
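
To illustrate how a single conditional define can change what a unit is built around, here is a minimal standalone sketch. It does not use regexpr.pas, and DEMO_UNICODE, TDemoChar, and TDemoString are hypothetical names made up for this example, not TRegExpr's actual definitions. It only shows that undefining one symbol can switch the character and string types, which changes element sizes and any pointer arithmetic that depends on them.

```pascal
program define_demo;

{$Mode ObjFPC}
{$H+}

// DEMO_UNICODE is a hypothetical symbol used only in this sketch.
// Delete the next line to build the non-Unicode variant instead.
{$DEFINE DEMO_UNICODE}

type
{$IFDEF DEMO_UNICODE}
  TDemoChar = WideChar;         // 2-byte character type
  TDemoString = UnicodeString;
{$ELSE}
  TDemoChar = AnsiChar;         // 1-byte character type
  TDemoString = AnsiString;
{$ENDIF}

var
  S: TDemoString;
begin
  S := 'hello';
  // The element size depends on which branch of the IFDEF was compiled.
  WriteLn('SizeOf(TDemoChar) = ', SizeOf(TDemoChar));
  WriteLn('Length(S) = ', Length(S));
end.
```

Building this once with and once without the define shows the two variants side by side. Whether the crash in this issue is related to such a type switch is not established in this thread.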

Hunter200165 commented 3 years ago

I am sorry if there was no need to include the binary exe; it makes the zip 1 MB in size.

Alexey-T commented 3 years ago

Ah, win64! interesting. will test.

Hunter200165 commented 3 years ago

I tried to debug it and see what actually happens there, but it is rather hard to understand the whole logic when reading the code for the first time, sorry. I saw that it tries to dereference string data (basically an invalid pointer access), but I could not find the source of this weird behavior.

Alexey-T commented 3 years ago

Now I am on Win64: Lazarus 2.3.0 r65368, FPC 3.2.3, i386-win32-win32/win64. I tried both win32 and win64.

No failure. Your app writes to the console: TRUE 1 5

Check that your IDE calls the correct new FPC.

Alexey-T commented 3 years ago

C:\Users\user>C:\fpcupdeluxe\fpcupdeluxe\fpc\bin\i386-win32\fpc.exe
Free Pascal Compiler version 3.2.3-r49447 [2021/06/09] for i386

Hunter200165 commented 3 years ago

"Check that your IDE calls the correct new FPC."

Most of the time I build from cmd with ppcx64 -B -gl regex_fault.pas, so there is no possibility of an old compiler being used instead of the new one (at least, the new compiler's output is colorful).

Also, I compiled it with ppcx64 after installing FPC on a completely fresh system (which had no FPC before). Therefore I have no clue what is happening.
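
One way to double-check which compiler actually produced a given binary is to have the program report it. The sketch below is a small standalone program, not part of TRegExpr; it relies on Free Pascal's compile-time include directives (FPCVERSION, FPCTARGETCPU, FPCTARGETOS, DATE), which the compiler replaces with string constants while building. The program name is arbitrary.

```pascal
program show_compiler;

{$Mode ObjFPC}
{$H+}

begin
  // Each include directive below is replaced at compile time by a
  // string constant describing the compiler that built this binary.
  WriteLn('FPC version: ', {$I %FPCVERSION%});
  WriteLn('Target CPU:  ', {$I %FPCTARGETCPU%});
  WriteLn('Target OS:   ', {$I %FPCTARGETOS%});
  WriteLn('Compiled on: ', {$I %DATE%});
end.
```

Building it the same way as the failing project (for example with the same ppcx64 command line, or from the same Lazarus project settings) should show whether the IDE and the command line are using the same compiler and target.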

Hunter200165 commented 3 years ago

I am going to investigate it a bit deeper and try some variants. I will comment on this issue if I find any other info.

Alexey-T commented 3 years ago

BTW, what if you run the 'test/test_fpc.lpi' project (included in the GitHub repo)? Does it show all 40+ tests OK?

Hunter200165 commented 3 years ago

I will test it now

Hunter200165 commented 3 years ago

[image] It crashed on CompileRE('1') (line 769; it is visible in the traceback).

It is a freshly downloaded repo, with Lazarus started for the first time.

Hunter200165 commented 3 years ago

[image] And as expected, adding the {$Undef Unicode} directive makes it work. Extremely weird.

Hunter200165 commented 3 years ago

Also, I tried to cross-compile regex_fault.pas to i386-win32 (I thought it might be bound to pointer size somehow). No luck, same exception.

Alexey-T commented 3 years ago

So you have the crash on win32 and win64, but I don't have it on either of them (on real Windows 10).

Hunter200165 commented 3 years ago

I just do not know what to do and how to detect the problem, because:

Therefore I am extremely confused. Can you run the .exe file that I uploaded in the archive when you asked for the project files?

Also, can you share the .exe files produced by your compiler? I just want to know whether it is strange compiler behavior or a system-dependent issue.

I am sorry for being clumsy. I just really want to use this great library, but I cannot because I really need Unicode in my application.

Hunter200165 commented 3 years ago

"So you have the crash on win32 and win64, but I don't have it on both of them. (on real Windows 10)."

Also, I did not really test win32, because I just used the cross-compiler made by fpcupdeluxe (x64 win64 -> i386 win32) and ran the resulting exe on the same win64 system (I do not have any 32-bit machines to test on). So if it is a compiler issue, it might generate invalid code for both targets.

Alexey-T commented 3 years ago

What if you set MINIMAL optimization in the project properties? FPC has bugs at optimization levels O2 and higher. Try the O1 level or the 'off' level.

Hunter200165 commented 3 years ago

Okay, will try. But if I compile using ppcx64 -B -gl regex_fault.pas, does it really set any optimization? (I just do not know what the default is.)

Alexey-T commented 3 years ago

Yes, it may set some optimization (from the default cfg).

Hunter200165 commented 3 years ago

Tried -O- and -O1, still the same. I can provide screenshots of the compilation; maybe they would be useful somehow?

Alexey-T commented 3 years ago

regexpr-crash2.zip

Here is my compiled .exe with .lpi lazarus project.

Alexey-T commented 3 years ago

Please try to compile it from the Lazarus IDE, i.e. open the .lpi project and just compile it. (Lazarus overrides the default FPC options.)

Hunter200165 commented 3 years ago

Okay, give me a moment

Hunter200165 commented 3 years ago

Oh, I see {$UNDEF Unicode} in regexpr.pas. Could you compile without the undef, please?

[image] This is the result of compiling and running your .lpi project, but without the undef. It uses the i386-win32 target, which is built through cross-compilation.

[image] And this is the result of running it when I set the x64-win64 target in the project options.

Alexey-T commented 3 years ago

Removed that 'undef unicode'. Made 2 .exe files, x32/x64. Both work OK. regexpr-crash2.zip

Hunter200165 commented 3 years ago

Yeah, your binaries work

So I guess the compiler is generating invalid code somehow?

Alexey-T commented 3 years ago

It seems so.

Alexey-T commented 3 years ago

In the 'project options' in Lazarus, there is the option 'use standard compiler config file (fpc.cfg)'. Maybe it has an effect? It is checked here.

Hunter200165 commented 3 years ago

Will check now. I am also reinstalling the compiler one more time; maybe that will help.

Hunter200165 commented 3 years ago

Just tested it, and indeed: without the default config file it compiles and works! Now I will see which option makes it fail.

Hunter200165 commented 3 years ago

Wait, no. I noticed that I had executed ppcx64 without the -n option and had doubts, so I checked 'Use default config file' in Lazarus just to make sure, and it still worked. It works because I reinstalled the compiler at version 3.2.2 (I noticed you have been using 3.2.3, but it is not available in fpcupdeluxe).

So now it works. Extremely weird issue.

I would suggest updating to the latest trunk version to see what your compiler produces. Maybe something was broken in the latest releases?

Thank you very much! You really helped me a lot. I would never have thought that an older compiler could be the solution, but in my case it is.

Alexey-T commented 3 years ago

Can you pls

Hunter200165 commented 3 years ago

I am not sure whether it is really a compiler issue or a strange OS/platform-dependent issue, so I am not ready to report it.

I will test the second machine.

Alexey-T commented 3 years ago

Then please discuss it first at https://forum.lazarus.freepascal.org/index.php?board=62.0

Hunter200165 commented 3 years ago

Also, should I rename the issue to something like "(FPC > 3.2.3) Access violation on windows with Unicode", so that people who experience the same problem will be able to find the solution?

Alexey-T commented 3 years ago

You can do it, no problem.

Hunter200165 commented 3 years ago

Thank you very much again, have a nice day!