PCRE2Project / pcre2

PCRE2 development is now based here.
Other
921 stars 194 forks source link

vreverse broken with link-size != 2 (with JIT) #332

Closed carenas closed 1 year ago

carenas commented 1 year ago

as shown by failures on tests 2, 4 and 5 (when configured with --with-link-size=4).

link-size=3 also shows additional failures in test 1 (with JIT) which might be also related

PhilipHazel commented 1 year ago

I cannot get the interpreter to fail with link size = 3 or 4 though it does fail with JIT. Please can you give details of your interpreter failure?

carenas commented 1 year ago

My bad; I got my test mixed up and you are correct that there are no failures in the automated tests, neither in my synthetic tests that could be attributed to the interpreter. Had update the issue to reflect that, apologies again for the confusion.

zherczeg commented 1 year ago

There is a typo in the code, but fortunately it revealed a good question: https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_internal.h#L1782C1-L1782C1

The reverse use LINK_SIZE for storing the number of characters, while vreverse uses IMM2_SIZE. I would use the same, probably IMM2_SIZE.

PhilipHazel commented 1 year ago

I think I originally used LINK_SIZE for OP_REVERSE because I thought it might just be possible that someone with an enormous subject string would want to look behind more than 65535 characters when LINK_SIZE is greater than 2. For OP_VREVERSE there is a very small limit, which is probably why I used IMM2_SIZE. However, I don't mind changing OP_REVERSE if you think this should be done.

zherczeg commented 1 year ago

I would prefer consistency here. It is easy to forget about this difference in new code.

carenas commented 1 year ago

Note though that the current small limit that applies to OP_VREVERSE is a constrain coming from Perl, which might change in the future since "variable lookbehind" is still experimental AFAIK, so it might be better to change OP_VREVERSE to use LINK_SIZE instead of the other way around.

zherczeg commented 1 year ago

I am ok with any direction. Just use the same size.

PhilipHazel commented 1 year ago

I have changed OP_REVERSE to use IMM2_SIZE in b3dbe8d.

zherczeg commented 1 year ago

src/pcre2_printint.c

This change is not good, because it affects all the other opcodes:

    case OP_COND:
    case OP_SCOND:
    case OP_REVERSE:
    if (print_lengths) fprintf(f, "%3d ", GET2(code, 1));
PhilipHazel commented 1 year ago

Oops. Now fixed.

carenas commented 1 year ago

Sadly with the latest fixes, Test 6 is now failing if link-size==4, didn't check with 3, and don't recall this being the case before, but they are all lookbehinds so likely related.

zherczeg commented 1 year ago

I also mentioned it to Philip. Those are dfa tests. I hope we don't overload him.

PhilipHazel commented 1 year ago

I have fixed the DFA problem - I overlooked one case when changing OP_REVERSE to IMM2. Thank you for worrying about overloading me. There is no deadline for 10.43, so it will come when it comes. I will continue to work slowly through the issues as time permits - what will take time is #334 because I cannot remember enough about the way recursion works in the interpreter.

carenas commented 1 year ago

I think this was all a trick to force me to learn more about DFA, at least I know now why moving OP_REVERSE to IMM2 was a better path than the alternative I suggested, even if I didn't catch the overlooked change (and I did try, I even think I tried it as a shot in the dark and didn't help, but I probably just got my test mixed up as usual).

carenas commented 1 year ago

working fine as shown by manytests in CI