PCRE2Project / pcre2

PCRE2 development is now based here.

Might be a problem found during the metamorphosis test #412

Closed. Conspio closed this issue 5 months ago

Conspio commented 5 months ago

To test the library, I used a regular expression to match itself, so the pattern below is also the subject string. The compiled regex is:

const char *pattern = "\x2d\x32\x35\x7c\x72\xf3\xb8\xb2\xa3\x2f"
                      "\x35\x77\x87\x31\xff\x2d\x2e\x31\x78\x26"
                      "\x2a\x2e\x7c\x31\x2a\x28\x28\x28\x47\x29"
                      "\x29\x4a\x0a\x76\x00\x00\x7c\x2e\x3f\x2e"
                      "\x7c\xa9\xa9\xa9\xa9\x29\x7b\x38\x32\x35"
                      "\x7d\x00";

With JIT enabled, PCRE2 reports 2 matches: 0 ~ 3 and 4 ~ 21. Here is the code: https://github.com/Conspio/poc-repo/blob/main/pcre2jit.cpp

However, without JIT, PCRE2 reports only 1 match: 0 ~ 3. Here is the code: https://github.com/Conspio/poc-repo/blob/main/pcre2nojit.cpp

PhilipHazel commented 5 months ago

Please give more detail (to save me having to understand your program). When I try that pattern with the same string as the match subject, using pcre2test with or without JIT, it just matches the whole subject.

zyingp commented 5 months ago

Screenshot 2024-06-05 151153

zyp@LAPTOP-TUSLHN7A:/mnt/c/Users/zying/Downloads$ g++ pcre2nojit.cpp -o pcre2nojit -lpcre2-8
zyp@LAPTOP-TUSLHN7A:/mnt/c/Users/zying/Downloads$ ./pcre2nojit
start_offset: 3
count: 1
zyp@LAPTOP-TUSLHN7A:/mnt/c/Users/zying/Downloads$ g++ pcre2jit.cpp -o pcre2jit -lpcre2-8
zyp@LAPTOP-TUSLHN7A:/mnt/c/Users/zying/Downloads$ ./pcre2jit
start_offset: 3
start_offset: 21
count: 2
zyp@LAPTOP-TUSLHN7A:/mnt/c/Users/zying/Downloads$

Both programs were tested on WSL2-Ubuntu22.

The results differ between the JIT and non-JIT versions. The two cpp files are similar except for the JIT part.

PhilipHazel commented 5 months ago

I am not a C++ programmer. Nevertheless, I have spent some time and figured out what is going on. It is not a bug. In the non-JIT case, after the first match, the second attempt gives a "match limit exceeded" error - that is, a negative value for rc, and hence the loop stops. You do not test for errors, and hence it looks as if it has found just one match. JIT's use of the match limit counts somewhat differently, and it can apparently find the second match without hitting the limit.

zyingp commented 5 months ago

Thank you for your further investigation. I checked and confirmed that if we call pcre2_set_match_limit(match_context, 4294967295) to set a bigger limit, the non-jit case has the same matching results as the jit case.