Closed GoogleCodeExporter closed 9 years ago
This is with the current source checkout of regex.
Note that changing the test to "import regex; flags = regex.FULLCASE |
regex.UNICODE; pat = regex.compile(r'xxx', flags=flags); raw='yxxx'; print
([x.group() for x in pat.finditer(raw)]); print (pat.findall(raw))"
Which keeps a reference to the string being searched on, causes both finditer
and findall to fail on windows 32 and no change in linux 64.
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 9:14
Using unicode string literals also does not affect the failure. (I am testing
on python 2.7)
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 9:16
Passing in the IGNORECASE flag causes the matching to work, presumably because
it causes a memcpy?
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 9:36
Similarly, passing in the REVERSE flag also causes it to work.
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 9:45
It looks like req)pos and req_end are not initialized in state_init_2()
I am still testing but adding
state->req_pos = -1;
in state_init_2() fixes my test case. However, I dont understand the code well
enough. The bug happens when the value state->req_pos gets happens to be 0.
Is -1 the right value to initialize it to. What about req_end?
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 12:04
The following diff fixes all instances of this bug that I was able to
reproduce. However, I do not know if the fix is correct or complete.
diff --git a/src/regex/_regex.c b/src/regex/_regex.c
index 43f2135920..4418ccd817 100644
--- a/src/regex/_regex.c
+++ b/src/regex/_regex.c
@@ -13232,6 +13232,7 @@ Py_LOCAL_INLINE(BOOL) state_init_2(RE_State* state,
PatternObject* pattern,
state->groups = NULL;
state->repeats = NULL;
+ state->req_pos = -1;
state->visible_captures = visible_captures;
state->match_all = match_all;
state->backtrack_block.previous = NULL;
Original comment by adse...@calibre-ebook.com
on 31 Dec 2013 at 12:14
You are correct. state->req_pos wasn't initialised.
Fixed in regex 2013-12-31.
Original comment by re...@mrabarnett.plus.com
on 31 Dec 2013 at 8:42
Original issue reported on code.google.com by
adse...@calibre-ebook.com
on 31 Dec 2013 at 8:55