PCRE2Project / pcre2

PCRE2 development is now based here.

Caseless UTF/UCP causes infinite loop in 32-bit mode #350

Closed: addisoncrump closed this issue 12 months ago

addisoncrump commented 12 months ago

The pattern /[\x{ffffffff}]/caseless,ucp causes an infinite loop in 32-bit mode.

if ((options & PCRE2_CASELESS) != 0)
  {
#ifdef SUPPORT_UNICODE
  if ((options & (PCRE2_UTF|PCRE2_UCP)) != 0)
    {
    /* ... */
    while ((rc = get_othercase_range(&c, end, &oc, &od,
             (xoptions & PCRE2_EXTRA_CASELESS_RESTRICT) != 0)) >= 0)

Since the caseless and ucp options are both set, we call get_othercase_range with end set to \x{ffffffff}.

We then enter this loop:

static int
get_othercase_range(uint32_t *cptr, uint32_t d, uint32_t *ocptr,
  uint32_t *odptr, BOOL restricted)
{
uint32_t c, othercase, next;
unsigned int co;
/* ... */
for (c = *cptr; c <= d; c++)
  {
  /* ... */
  }

Since d is a uint32_t equal to 0xffffffff, the condition c <= d is always true: no uint32_t value is greater than 0xffffffff, and c++ simply wraps from 0xffffffff back to 0. The loop can therefore only terminate if one of its inner branches is taken, and neither of them ever is. I'm not sure how to handle this case (I think the character needs to be checked for META_END, but I'm not confident enough to just open a PR for this one 🙂).

carenas commented 12 months ago

There is indeed a larger problem with the implementation of that function, as you can see from our test suite when it is run with the "latest" sanitizer.

PhilipHazel commented 12 months ago

Fixed in afce00e by not trying to look for other cases of characters above the Unicode maximum value (U+10FFFF).