GsDevKit / GsDevKit_home

master GsDevKit project
http://gsdevkit.github.io/GsDevKit_home
MIT License

Stack overflow with certain regular expression #322

Closed dassi closed 3 years ago

dassi commented 3 years ago

(Not sure if this belongs in GsDevKit or in some third-party package; sorry for posting it here, I can't investigate further. Maybe someone knows the right place for this issue. I think it's serious enough, since it crashes the process.)

GemStone 3.4.5

This fails with an AlmostOutOfStack exception, looping inside some of the Rx* regular expression classes. If you remove the last few characters of the string, it works.

This fails:

'""COMUNICARE"" = kommunizieren|""CONTROLLARE"" = kontrollieren|""DECORARE"" = dekorieren\n|} _§7jff_ +audio:d8lfwxli13hlenp9si7b9cmei+|} _§w9sx_ +audio:2872cxrdxiyao6efr4qvj0fru+|} _§61rb_ +audio:4ercg9qsmjasd4kdevlsjyhbv+\n|} _§iefw_ +audio:5nxc6v1l2l13zitxwsfgha1zn+|} _§fgsk_ +audio:f1b5gmjcaak5l7ozp35djz4ma+|} _§zjwe_ +audio:981m5637xwudbrlxi1wl8oj2z+\n|} _§hrgr_ +audio:3kl0dnosgfp1m9vjwmk8jeq9w+|} _§rxak_ +audio:9107ibl3t96zr7r193cct1kse+|} _§7rxe_ +audio:1h10wwqk0j2c5zs1moh3fvdh5+\n|} _§rcd8_ +audio:ck7i28k1a5is72g5ee2nc2njk+|} _§1hfn_ +audio:eabfny4hgoy0599uutyb5uxu6+|} _§a3za_ +audio:cryooys4xpyf05f3uq8nicyy4+\n|} _§cp52_ +audio:4smjzdhax5riwfr62wndeilqe+|} _§247q_ +audio:2ahjm7jafcg88j1wivpl8w01l+|} _§8rzb_ +audio:dz8b2qtjc6cbkg1o7wambxhmc+\n|} _§deqs_ +audio:csrvjtqwpk7nz4z1cy72q6pr9+|} _§1fal_ +audio:ikpxzvsro0oihmzdwgxh8tcz+|} _§8wbu_ +audio:dkw2n3u7n3fw4bi396m91ra4v+\n|""COPIARE"" = kopieren|""CONSUMARE"" = konsumieren|""COSTARE"" = kosten\n|} _§ltag_ +audio:5au0hpe1f7y4cl446wmgipx53+|} _§hta7_ +audio:eqtrlb9p93tjm9k786999n6w6+|} _§9psh_ +audio:f5adk6sld1no3vc8k7qi01e01+\n|} _§npf7_ +audio:bbxt1d95ivispl2076kuksalw+|} _§vpw3_ +audio:4ha1vca9u32vrko85pjl1v5jc+|} _§ievb_ +audio:cxhzbbz4wz0nhg2ao2eu142qf+\n|} _§dtsu_ +audio:2y7tg3ybu3054is5zk61ywfh5+|} _§xjwr_ +audio:73y3yktqf6wbyxh439f2zvkgt+|} _§qkmw_ +audio:3h0nef6wp9p7rzlsjnjev6eo3+\n|} _§ayh8_ +audio:eijjhp21tqr2ue40vdg1iiwkm+|} _§2lw3_ +audio:csids2zklcaqgb5do6b2ca1e6+|} _§19uj_ +audi' matchesRegexIgnoringCase: '^.*\S\_§[[:alnum:]]+\_.*$'

dalehenrich commented 3 years ago

@dassi, this is a fine place to report the issue ... I can triage the bug report and report it in the appropriate github project ...I'm kind of backed up at the moment, so I won't be able to get to this issue for a couple of days (at least).

If you are running in tODE, you can use the gs halt command to arrange to bring up a debugger BEFORE you run out of memory, and then you can get a stack from the debugger and possibly debug the issue on your own (these are the same steps I would perform) ... if you get a stack, please include it in a comment. If you figure out a patch, please include the patch in a comment and I would be willing to apply it to the appropriate project ...

dalehenrich commented 3 years ago

@dassi I am taking a look at this and my overall impression is that this isn't an infinite loop, but a feature of the Regex code ... I am not an expert in Regex (for example, I am unable to understand what your Regex expression is meant to do). However, it looks like the stack is alternating between RxmBranch >> matchAgainst: and RxmPredicate >> matchAgainst:. Here's the source for RxmBranch >> matchAgainst::

   matchAgainst: aMatcher
           "Match either `next' or `alternative'. Fail if the alternative is nil."

           ^(next matchAgainst: aMatcher)
 *                ^2                                                  *******
                   or: [alternative notNil
                           and: [alternative matchAgainst: aMatcher]]

and here's the source for RxmPredicate >> matchAgainst::

   matchAgainst: aMatcher
           "Match if the predicate block evaluates to true when given the
           current stream character as the argument."

           | original |
           original := aMatcher currentState.
           (aMatcher atEnd not 
                   and: [(predicate value: aMatcher next)
                           and: [next matchAgainst: aMatcher]])
 *                                    ^11                             *******
                   ifTrue: [^true]
                   ifFalse:
                           [aMatcher restoreState: original.
                           ^false]

The lines marked with * show the call being made (from topaz), and it looks to me like the code in RxmPredicate >> matchAgainst: is looping on aMatcher next, which means it is recursing character by character through your input string (which is 1503 characters long) ... if I do the math, you'd need 3006 frames (2 per character) at a minimum to make it through your input string ... in my case, it had gotten to character 1474 when it ran out of stack ... so you should bump up your stack size by increasing GEM_MAX_SMALLTALK_STACK_DEPTH in your gem.conf file ... I used GEM_MAX_SMALLTALK_STACK_DEPTH=1500; and the expression completed without any further errors.
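To make the "2 frames per character" estimate concrete, here is a small Python model of the two methods quoted above. It is only a sketch, not the Regex package itself: the class names Matcher, Branch, Predicate and End, the depth counter, and the frames_needed helper are all made up for illustration, and the pattern is reduced to a greedy `.*` followed by end of input.

```python
import sys


class Matcher:
    """Holds the input, the read position, and a counter of nested calls."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.depth = 0
        self.max_depth = 0

    def at_end(self):
        return self.pos >= len(self.text)

    def next(self):
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def enter(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def leave(self):
        self.depth -= 1


class End:
    """Hypothetical terminal node: succeeds only at the end of the input."""

    def match_against(self, matcher):
        return matcher.at_end()


class Predicate:
    """Modeled on RxmPredicate >> matchAgainst: (consume one char, then recurse)."""

    def __init__(self, predicate, next_node=None):
        self.predicate = predicate
        self.next_node = next_node

    def match_against(self, matcher):
        matcher.enter()
        try:
            original = matcher.pos
            if (not matcher.at_end()
                    and self.predicate(matcher.next())
                    and self.next_node.match_against(matcher)):
                return True
            matcher.pos = original  # backtrack, like restoreState:
            return False
        finally:
            matcher.leave()


class Branch:
    """Modeled on RxmBranch >> matchAgainst: (try next, else the alternative)."""

    def __init__(self, next_node=None, alternative=None):
        self.next_node = next_node
        self.alternative = alternative

    def match_against(self, matcher):
        matcher.enter()
        try:
            return (self.next_node.match_against(matcher)
                    or (self.alternative is not None
                        and self.alternative.match_against(matcher)))
        finally:
            matcher.leave()


def frames_needed(text):
    """Match a greedy `.*` to end of input; return (matched, deepest nesting)."""
    # Python has its own recursion limit, so raise it enough for this input.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), 2 * len(text) + 200))
    loop = Branch(alternative=End())
    loop.next_node = Predicate(lambda ch: True, next_node=loop)
    matcher = Matcher(text)
    matched = loop.match_against(matcher)
    return matched, matcher.max_depth


if __name__ == "__main__":
    print(frames_needed("x" * 1503))  # (True, 3008)
```

In this model the nesting grows as 2n + 2 for an input of n characters: two frames per consumed character, plus one last Branch/Predicate pair that fails at the end of the input before the alternative succeeds. That matches the "3006 at a minimum" figure for 1503 characters, and it shows why the depth depends on the input length, not on an infinite loop.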