aaronrudkin closed this issue 4 years ago.
This is by design and is well documented. It is a feature: simply put, do not write problematic regexes (or problematic code in general). :)
See "Performance Tips" in http://userguide.icu-project.org/strings/regexp.
My recent paper/tutorial, available at https://stringi.gagolewski.com/_static/vignette/stringi.pdf, mentions this as well; see p. 39 and the time_limit or stack_limit option in stri_opts_regex().
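For reference, here is a minimal sketch of the time_limit/stack_limit approach. It assumes a stringi version whose stri_opts_regex() accepts these two arguments (as documented: time_limit in approximate, CPU-dependent milliseconds, stack_limit in bytes, 0 meaning no limit). It also assumes that exceeding a limit makes stri_detect_regex() signal an ordinary R error. detect_with_limits() is a hypothetical helper, not part of stringi. The ^(a+)+$ pattern is a textbook catastrophic-backtracking example, not the regex from this report.

```r
library(stringi)

# Hypothetical helper (not part of stringi): regex detection that gives up
# once ICU's approximate time limit or backtracking-stack limit is exceeded.
# time_limit is in rough, CPU-dependent milliseconds; stack_limit is in bytes;
# 0 disables either limit.
detect_with_limits <- function(str, pattern, time_limit = 1000L, stack_limit = 0L) {
  opts <- stri_opts_regex(time_limit = time_limit, stack_limit = stack_limit)
  tryCatch(
    stri_detect_regex(str, pattern, opts_regex = opts),
    # Any stringi error lands here: a hit limit, but also e.g. invalid syntax.
    # Report it as a warning and return NA instead of a TRUE/FALSE answer.
    error = function(e) {
      warning("regex match aborted: ", conditionMessage(e))
      rep(NA, max(length(str), length(pattern)))
    }
  )
}

# A well-behaved pattern completes normally.
detect_with_limits("abc", "b")

# Nested quantifiers against a near-miss input: expected to hit the limit
# and return NA (with a warning) rather than run for a long time.
detect_with_limits(paste0(strrep("a", 25), "b"), "^(a+)+$", time_limit = 10L)
```

Returning NA keeps the result a logical vector like stri_detect_regex()'s. A caller who prefers a hard failure can drop the tryCatch() and let the error propagate.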
Also, refer to http://qinwenfeng.com/re2r_doc/ and https://github.com/qinwf/re2r for discussion.
I figured as much. time_limit/stack_limit are at least partly what I was looking for, and I now see why I missed them in the docs: they are set through the separate options function, stri_opts_regex().
Thank you for responding here. I will also respond in the stringr issue about exposing that functionality through the wrapper interface.
I wrote an apparently problematic regex. Running it through some external regex debuggers, I can see that it contains a nasty loop that is hard to resolve; some online debuggers report that it takes over a million steps to reach an answer. My fault; I'm not looking for regex help.
But this alerted me that, in this pathological case, stri_detect takes a very long time to fail: more than 30 seconds on my desktop (Ryzen 7 3700 at 4.1 GHz), compared to 200 microseconds for grepl. I assume the cause is that grepl internally sets a maximum step count after which it gives up trying to resolve the match, whereas stri_detect doesn't? That's just speculation on my part; I didn't go down the C rabbit hole. Oddly, neither function produces an error, neither a timeout error nor a parsing error.
I guess my preferences, in order, would be:
Best: Have stri_detect error quickly when it reaches a regex it has trouble parsing, and tell me how to fix it (yeah, right!).
Good: Have stri_detect's timeout or termination behaviour more closely match grepl's.
OK: Add an argument to allow the user to terminate execution after a certain number of steps.
The example below was generated with R 4.0.2 on Windows, stringi 1.5.3, and bench 1.1.1. I've included two cases: one that both functions resolve easily (stri_detect beats grepl!) and then the pathological one.
I also filed this downstream as https://github.com/tidyverse/stringr/issues/350.