gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

allow setting time limit for regex matching #355

Closed gagolews closed 4 years ago

gagolews commented 5 years ago

ICU includes the ability to limit the time spent on a regular expression match. This is a good idea when running untested expressions from users of your application, or as a fail-safe for servers or other processes that cannot afford to hang.

gagolews commented 5 years ago

The relevant ICU4C method is RegexMatcher::setTimeLimit:

virtual void setTimeLimit(int32_t limit, UErrorCode &status)

gagolews commented 4 years ago

Done:

gagolews@dionysus:~$ Rscript -e 'stringi::stri_detect_regex("AAAAAAAAAAAAAAAAAAAAAAAAAC", "(A+)+B", time_limit=2)'
Error in stringi::stri_detect_regex("AAAAAAAAAAAAAAAAAAAAAAAAAC", "(A+)+B",  : 
  [!NDEBUG: Error in stri_search_regex_detect.cpp:112] Maximum allowed match time exceeded. (U_REGEX_TIME_OUT)
Execution halted
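
In a long-running session the timeout surfaces as an ordinary R error, so it can presumably be caught rather than halting the process. A minimal sketch, assuming a stringi version that already accepts `time_limit`; the helper name `detect_or_na` is hypothetical and not part of stringi:

```r
library(stringi)

# Hypothetical helper (not part of stringi): returns NA instead of
# raising an error when the regex engine aborts the match.
detect_or_na <- function(str, pattern, time_limit = 2L) {
  tryCatch(
    stri_detect_regex(str, pattern, time_limit = time_limit),
    error = function(e) {
      # Report why the match was abandoned, then fall back to NA.
      message("regex match aborted: ", conditionMessage(e))
      NA
    }
  )
}

# The pathological input from the run above, and a benign one.
res_bad <- detect_or_na("AAAAAAAAAAAAAAAAAAAAAAAAAC", "(A+)+B")
res_ok  <- detect_or_na("AAAB", "(A+)+B")
```

Returning `NA` keeps the result type logical, so callers can treat an aborted match as "unknown" instead of a crash.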

gagolews@dionysus:~$ Rscript -e 'library(stringi); stri_detect_regex(stri_paste(stri_dup("A", 1000), "C"), "(A+)+B", stack_limit=2048)'
Error in stri_detect_regex(stri_paste(stri_dup("A", 1000), "C"), "(A+)+B",  : 
  [!NDEBUG: Error in stri_search_regex_detect.cpp:112] Regular expression backtrack stack overflow. (U_REGEX_STACK_OVERFLOW)
Execution halted
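
Both limits can also be bundled once and reused across calls through `stri_opts_regex()`. The following is a sketch under the assumption that the installed stringi exposes `time_limit` and `stack_limit` there; the limit values are simply the ones used above, not recommendations. As far as the ICU documentation goes, the time limit is counted in steps of the match engine (typically on the order of milliseconds) and the stack limit in bytes.

```r
library(stringi)

# Reusable option list carrying both limits (values taken from the runs above).
limits <- stri_opts_regex(time_limit = 2L, stack_limit = 2048L)

pattern <- "(A+)+B"
inputs <- c("AAAB", stri_paste(stri_dup("A", 1000), "C"))

# Check each input separately so that one aborted match
# does not discard the results for the other inputs.
results <- vapply(inputs, function(s) {
  tryCatch(
    stri_detect_regex(s, pattern, opts_regex = limits),
    error = function(e) NA  # aborted by either limit
  )
}, logical(1), USE.NAMES = FALSE)
```

Which of the two limits fires first for the long input depends on the ICU build; either way the corresponding element of `results` ends up as `NA`.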