gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

stri_extract_all stops regex matching at newline #280

Closed by atajti 7 years ago

atajti commented 7 years ago

After running the following code snippet, I expect to get "ing\nof inter":

tst_str <- "some string\nof interest"
stringi::stri_extract(tst_str, regex="ing.*ter")

However, I get NA. After changing the pattern to regex="ing.*", I expect "ing\nof interest", but it returns "ing".

The same seems to happen with stri_replace:

stringi::stri_replace(tst_str, "000", regex="ing.*rest")
stringi::stri_replace(tst_str, "000", regex="ing.*")

sessionInfo()
R version 3.3.3 RC (2017-02-27 r72279)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=hu_HU.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=hu_HU.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=hu_HU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=hu_HU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.3   stringi_1.1.5

gagolews commented 7 years ago

Hi, RTM, see ?stringi::stri_opts_regex, HTH.
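
For anyone landing here later: the relevant setting on that help page appears to be dotall. In ICU regular expressions, "." does not match line terminators by default, which is why the match stops at "\n". The snippet below is a minimal sketch of how the original examples could be adapted, not necessarily what the maintainer had in mind. It assumes a stringi version where stri_opts_regex() accepts dotall, and the variable names are only illustrative.

```r
library(stringi)

tst_str <- "some string\nof interest"

# Default behaviour: '.' does not match the line terminator, so the
# pattern cannot reach "ter" on the second line and nothing is matched.
default_match <- stri_extract_first_regex(tst_str, "ing.*ter")

# With dotall = TRUE, '.' also matches line terminators.
opts <- stri_opts_regex(dotall = TRUE)
dotall_match <- stri_extract_first_regex(tst_str, "ing.*ter", opts_regex = opts)

# The inline flag (?s) switches on the same behaviour inside the pattern.
inline_match <- stri_extract_first_regex(tst_str, "(?s)ing.*ter")

# The same options object works for replacement.
replaced <- stri_replace_first_regex(tst_str, "ing.*rest", "000", opts_regex = opts)

print(default_match)
print(dotall_match)
print(inline_match)
print(replaced)
```

Passing an options object keeps the pattern itself unchanged, while the inline (?s) flag is handy when only the pattern string can be edited.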

atajti commented 7 years ago

My bad, thanks for the pointer.