gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

stri_split_regex() - not working for \\p{Z} ?? #327

Closed kbenoit closed 6 years ago

kbenoit commented 6 years ago

Maybe I'm daft here, but why does the first one not work?

> stringi::stri_split_regex("one\ntwo\tthree", "\\p{Z}+")
[[1]]
[1] "one\ntwo\tthree"

> stringi::stri_split_regex("one\ntwo\tthree", "\\p{WHITE_SPACE}+")
[[1]]
[1] "one"   "two"   "three"
gagolews commented 6 years ago

Hi Ken, see Table 12 in http://www.unicode.org/reports/tr44/tr44-21.html

http://www.fileformat.info/info/unicode/char/000A/index.htm

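newline (U+000A) is \p{Cc}, a control character, not \p{Z}, a separator. The same goes for the tab (U+0009), so \p{Z}+ matches nothing in your string, while the White_Space property covers both.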
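For anyone hitting the same thing, here is a minimal sketch of the distinction, assuming stringi is installed. The variable names (`probes`, `by_union`, `by_ws`) are illustrative only, and the two alternative patterns are just options that should work with ICU regexes, not an official recommendation.

```r
library(stringi)

x <- "one\ntwo\tthree"
probes <- c("\n", "\t", " ")

# General category Z (Zs, Zl, Zp) contains the space but not LF or TAB.
is_separator <- stri_detect_regex(probes, "\\p{Z}")   # FALSE FALSE TRUE

# LF and TAB are control characters (general category Cc).
is_control <- stri_detect_regex(probes, "\\p{Cc}")    # TRUE TRUE FALSE

# Two patterns that do split on LF and TAB:
# an explicit union of separators and control characters ...
by_union <- stri_split_regex(x, "[\\p{Z}\\p{Cc}]+")[[1]]
# ... or the \s shorthand, which in ICU covers TAB, LF, FF, CR and \p{Z}.
by_ws <- stri_split_regex(x, "\\s+")[[1]]
```

Note that the union with \p{Cc} also splits on every other control character, so \s+ or \p{WHITE_SPACE}+ is usually the narrower choice when only whitespace is meant.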
