gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

stri_split_regex behavior for trailing separators #330

Closed: hammer closed this issue 6 years ago

hammer commented 6 years ago

I've posted a reprex at https://community.rstudio.com/t/tidyr-separate-rows-trailing-separators/16111.

Suppose I have a string where the separator also occurs at the end, e.g. `s1 <- "1;2;3;"`. I'd like to get back just `"1" "2" "3"`, without an additional `""`. `stri_split_regex(s1, pattern = ";", omit_empty = TRUE)` works in this case, but suppose that for `s2 <- "1;;3;"` I want `"1" "" "3"`. Then it doesn't work, because `omit_empty = TRUE` also drops the empty field in the middle.
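For comparison, base R's `strsplit()` already behaves the way asked for here: its documentation states that a match at the end of the string gives the same output as if that match were removed, while empty fields between adjacent separators are kept. A minimal sketch using only base R (no stringi), with `s1` and `s2` as above:

```r
# Base R only; stringi is not needed for this comparison.
s1 <- "1;2;3;"
s2 <- "1;;3;"

# strsplit() returns a list with one character vector per input string.
# A separator at the very end adds no trailing "", but the empty field
# between ";;" is preserved.
parts1 <- strsplit(s1, ";")[[1]]
parts2 <- strsplit(s2, ";")[[1]]

print(parts1)  # c("1", "2", "3")
print(parts2)  # c("1", "", "3")
```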

One way I thought I might solve this problem is to use `pattern = "(;$|;)"`. I thought that if the separator swallowed the end of the string, no element would be returned after it. Unfortunately, `stri_split_regex(s1, pattern = "(;$|;)")` still returns `"1" "2" "3" ""`.
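The trailing `""` comes from how splitting works, not from which alternative of the pattern matches: a regex split returns the text between consecutive matches plus whatever follows the last match, and after a final `;` that remainder is empty. The base R sketch below uses `gregexpr()` to locate the same matches (the variable names are illustrative) and shows that the last match sits on the final character:

```r
s1 <- "1;2;3;"

# gregexpr() returns the start position of every match of the pattern;
# as.integer() drops the match-length attributes.
pos <- as.integer(gregexpr("(;$|;)", s1)[[1]])
print(pos)  # 2 4 6

# A split emits the text after the last match as its final piece.
# Here the last match is the final character, so that piece is empty
# whether ";$" or ";" was the alternative that matched.
last_piece <- substr(s1, max(pos) + 1, nchar(s1))
print(last_piece)  # ""
```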

So perhaps this edge case is not important, but I'm curious whether you've thought about trailing separators, and whether it would make sense to change the behavior when the pattern ends in `$`.

gagolews commented 6 years ago

[this is regex-specific, not related to stringi]

I guess you can try anchoring, e.g. with lookaround assertions as described at https://www.regular-expressions.info/lookaround.html
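One way to apply an anchor here is sketched below. It is an assumption, not the maintainer's code: it uses an end anchor (`$`) rather than a lookaround to strip a single trailing separator first, then splits with `stri_split_regex()` as usual, so empty fields in the middle survive. The helper `split_no_trailing()` is hypothetical, and `sep` is assumed to be a regex matching one separator. Requires the stringi package.

```r
library(stringi)

# Hypothetical helper, not part of stringi: remove one trailing separator
# with an end-anchored pattern, then split normally so that empty fields
# between adjacent separators are kept.
split_no_trailing <- function(x, sep = ";") {
  # paste0(sep, "$") builds the regex ";$": a separator only at the end.
  # Only one separator is removed, so "1;;" would be trimmed to "1;".
  trimmed <- stri_replace_first_regex(x, paste0(sep, "$"), "")
  # Returns a list with one character vector per input string.
  stri_split_regex(trimmed, sep)
}

res <- split_no_trailing(c("1;2;3;", "1;;3;", "1;2;3"))
res[[1]]  # c("1", "2", "3")
res[[2]]  # c("1", "", "3")
res[[3]]  # c("1", "2", "3"), unchanged because there is no trailing ";"
```

Trimming before splitting keeps the split pattern itself simple; the alternative of encoding "a separator not at the end" into the split pattern still leaves the final separator attached to the last field.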