gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

Can stri_sub<- keep the string as is if the replacement is NA? #267

Closed yutannihilation closed 7 years ago

yutannihilation commented 7 years ago

In base R, the combination of regexpr and regmatches<- is sometimes useful for replacing part of each string after some processing. Since regmatches<- does nothing when the index is -1 (meaning no match), we can use it without checking whether any elements failed to match. Here is an example:

text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"

m <- regexpr(pattern, text)
m
#> [1]  1  1 -1
#> attr(,"match.length")
#> [1]  2  2 -1
#> attr(,"useBytes")
#> [1] TRUE

regmatches(text, m) <- sapply(regmatches(text, m), tolower)
text
#> [1] "a1" "b1" "CC"

I wonder whether I can do something similar with stringi, using stri_locate and stri_sub<-. However, stri_sub<- replaces the whole original string with NA if the replacement for that element is NA.

library(stringi)

text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"

m <- stri_locate(text, regex = pattern)
m
#>      start end
#> [1,]     1   2
#> [2,]     1   2
#> [3,]    NA  NA

stri_sub(text, m)
#> [1] "A1" "B1" NA

stri_sub(text, m) <- sapply(stri_sub(text, m), tolower)
text
#> [1] "a1" "b1" NA
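For comparison, here is a minimal sketch of one way to get the regmatches<- result with the current behaviour: apply the replacement only to the elements that actually matched. It assumes stri_locate_first_regex and stri_trans_tolower in place of stri_locate and sapply(..., tolower); the variable names ok, matched and loc are only illustrative.

```r
library(stringi)

text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"

# Locate the first match in each string; unmatched rows are NA
m <- stri_locate_first_regex(text, pattern)

# Keep only the elements that matched
ok <- !is.na(m[, "start"])
matched <- text[ok]
loc <- m[ok, , drop = FALSE]

# Replace the matched part with its lowercase version
stri_sub(matched, loc) <- stri_trans_tolower(stri_sub(matched, loc))

# Write the modified strings back; unmatched elements stay untouched
text[ok] <- matched
text
#> [1] "a1" "b1" "CC"
```

This works, but the extra subsetting is exactly the bookkeeping that regmatches<- lets us skip.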

omit_na = TRUE works when from or to is NA, but it has no effect if the replacement itself is NA:

text <- c("A1", "B1", "CC")

# the original string is saved if from is NA
stri_sub(text, from = c(2,2,NA), omit_na=TRUE) <- "0"
text
#> [1] "A0" "B0" "CC"

# all elements are replaced with NA
stri_sub(text, from = c(2,2,NA), omit_na=TRUE) <- NA
text
#> [1] NA NA NA

I feel it would be better to ignore NA replacements as well when omit_na = TRUE. What do you think?

gagolews commented 7 years ago

I guess it's reasonable to assume that omit_na=TRUE should omit missing values in the replacement strings vector too. Thanks.
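To make the intended semantics concrete, here is a rough sketch in plain R. The helper stri_sub_skip_na is hypothetical and not part of stringi; it only emulates the idea that an element is left as is whenever from, to, or the replacement is NA, and it says nothing about how this would be implemented in the package itself.

```r
library(stringi)

# Hypothetical helper (not a stringi function): replace substrings,
# but leave an element unchanged if its from, to, or value is NA.
stri_sub_skip_na <- function(str, from, to = -1L, value) {
  n <- length(str)
  from <- rep_len(from, n)
  to <- rep_len(to, n)
  value <- rep_len(value, n)

  # Elements for which every argument is available
  keep <- !(is.na(from) | is.na(to) | is.na(value))

  if (any(keep)) {
    piece <- str[keep]
    stri_sub(piece, from[keep], to[keep]) <- value[keep]
    str[keep] <- piece
  }
  str
}

text <- c("A1", "B1", "CC")

# NA in from: the third string is kept
res_from_na <- stri_sub_skip_na(text, from = c(2, 2, NA), value = "0")
res_from_na
#> [1] "A0" "B0" "CC"

# NA replacement: nothing is changed
res_value_na <- stri_sub_skip_na(text, from = c(2, 2, NA), value = NA)
res_value_na
#> [1] "A1" "B1" "CC"
```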

yutannihilation commented 7 years ago

Thanks for the reply! I've created a pull request, partly as C++ practice for myself.