In base R, the combination of regexpr and regmatches<- is sometimes useful for partially replacing strings after some processing. Since regmatches<- does nothing when the index is -1, which means no match, we can use it without checking whether any elements are unmatched. Here is an example:
text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"
m <- regexpr(pattern, text)
m
#> [1] 1 1 -1
#> attr(,"match.length")
#> [1] 2 2 -1
#> attr(,"useBytes")
#> [1] TRUE
regmatches(text, m) <- sapply(regmatches(text, m), tolower)
text
#> [1] "a1" "b1" "CC"
I wonder whether I can do something similar with stringi, using stri_locate and stri_sub<-. However, stri_sub<- replaces the original string with NA if the replacement for that element is NA:
library(stringi)
text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"
m <- stri_locate(text, regex = pattern)
m
#> start end
#> [1,] 1 2
#> [2,] 1 2
#> [3,] NA NA
stri_sub(text, m)
#> [1] "A1" "B1" NA
stri_sub(text, m) <- sapply(stri_sub(text, m), tolower)
text
#> [1] "a1" "b1" NA
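For now, one way to avoid the NA seems to be restricting the assignment to the elements that actually matched. This is only a minimal sketch of that workaround: the names hit, sub and loc are my own, and it assumes only the first match per string matters.

```r
library(stringi)

text <- c("A1", "B1", "CC")
pattern <- "[A-Z][0-9]"

# Locate the first match per element; unmatched rows are NA/NA
m <- stri_locate_first_regex(text, pattern)

# Keep only the elements that matched (illustrative names)
hit <- !is.na(m[, "start"])
sub <- text[hit]
loc <- m[hit, , drop = FALSE]

# Replace within the matched subset, then write it back
stri_sub(sub, loc) <- stri_trans_tolower(stri_sub(sub, loc))
text[hit] <- sub

text
#> [1] "a1" "b1" "CC"
```

This works, but it needs the extra bookkeeping that regmatches<- makes unnecessary in base R.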
omit_na = TRUE works when from or to is NA, but it has no effect if the replacement is NA:
text <- c("A1", "B1", "CC")
# the original string is saved if from is NA
stri_sub(text, from = c(2,2,NA), omit_na=TRUE) <- "0"
text
#> [1] "A0" "B0" "CC"
# all elements are replaced with NA
stri_sub(text, from = c(2,2,NA), omit_na=TRUE) <- NA
text
#> [1] NA NA NA
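To make the behavior I have in mind concrete, here is a rough sketch written as a wrapper. stri_sub_keep_na is a hypothetical helper of my own, not part of stringi, and it only handles the from/to form of the arguments.

```r
library(stringi)

# Hypothetical helper (not a stringi function): leave an element
# untouched when from, to, or the replacement value is NA.
stri_sub_keep_na <- function(str, from = 1L, to = -1L, value) {
  n <- length(str)
  from <- rep_len(from, n)
  to <- rep_len(to, n)
  value <- rep_len(value, n)

  ok <- !is.na(from) & !is.na(to) & !is.na(value)
  if (!any(ok)) return(str)

  # Apply the usual replacement only where nothing is NA
  sub <- str[ok]
  stri_sub(sub, from = from[ok], to = to[ok]) <- as.character(value[ok])
  str[ok] <- sub
  str
}

text <- c("A1", "B1", "CC")

# An NA replacement leaves every element as it was
res_all_na <- stri_sub_keep_na(text, from = c(2, 2, NA), value = NA)

# Only the element whose replacement is NA is skipped
res_partial <- stri_sub_keep_na(text, from = 2, value = c("0", NA, "x"))
```

With this sketch, res_all_na is the same as text, and res_partial keeps "B1" while the other two elements are modified.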
I feel it would be better to ignore NA replacements when omit_na = TRUE. Do you have any thoughts on this?