gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

Behavior of `stri_sub()` when `from = 0` #494

Closed by UchidaMizuki 1 year ago

UchidaMizuki commented 1 year ago

`stri_sub(string, from, to = -1)` is often used to get the end of a string. As the reprex below shows, `from = -2` returns the last two characters and `from = -1` returns the last character.

However, when `from = 0`, the entire string is returned rather than the empty string `""`. (I think this is because zero is treated as a positive index.)

This somewhat surprising behavior causes the following bug in `stringr::str_trunc()`: https://github.com/tidyverse/stringr/issues/512

I think we need either a dedicated function for getting the end of a string or an option for `stri_sub()` that covers this case.

```r
library(stringi)

stri_sub("xyz", -2, -1)
#> [1] "yz"
stri_sub("xyz", -1, -1)
#> [1] "z"
stri_sub("xyz", 0, -1)
#> [1] "xyz"
```

Created on 2023-06-26 with reprex v2.0.2

gagolews commented 1 year ago

This is a boundary case, and the behaviour is intended. It is for compatibility with `stri_sub<-` (the replacement version of the operator), which allows prepending a substring at the start of a given string. In other words, think of 0 as 1.
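
A minimal check of the "think of 0 as 1" rule for the extraction form, using only the call already shown in the reprex plus its `from = 1` counterpart (the variable names are illustrative):

```r
library(stringi)

# from = 0 is treated like from = 1 when extracting
zero_based <- stri_sub("xyz", 0, -1)
one_based  <- stri_sub("xyz", 1, -1)

identical(zero_based, one_based)
#> [1] TRUE
```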

UchidaMizuki commented 1 year ago

Thanks. I see that special care is needed to get the tail of the string.
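
One way to apply that special care is sketched below. `stri_tail()` is a hypothetical helper, not part of stringi; it assumes a single non-negative `n` and handles `n = 0` explicitly, because `stri_sub(str, 0, -1)` would return the whole string.

```r
library(stringi)

# Hypothetical helper (not a stringi function): last `n` characters of each string.
stri_tail <- function(str, n) {
  stopifnot(length(n) == 1L, !is.na(n), n >= 0)
  if (n == 0) {
    # from = 0 would be read as from = 1, so return empty strings directly,
    # keeping missing values missing
    out <- rep("", length(str))
    out[is.na(str)] <- NA_character_
    return(out)
  }
  # for n >= 1, a negative `from` counts from the end of the string
  stri_sub(str, from = -n, to = -1)
}

stri_tail("xyz", 2)
#> [1] "yz"
stri_tail("xyz", 0)
#> [1] ""
```

Guarding on `n == 0` keeps the rest of the helper a plain `stri_sub()` call, so the negative-index behaviour shown in the reprex is unchanged.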