Closed by kinto-b 3 years ago
All functions in stringi convert their outputs to UTF-8.
Bytewise, x and y are not identical, because you are probably working in a non-UTF-8 native locale (refer to stringi::stri_info(FALSE)).
I would say this is rather a problem with the base R functions (see the draft of a paper on stringi at https://stringi.gagolewski.com/_static/vignette/stringi.pdf for more details).
Also, consider calling iconv(x, "", "utf-8")?
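To make the bytewise difference concrete, here is a minimal base R sketch. The variable names x_latin1 and x_utf8 are illustrative only, and the non-ASCII character is written as an explicit byte escape so that the example does not depend on how the file itself is saved. It is not taken from the original report, just an illustration of the same text stored in two encodings.

```r
# The same text, "xáx", stored as Latin-1 bytes and then converted to UTF-8.
x_latin1 <- "x\xe1x"
Encoding(x_latin1) <- "latin1"   # declare how the bytes should be interpreted
x_utf8 <- enc2utf8(x_latin1)     # re-encode the string as UTF-8

charToRaw(x_latin1)
#> [1] 78 e1 78
charToRaw(x_utf8)
#> [1] 78 c3 a1 78

# The bytes differ, but the text compares equal once encodings are considered.
x_latin1 == x_utf8
#> [1] TRUE
```

The single Latin-1 byte e1 becomes the two-byte UTF-8 sequence c3 a1, which is why a byte-level comparison can fail even though both strings display the same way.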
I see what you mean:
x <- c("xáx", "xöx", "xÉx", "xxáxx", "xxöxx", "xxÉxx")
y <- x
x <- stringi::stri_trim_both(x)
identical(
  iconv(x, "utf-8", "utf-8"),
  iconv(y, from = "ISO-8859-1", to = "utf-8")
)
#> [1] TRUE
> I would say this is rather a problem with the base R functions
Fair enough. I suppose the solution for me is to avoid mixing base string manipulation functions with stringi functions, or else to be explicit about the encoding.
Thanks!
Exactly, they are made to serve as replacements (with fixes) of the base ones.
PS You can also test with all(x == y).
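One way to be explicit about the encoding, sketched in base R: convert both sides to UTF-8 before comparing their bytes. The helper name same_bytes_utf8 is hypothetical (it is not part of base R or stringi), and the input vectors are made up for illustration.

```r
# Hypothetical helper: compare two character vectors byte for byte
# after converting both to UTF-8.
same_bytes_utf8 <- function(a, b) {
  a <- enc2utf8(a)
  b <- enc2utf8(b)
  identical(lapply(a, charToRaw), lapply(b, charToRaw))
}

# Latin-1 input, written with byte escapes so the file encoding is irrelevant.
a <- c("x\xe1x", "x\xf6x")
Encoding(a) <- "latin1"

# The same text, explicitly converted to UTF-8.
b <- iconv(a, from = "latin1", to = "UTF-8")

same_bytes_utf8(a, b)
#> [1] TRUE
all(a == b)
#> [1] TRUE
```

Normalising first means the result does not depend on which function, base or stringi, produced each vector.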
Created on 2021-04-09 by the reprex package (v1.0.0)
I think the issue might be related to R 4, as a colleague who had not yet updated did not encounter it.