Open mayeulk opened 1 month ago
Thanks for this! However, you are barking up the wrong tree. The culprit is base::gregexpr(), which apparently is not aware of local traditions beyond the English language... ;-)
Note the following:
gregexpr("\\b\\W+\\b", "first all next?", perl = TRUE)[[1]]
[1] 6 10
attr(,"match.length")
[1] 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
gregexpr("\\b\\W+\\b", "first àll next?", perl = TRUE)[[1]]
[1] 6 10
attr(,"match.length")
[1] 2 1
gregexpr("\\b\\W+\\b", "first àll nèxt?", perl = TRUE)[[1]]
[1] 6 10 12
attr(,"match.length")
[1] 2 1 1
As far as I can see, we cannot circumvent this behaviour. May I ask you to place this directly in the R-Bugs-list?
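One thing that might be worth checking before filing the report: PCRE accepts a `(*UCP)` prefix at the start of a pattern, which asks it to use Unicode properties for `\w`, `\W` and `\b` instead of the ASCII-only defaults. The sketch below is only an experiment under that assumption, not a confirmed fix; it assumes R is built against a PCRE with Unicode property support and that the string is treated as UTF-8. The variable names are made up for illustration.

```r
# "first àll nèxt?" written with \u escapes so the example does not
# depend on the encoding of the source file.
x <- "first \u00e0ll n\u00e8xt?"

# Default PCRE behaviour: \w and \b only know ASCII letters, so the
# accented letters can be counted as non-word characters.
ascii_hits <- gregexpr("\\b\\W+\\b", x, perl = TRUE)[[1]]

# Same pattern with the (*UCP) prefix (assumption: supported by the
# PCRE library that this R build links against).
ucp_hits <- gregexpr("(*UCP)\\b\\W+\\b", x, perl = TRUE)[[1]]

# Compare match starts and lengths of the two variants.
print(as.integer(ascii_hits))
print(attr(ascii_hits, "match.length"))
print(as.integer(ucp_hits))
print(attr(ucp_hits, "match.length"))
```

If the second variant reports only the two spaces, the prefix may be a usable workaround inside the package; if not, the report to R-Bugs remains the way to go.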
Strings with accents are not handled correctly. Truncating the (French) phrase "Action à réaliser" with an ellipsis and maxlen=14 should give "Action à ...". When the first word that should not be printed (here: "réaliser") contains an accent, that word is partly printed (up to and including its last accented letter).
Below, only the output for "Action a realiser" is correct (but "Action a realiser" is not correct French).
Tested with Package DescTools version 0.99.54, on R 4.4.0, Kubuntu 24.04 (UTF-8)
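To make the expected result concrete, here is a minimal sketch of word-wise truncation that does not rely on `\b` at all. `truncate_at_word` is a hypothetical helper written for this report, not the DescTools implementation; it assumes words are separated by single spaces and that maxlen counts characters, including the ellipsis.

```r
# Hypothetical helper for illustration only (not DescTools code).
# Keeps whole words as long as the words plus " ..." fit into maxlen.
truncate_at_word <- function(x, maxlen, ellipsis = "...") {
  # Nothing to do if the string already fits.
  if (nchar(x, type = "chars") <= maxlen) return(x)

  # Split on literal spaces, so accented letters never act as separators.
  words <- strsplit(x, " ", fixed = TRUE)[[1]]

  kept <- character(0)
  for (w in words) {
    candidate <- paste(c(kept, w, ellipsis), collapse = " ")
    # Stop at the first word that would push the result past maxlen.
    if (nchar(candidate, type = "chars") > maxlen) break
    kept <- c(kept, w)
  }
  paste(c(kept, ellipsis), collapse = " ")
}

# "Action à réaliser", written with \u escapes to stay encoding-independent.
with_accents <- truncate_at_word("Action \u00e0 r\u00e9aliser", 14)
without_accents <- truncate_at_word("Action a realiser", 14)
print(with_accents)
print(without_accents)
```

Both calls should cut before the third word, which is the behaviour I would expect from the package for the accented and the unaccented phrase alike.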