gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/

stri_sprintf + extras #420

Closed gagolews closed 3 years ago

gagolews commented 3 years ago

Base R's sprintf() interprets field widths in bytes, not in Unicode code points:

> cat(sprintf("%6s%6s%6s", "-", c("asc", "ąść", "abcdefg"), "-"), sep="\n")
     -   asc     -
     -ąść     -
     -abcdefg     -

Current workaround:

> cat(sprintf("%6s%s%6s", "-", stringi::stri_pad(c("asc", "ąść", "abcdefg"), 6), "-"), sep="\n")
     -   asc     -
     -   ąść     -
     -abcdefg     -
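
The mismatch above can be reproduced in base R alone. The following is a minimal sketch, not the approach taken by stringi: `pad_left()` is a hypothetical helper introduced only for illustration, and it counts code points rather than display width, so it would not handle zero-width or wide characters the way `stri_pad()` is meant to.

```r
# Strings are written with \u escapes so that they are UTF-8 in any locale.
x <- c("asc", "\u0105\u015b\u0107", "abcdefg")

nchar(x, type = "bytes")  # what a byte-based field width counts: 3 6 7
nchar(x, type = "chars")  # number of code points: 3 3 7

# Hypothetical helper (an assumption, not part of any package):
# left-pad each string with spaces up to `width` code points.
pad_left <- function(s, width) {
  n_missing <- pmax(0, width - nchar(s, type = "chars"))
  paste0(strrep(" ", n_missing), s)
}

# Pad first, then pass the result through a plain %s.
cat(sprintf("%6s%s%6s", "-", pad_left(x, 6), "-"), sep = "\n")
```

Since "ąść" occupies 6 bytes but only 3 code points, a byte-based `%6s` adds no padding to it, which matches the misaligned second row in the first example.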

A home-made stri_sprintf could also offer some extras, such as #240 (PluralFormat) or #81 (RuleBasedNumberFormat).

gagolews commented 3 years ago

The same applies to the %s$% operator.

gagolews commented 3 years ago

Some examples:

stri_printf("%4s=%.3f", c("e", "e\u00b2", "\u03c0", "\u03c0\u00b2"),
    c(exp(1), exp(2), pi, pi^2))
##    e=2.718
##   e²=7.389
##    π=3.142
##   π²=9.870
x <- c("xxabcd", "xx\u0105\u0106\u0107\u0108",
    "\u200b\u200b\u200b\u200b\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007Fabcd")
stri_printf("[%10s]", x)  # minimum width = 10
## [    xxabcd]
## [    xxąĆćĈ]
## [    ​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿abcd]
stri_printf("[%-10.3s]", x)  # output of max width = 3, but pad to width of 10
## [xxa       ]
## [xxą       ]
## [​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿a       ]
stri_printf("[%10s]", x, use_length=TRUE)  # minimum number of Unicode code points = 10
## [    xxabcd]
## [    xxąĆćĈ]
## [​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿abcd]
# vectorization wrt all arguments:
p <- runif(10)
stri_sprintf(ifelse(p > 0.5, "P(Y=1)=%1$.2f", "P(Y=0)=%2$.2f"), p, 1-p)
##  [1] "P(Y=0)=0.71" "P(Y=1)=0.79" "P(Y=0)=0.59" "P(Y=1)=0.88" "P(Y=1)=0.94"
##  [6] "P(Y=0)=0.95" "P(Y=1)=0.53" "P(Y=1)=0.89" "P(Y=1)=0.55" "P(Y=0)=0.54"
# using a "preformatted" logical vector:
x <- c(TRUE, FALSE, FALSE, NA, TRUE, FALSE)
stri_sprintf("%s) %s", letters[seq_along(x)], c("\u2718", "\u2713")[x+1])
## [1] "a) ✓" "b) ✘" "c) ✘" NA     "e) ✓" "f) ✘"
# custom NA/Inf/NaN strings:
stri_printf("%+10.3f", c(-Inf, -0, 0, Inf, NaN, NA_real_),
    na_string="<NA>", nan_string="\U0001F4A9", inf_string="\u221E")
##         -∞
##     -0.000
##     +0.000
##         +∞
##         💩
##       <NA>
stri_sprintf("UNIX time %1$f is %1$s.", Sys.time())
## [1] "UNIX time 1621824515.023827 is 2021-05-24 12:48:35."
# the following do not work in sprintf()
stri_sprintf("%1$#- *2$.*3$f", 1.23456, 10, 3)  # two asterisks: width and precision both taken from arguments
## [1] " 1.235    "
stri_sprintf(c("%s", "%f"), pi)  # re-coercion needed
## [1] "3.14159265358979" "3.141593"
stri_sprintf("%1$s is %1$f UNIX time.", Sys.time())  # re-coercion needed
## [1] "2021-05-24 12:48:35 is 1621824515.027764 UNIX time."
stri_sprintf(c("%d", "%s"), factor(11:12))  # re-coercion needed
## [1] "1"  "12"
stri_sprintf(c("%s", "%d"), factor(11:12))  # re-coercion needed
## [1] "11" "2"
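
The "re-coercion" results above are easier to read next to the two views base R offers of the same object. This is only an illustrative sketch using base functions; it does not show how stri_sprintf converts its arguments internally, and the timestamp is fixed to UTC here as an assumption (the examples above were printed in the author's local time zone).

```r
# A factor has integer codes (what %d shows) and labels (what %s shows).
f <- factor(11:12)
as.integer(f)    # 1 2
as.character(f)  # "11" "12"

# A date-time has a numeric value (what %f shows) and a text form (what %s shows).
tm <- as.POSIXct(1621824515, origin = "1970-01-01", tz = "UTC")
as.numeric(tm)                     # seconds since the UNIX epoch
format(tm, "%Y-%m-%d %H:%M:%S")    # human-readable form in UTC
```

Applying one format string per element therefore needs a separate coercion for each conversion specifier, which is consistent with the mixed "1"/"12" and "11"/"2" outputs.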