gagolews / stringi

Fast and portable character string processing in R (with the Unicode ICU)
https://stringi.gagolewski.com/
Other
304 stars 44 forks source link

Consider stricter recycling #417

Closed hadley closed 3 years ago

hadley commented 3 years ago

I think this should be an error:

stringi::stri_detect_fixed(letters[1:2], letters)
#>  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE

Created on 2021-04-18 by the reprex package (v2.0.0)

The tidyverse now only ever recycles vectors of length 1.

gagolews commented 3 years ago

In stringi, on the other hand, I have always wanted to stick to full recycling wrt all meaningful arguments, just like in base R:

> paste(letters[1:2], letters)
 [1] "a a" "b b" "a c" "b d" "a e" "b f" "a g" "b h" "a i" "b j" "a k" "b l"
[13] "a m" "b n" "a o" "b p" "a q" "b r" "a s" "b t" "a u" "b v" "a w" "b x"
[25] "a y" "b z"
> c(-1, 1) * 1:26
 [1]  -1   2  -3   4  -5   6  -7   8  -9  10 -11  12 -13  14 -15  16 -17  18 -19
[20]  20 -21  22 -23  24 -25  26

It has its pros (and cons as well), but I consider the stringi API stable now, and want to tinker with it as little as possible now (except when adding new functions).

gagolews commented 3 years ago

So I believe that stringr will be equipped with something similar to stopifnot(is.vector(pattern), length(pattern)==1) now? I am sure it is much more beginner-friendly.

hadley commented 3 years ago

Yeah, if you don't want to change this, I'll implement in stringr.

Just be aware that there aren't really consistent recycling rules in base R:

paste(1:3, 1:10)
#>  [1] "1 1"  "2 2"  "3 3"  "1 4"  "2 5"  "3 6"  "1 7"  "2 8"  "3 9"  "1 10"
1:3 * 1:10
#> Warning in 1:3 * 1:10: longer object length is not a multiple of shorter object
#> length
#>  [1]  1  4  9  4 10 18  7 16 27 10
data.frame(1:3, 1:10)
#> Error in data.frame(1:3, 1:10): arguments imply differing number of rows: 3, 10

Created on 2021-04-19 by the reprex package (v2.0.0)

gagolews commented 3 years ago

Yeah, I'd consider case (1) as bug (that should have been corrected long time ago), (2) as correct and consistent with the majority of other functions, (3) as, well, aftermath of the fact that a data frame is a list

Interestingly, cbind() has a (2)-like behaviour:

> cbind(1:3, 1:10)
      [,1] [,2]
 [1,]    1    1
 [2,]    2    2
 [3,]    3    3
 [4,]    1    4
 [5,]    2    5
 [6,]    3    6
 [7,]    1    7
 [8,]    2    8
 [9,]    3    9
[10,]    1   10
Warning message:
In cbind(1:3, 1:10) :
  number of rows of result is not a multiple of vector length (arg 1)
hadley commented 3 years ago

You also already follow the more sensible rule for recycling vectors of length 0:

paste0(character(), "-")
#> [1] "-"
stringi::stri_c(character(), "-")
#> character(0)

Created on 2021-04-25 by the reprex package (v2.0.0)