apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[R] Add binding for not_between() ternary kernel #30723

Closed asfimport closed 2 years ago

asfimport commented 2 years ago

Add R binding for not_between() compute function from ARROW-15223.

Reporter: Eduardo Ponce / @edponce
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-15224. Please see the migration documentation for further details.

asfimport commented 2 years ago

Dragoș Moldovan-Grünfeld / @dragosmg: Given dplyr::not_between() does not exist, do we need an R not_between() binding? What do you think? @jonkeane @thisisnic @paleolimbot

asfimport commented 2 years ago

Eduardo Ponce / @edponce: For C++ (ARROW-15223) we decided not to include a not_between function, because it is equivalent to applying a logical NOT to the result of the between function, which is a simple composition. Also, not_between is not a common function in other DB/dataframe APIs.
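The composition described above can be sketched in base R. This is only an illustration: the helper name not_between() and the sample heights vector are hypothetical, and the range is assumed to be inclusive on both ends, as in dplyr::between().

```r
# Hypothetical helper, not part of Arrow or dplyr:
# negate an inclusive range check (left <= x <= right).
not_between <- function(x, left, right) {
  !(x >= left & x <= right)
}

# Illustrative values only; NA stands in for a missing height.
heights <- c(96, 112, 150, 172, NA)

not_between(heights, 100, 150)
#> [1]  TRUE FALSE FALSE  TRUE    NA
```

A missing value stays NA after negation, so a row with a missing height matches neither the range check nor its negation.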

asfimport commented 2 years ago

Dragoș Moldovan-Grünfeld / @dragosmg: I think the situation is similar in dplyr - the data manipulation R package we link to.


library(dplyr, warn.conflicts = FALSE)

starwars %>% 
  filter(between(height, 100, 150))
#> # A tibble: 5 × 14
#>   name      height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#> 1 Leia Org…    150    49 brown      light      brown             19 fema… femin…
#> 2 Mon Moth…    150    NA auburn     fair       blue              48 fema… femin…
#> 3 Watto        137    NA black      blue, grey yellow            NA male  mascu…
#> 4 Sebulba      112    40 none       grey, red  orange            NA male  mascu…
#> 5 Gasgano      122    NA none       white, bl… black             NA male  mascu…
#> # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% 
  filter(!between(height, 100, 150))
#> # A tibble: 76 × 14
#>    name     height  mass hair_color skin_color eye_color birth_year sex   gender
#>    <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#>  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
#>  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
#>  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
#>  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
#>  5 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
#>  6 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
#>  7 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
#>  8 Biggs D…    183    84 black      light      brown           24   male  mascu…
#>  9 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
#> 10 Anakin …    188    84 blond      fair       blue            41.9 male  mascu…
#> # … with 66 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

asfimport commented 2 years ago

Eduardo Ponce / @edponce: Based on these observations, it seems we agree that a not_between function will not be included, so we can close this issue.

asfimport commented 2 years ago

Dragoș Moldovan-Grünfeld / @dragosmg: I will close the issue as "Won't Fix". Thanks.

asfimport commented 2 years ago

Dragoș Moldovan-Grünfeld / @dragosmg: A corresponding dplyr::not_between() function does not exist.