apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[R] New/improved vignette on dplyr features #27960

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Reporter: Neal Richardson / @nealrichardson

Related issues:

Note: This issue was originally created as ARROW-12137. Please see the migration documentation for further details.

asfimport commented 3 years ago

Neal Richardson / @nealrichardson: Here's an example from the taxi dataset that shows a messy string column and three kinds of queries: a base R function, a stringr function, and an arrow_-prefixed compute function that doesn't have an R mapping (afaik):


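library(arrow)
library(dplyr)
library(stringr)

# ds: the NYC taxi data opened as an Arrow Dataset (e.g. with open_dataset())
ds %>%
  select(payment_type) %>%
  mutate(
    allcaps = toupper(payment_type),               # base R function
    chars = str_length(allcaps),                   # stringr function
    is_upper = arrow_ascii_is_upper(payment_type)  # Arrow compute function called via the arrow_ prefix
  ) %>%
  head(10) %>%
  collect()
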
asfimport commented 3 years ago

Ian Cook / @ianmcook: We need to fix the type error in payment_type == 3 that Jon describes here: https://issues.apache.org/jira/browse/ARROW-12114?focusedCommentId=17314989&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17314989

asfimport commented 3 years ago

Dragoș Moldovan-Grünfeld / @dragosmg: I thought it might be a good idea to add some information about case_when() working only inside a mutate() context and not standalone. See ARROW-14362.

asfimport commented 3 years ago

Nicola Crane / @thisisnic: We should also include something on the NSE funcs that do have differences; see https://github.com/apache/arrow/pull/11575#pullrequestreview-796850588 for an example.


asfimport commented 2 years ago

Will Jones / @wjones127: We may also wish to mention Acero by name as the backend for dplyr queries. We'll likely be discussing dplyr in the upcoming Acero post: https://issues.apache.org/jira/browse/ARROW-16632