Crunch-io / crplyr

A 'dplyr' Interface for Crunch
https://crunch.io/r/crplyr/
GNU Lesser General Public License v3.0
5 stars 3 forks source link

Add collect method #1

Closed gshotwell closed 7 years ago

gshotwell commented 7 years ago

New method to collect a CrunchDataset from the server. One gap which has been identified is that this method won't respect dataset order. Currently when a vector is collected from the server with as.vector(v[3:1]) the API will return v[1:3] instead. This seems like an edge case for crplyr because people are more like to reorder a dplyr object using arrange than they are to pass a row index to the initial dataset.

codecov[bot] commented 7 years ago

Codecov Report

Merging #1 into master will increase coverage by 2.32%. The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master     #1      +/-   ##
========================================
+ Coverage   97.67%   100%   +2.32%     
========================================
  Files           6      7       +1     
  Lines          43     68      +25     
========================================
+ Hits           42     68      +26     
+ Misses          1      0       -1
Impacted Files Coverage Δ
R/group-by.R 100% <100%> (+8.33%) :arrow_up:
R/collect.R 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update cacf7b9...8a0eb53. Read the comment docs.

nealrichardson commented 7 years ago

LGTM. Did you say that there was something you forgot to add?

Aside from that, please

  1. Bump the patch version in DESCRIPTION (to 0.1.5)
  2. Add yourself as contributor to Authors@R in DESCRIPTION
  3. Add a new ### crplyr 0.1.5 (under development) section to NEWS.md and add a bullet for this

and then merge.

gshotwell commented 7 years ago

Great, it'll need another look after the grouping stuff is added. Basically if someone does the following:

ds %>% 
    group_by(cyl) %>%
    collect()

They should probably get a grouped tibble rather than an ungrouped one. This is important if we want to do something like automatically apply unresolved functions after the data is collected, and also so that something like this produces a group-wise summary:

 ds %>% 
    group_by(cyl) %>%
    collect() %>%
    summarize(n = n())
gshotwell commented 7 years ago

@nealrichardson This should be ready for review. Changes are to better accommodate array variables.

nealrichardson commented 7 years ago

👍 Once https://github.com/Crunch-io/rcrunch/pull/110 goes we should simplify collect to use as.data.frame and keep the implementation of that in crunch.