Summary

This PR adds relevant tests for the @slice() macro based on the tests which are in the dplyr code, found here. I skipped quite a few tests that involve functionality that I don't think is implemented yet. I also didn't go crazy and add all the possible tests to keep the PR at a reasonable size for review.

I went ahead and put these under the tests/ folder rather than implementing as doctests. My thinking on this is that there are two types of things that I can foresee wanting to test:

basic examples that show how someone could use the functions (the doctests).
tests focused on on edge-cases or cases that test the robustness and correctness of the implementation.
- sometimes this can even be silly stuff that doesn't necessarily make sense to include in the docs, like "this code throws an error if someone tries to use a string as a slice index...right?"

This keeps the documentation examples concise and helpful while also thoroughly testing the underlying functionality in various scenarios. I think most if not all of these tests fall under the second case.

Overview of the Implemented Tests

The test summary currently looks like this:

Test Summary:                                           | Pass  Error  Total  Time
TidierData                                              |   20      4     24  8.1s
  pivots                                                |   12            12  3.0s
    pivot_wider                                         |    3             3  0.9s
    pivot_longer                                        |    9             9  2.1s
  @slice()                                              |    8      4     12  3.7s
    empty slice drops all rows                          |    2             2  0.6s
    slicing DataFrame yields DataFrame                  |    1             1  0.1s
    slice keeps positive indices, ignoring out of range |    1      1      2  2.3s
    slice keeps negative indices, ignoring out of range |    1      1      2  0.4s
    slice errors if positive and negative indices mixed |    1             1  0.1s
    slice errors if index is not numeric                |    2             2  0.1s
    slice keeps zero length groups                      |           1      1  0.1s
    slice retains labels for zero length groups         |           1      1  0.0s

I'll drop some descriptions of the failing tests below for discussion😄

Positive Indices Test

Here's the R code and output from the test I'm trying to replicate:

gf <- group_by(tibble(g = c(1, 2, 2, 3, 3, 3), id = 1:6), g)
out <- slice(gf, 2)
out

#> # A tibble: 2 × 2
#> # Groups:   g [2]
#>       g    id
#>   <dbl> <int>
#> 1     2     3
#> 2     3     5

Here's what happens in Julia:

gf = @group_by(DataFrame(g = [1, 2, 2, 3, 3, 3], id = 1:6), g)
out = @slice(gf, 2)

ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [[2]]
Stacktrace:
 [1] _combine(gd::GroupedDataFrame{…}, cs_norm::Vector{…}, optional_transform::Vector{…}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:755
 [2] _combine_prepare_norm(gd::GroupedDataFrame{…}, cs_vec::Vector{…}, keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:87
 [3] _combine_prepare(gd::GroupedDataFrame{…}, ::Base.RefValue{…}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:52
 [4] _combine_prepare
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:26 [inlined]
 [5] combine
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:857 [inlined]
 [6] #combine#775
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:846 [inlined]
 [7] macro expansion
   @ ~/Documents/github/TidierData.jl/src/slice.jl:19 [inlined]
 [8] top-level scope
   @ REPL[22]:1

caused by: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:352 [inlined]
 [2] _combine(gd::GroupedDataFrame{…}, cs_norm::Vector{…}, optional_transform::Vector{…}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:752
 [3] _combine_prepare_norm(gd::GroupedDataFrame{…}, cs_vec::Vector{…}, keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:87
 [4] _combine_prepare(gd::GroupedDataFrame{…}, ::Base.RefValue{…}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:52
 [5] _combine_prepare
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:26 [inlined]
 [6] combine
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:857 [inlined]
 [7] #combine#775
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:846 [inlined]
 [8] macro expansion
   @ ~/Documents/github/TidierData.jl/src/slice.jl:19 [inlined]
 [9] top-level scope
   @ REPL[22]:1

    nested task error: BoundsError: attempt to access 1-element Vector{Int64} at index [[2]]
    Stacktrace:
     [1] throw_boundserror(A::Vector{Int64}, I::Tuple{Vector{Int64}})
       @ Base ./abstractarray.jl:737
     [2] checkbounds
       @ ./abstractarray.jl:702 [inlined]
     [3] _getindex
       @ ./multidimensional.jl:888 [inlined]
     [4] getindex
       @ ./abstractarray.jl:1291 [inlined]
     [5] getindex
       @ ~/.julia/packages/DataFrames/58MUJ/src/subdataframe/subdataframe.jl:181 [inlined]
     [6] (::var"#51#53")(sdf::SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}})
       @ Main ~/Documents/github/TidierData.jl/src/slice.jl:20
     [7] _combine_process_callable(wcs_i::Base.RefValue{…}, optional_i::Bool, parentdf::DataFrame, gd::GroupedDataFrame{…}, seen_cols::Dict{…}, trans_res::Vector{…}, idx_agg::Base.RefValue{…}, threads::Bool)
       @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:381
     [8] macro expansion
       @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:729 [inlined]
     [9] (::DataFrames.var"#759#768"{…})()
       @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/other/utils.jl:227
Some type information was truncated. Use `show(err)` to see complete types.

Negative Indices Test

Same for the negative indices, here's the R code:

gf <- group_by(tibble(g = c(1, 2, 2, 3, 3, 3), id = 1:6), g)
out <- slice(gf, -(1:2))
out

#> # A tibble: 1 × 2
#> # Groups:   g [1]
#>       g    id
#>   <dbl> <int>
#> 1     3     6

Here's the Julia code and resulting error (it's very similar to the one above)

gf = @group_by(DataFrame(g = [1, 2, 2, 3, 3, 3], id = 1:6), g)
out = @slice(gf, -(1:2))

ERROR: BoundsError: attempt to access 1-element Vector{Int64} at index [Not([1, 2])]
Stacktrace:
 [1] _combine(gd::GroupedDataFrame{…}, cs_norm::Vector{…}, optional_transform::Vector{…}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:755
 [2] _combine_prepare_norm(gd::GroupedDataFrame{…}, cs_vec::Vector{…}, keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:87
 [3] _combine_prepare(gd::GroupedDataFrame{…}, ::Base.RefValue{…}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:52
 [4] _combine_prepare
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:26 [inlined]
 [5] combine
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:857 [inlined]
 [6] #combine#775
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:846 [inlined]
 [7] top-level scope
   @ ~/Documents/github/TidierData.jl/src/slice.jl:23

caused by: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:352 [inlined]
 [2] _combine(gd::GroupedDataFrame{…}, cs_norm::Vector{…}, optional_transform::Vector{…}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:752
 [3] _combine_prepare_norm(gd::GroupedDataFrame{…}, cs_vec::Vector{…}, keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:87
 [4] _combine_prepare(gd::GroupedDataFrame{…}, ::Base.RefValue{…}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:52
 [5] _combine_prepare
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:26 [inlined]
 [6] combine
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:857 [inlined]
 [7] #combine#775
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:846 [inlined]
 [8] top-level scope
   @ ~/Documents/github/TidierData.jl/src/slice.jl:23

    nested task error: BoundsError: attempt to access 1-element Vector{Int64} at index [Not([1, 2])]
    Stacktrace:
     [1] throw_boundserror(A::Vector{Int64}, I::Tuple{InvertedIndices.InvertedIndexIterator{Int64, Vector{…}, Base.OneTo{…}}})
       @ Base ./abstractarray.jl:737
     [2] checkbounds
       @ ./abstractarray.jl:702 [inlined]
     [3] _getindex
       @ ./multidimensional.jl:888 [inlined]
     [4] getindex
       @ ./abstractarray.jl:1291 [inlined]
     [5] getindex
       @ ~/.julia/packages/DataFrames/58MUJ/src/subdataframe/subdataframe.jl:181 [inlined]
     [6] (::var"#56#58")(sdf::SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}})
       @ Main ~/Documents/github/TidierData.jl/src/slice.jl:24
     [7] _combine_process_callable(wcs_i::Base.RefValue{…}, optional_i::Bool, parentdf::DataFrame, gd::GroupedDataFrame{…}, seen_cols::Dict{…}, trans_res::Vector{…}, idx_agg::Base.RefValue{…}, threads::Bool)
       @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:381
     [8] macro expansion
       @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:729 [inlined]
     [9] (::DataFrames.var"#759#768"{…})()
       @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/other/utils.jl:227
Some type information was truncated. Use `show(err)` to see complete types.

I suspect that these are failing because GroupedDataFrames don't behave quite the same as a grouped tibble does in R as far as things like indexing go, so I'm curious if this is a place where we'd expect the Tidier equivalent of dplyr code to behave exactly the same.

Zero-Length Groups

This is an interesting set of tests. Basically the gist is that in dplyr, if one of the grouping variables is a factor with defined levels, slice() should still retain this group when working with a grouped data frame. This is mostly for consistency/expected results when working with grouped data in R.

There isn't a group_size function in Julia, so I tried to get the size of each group for comparison with combine and nrow. The zero-length group doesn't seem to be accounted for, so this test fails as the size vector is of length 2 rather than one of length 3 that looks like [1, 1, 0] as it does in the R test.

That being said - this may not be a slice() issue, as the GroupedDataFrame itself seems to only have 2 groups and doesn't appear to account for the zero-length group even though it seems to recognize that there is one.

julia> df
GroupedDataFrame with 2 groups based on keys: e, f, g
First Group (2 rows): e = 1, f = CategoricalValue{Int64, UInt32} 1 (1/3), g = 1
 Row │ e      f     g      x     
     │ Int64  Cat…  Int64  Int64 
─────┼───────────────────────────
   1 │     1  1         1      1
   2 │     1  1         1      2
⋮
Last Group (2 rows): e = 1, f = CategoricalValue{Int64, UInt32} 2 (2/3), g = 2
 Row │ e      f     g      x     
     │ Int64  Cat…  Int64  Int64 
─────┼───────────────────────────
   1 │     1  2         2      1
   2 │     1  2         2      4

So this may be a case where we really don't expect the same behavior because DataFrames doesn't behave the same as R--and if so we could probably just drop these two testsets.

TidierOrg / TidierData.jl

Add slice tests! #111

Summary

Overview of the Implemented Tests

Positive Indices Test

Negative Indices Test

Zero-Length Groups