TidierOrg / TidierData.jl

Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
MIT License
86 stars 7 forks source link

Adding `@relocate` #81

Closed drizk1 closed 9 months ago

drizk1 commented 9 months ago

It supports everything except renaming. It also supports last(names(df)) and first(names(df)) as last_col and first_col equivalents.

The issue was was in fact reproduced with best/unnest, and I have no idea why, considering it worked a few days ago with the unnest/nest PR, and why it works for the documenter and one set of docstrings but not "nightly" test.

drizk1 commented 9 months ago

Ok if I understand correctly, the 1.11 nightly run that is failing is testing the most uptodate version of julia. There is an error that seems to be stemming from subdataframes and flattening in certain instances with Cartesian indices.

Is this potentially a bug on the 1.11 end that gets fixed? Or do I try to fix it so it works for both 1.8 and the latest?

kdpsingh commented 9 months ago

It may be some behavior that changed in Julia 1.11.

drizk1 commented 9 months ago

Sounds good. I opened an issue on df.jl to double check if it's something I should work around, and if so I may have found a workaround.

drizk1 commented 9 months ago

Alright, it looks like this was a bug which they have now fixed.

issue fix

kdpsingh commented 9 months ago

Awesome, really nice work on following this up, and to @bkamins for the quick fix!

@drizk1, is this PR ready for review from your standpoint?

drizk1 commented 9 months ago

From my end, I think this should be good.

The one thing I forgot to add was documentation for it beyond the docstrings.

But I'm happy to do a docs focused PR for multiple things you think would be valuable, or to add it directly to this. Whatever your preference is works for me.

kdpsingh commented 9 months ago

For now, the docstrings are sufficient. Will be revamping the docs in the future and will ensure we capture this functionality.

kdpsingh commented 9 months ago

I'll review and get this merged by tomorrow. Sorry for the delay.

drizk1 commented 9 months ago

No worries at all! There's no rush. I figured you're busy w the new job/you'd be combining it with a bigger update overall.

One other random note, I was missing having things to implement, and put together a row_to_names macro like the janitor function (only other thing I could think of but I also don't know how frequently ppl need it).

If that's something you're interested in adding, I can add it here. If not, no worries, it was fun exercise.

kdpsingh commented 9 months ago

Love this PR. There are two things I'd like to fix before merging. Happy to look at it more closely.

Thing 1: I'd love to use before and after as arguments rather than before_column and after_column.

Thing 2: The way that argument matching is currently handled in @relocate() is different than other TidierData macros and a bit too flexible. For example, the documentation says that before_column and after_column are the optional arguments that specify where the column will be inserted.

It's surprising then that this actually works:

julia> @chain df @relocate(c, before = b)
10×3 DataFrame
 Row │ a      c      b     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1     21     11
   2 │     2     22     12
   3 │     3     23     13
   4 │     4     24     14
   5 │     5     25     15
   6 │     6     26     16
   7 │     7     27     17
   8 │     8     28     18
   9 │     9     29     19
  10 │    10     30     20

This also works but gives the wrong answer, because c ends up being inserted before a.

julia> @chain df @relocate(c, before_col = b)
10×3 DataFrame
 Row │ c      a      b     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │    21      1     11
   2 │    22      2     12
   3 │    23      3     13
   4 │    24      4     14
   5 │    25      5     15
   6 │    26      6     16
   7 │    27      7     17
   8 │    28      8     18
   9 │    29      9     19
  10 │    30     10     20

I love everything else about this function, including how it works with where(). Would just change the default argument names to before and after, and other provided arguments should produce an error.

drizk1 commented 9 months ago

Alright, so I adjusted the docstring to correctly represent the arguments. It was always before/after, but when I wrote the docstring I accidentally put in before_column/after_column (which were the fn arguments)

I also adjusted the function arguments to make it so that it is before, and after and the macro will now throw the following error if it receives other arguments. "Invalid keyword argument: only 'before' or 'after' are accepted."

For this, julia> @chain df @relocate(c, before_col = b) the incorrect order was because the macro was ignoring the before_col argument and treating it as julia> @chain df @relocate(c) # and moving that column to the front like the R equivalent, but now it will error if an incorrect kwarg is given

This should all now be fixed.