TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License
515 stars, 14 forks

`@select` can accept negation `!` #42

Closed zhezhaozz closed 1 year ago

zhezhaozz commented 1 year ago

In R's dplyr, the ! operator negates a selection. Tidier.jl does not currently support this. I would like to add this feature to the @select macro.

Solution

In parse_tidy, add a condition that checks for the !(args__) pattern and passes args to a new parse_negation function. parse_negation processes args and returns a negation expression.
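To make the idea concrete, here is a minimal sketch of what such a rewrite could look like using only Base Julia. It is not the code in this PR: the name parse_negation_sketch is hypothetical, and the choice to emit a call to Not (the inverted-index selector that DataFrames.jl exports) is an assumption about the target expression. The inner expression is left as-is; in a real parser it would still need to go through the usual column-selection handling.

```julia
# Hypothetical sketch, not the actual Tidier.jl implementation.
# Rewrites a `!`-negated selection into a `Not(...)` call expression.
function parse_negation_sketch(ex)
    # `!a`, `!(a:b)`, `!1`, and `!contains("b")` all parse as Expr(:call, :!, inner)
    if ex isa Expr && ex.head == :call && length(ex.args) == 2 && ex.args[1] == :!
        inner = ex.args[2]
        # Quote a bare column name so it is not evaluated as a variable
        target = inner isa Symbol ? QuoteNode(inner) : inner
        return Expr(:call, :Not, target)
    end
    return ex  # anything that is not a `!` call passes through unchanged
end
```

For example, parse_negation_sketch(:(!a)) returns the expression Not(:a), while an expression such as :(a + b) is returned untouched.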

Examples

With this change, @select accepts !:

df = DataFrame(a = repeat('a':'e'), b = 1:5, c = 11:15);

@chain df begin
    @select(!(a:b))
end

@chain df begin
    @select(!(1:2))
end

@chain df begin
    @select(!a)
end

@chain df begin
    @select(!1)
end

@chain df begin
    @select(!contains("b"))
end

zhezhaozz commented 1 year ago

I realize that parse_tidy should be left unaffected, so I defined a parse_select function to handle the expression passed to @select.

kdpsingh commented 1 year ago

This looks awesome! Let me look through it, and I will follow up with questions.

I want to make sure this doesn't break regular logical negation inside functions, which I don't think our doctests cover. I also want to make sure the behavior is consistent with how negative selection currently works. I think there may be edge-case differences, and I just want to understand what they are.
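To illustrate the first concern (a sketch, not taken from the Tidier.jl code): a negated selection helper and an ordinary logical negation have the same shape in Julia's syntax tree, so a rule that matches on ! alone cannot tell them apart. The column name b is just an example.

```julia
selection = :(!contains("b"))   # negated selection helper
boolean   = :(!ismissing(b))    # ordinary logical negation

# Both are a call to `!` whose single argument is itself a call
for ex in (selection, boolean)
    println(ex.head, " ", ex.args[1], " ", ex.args[2].head)
end
```

Any rewrite of ! therefore has to rely on context, such as which macro or helper the expression appears in, rather than on the shape of the expression.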

kdpsingh commented 1 year ago

Nice work on this. Having reviewed the code, I think there are potential problems with only handling this inside of @select() and not other macros. For example, I don't think this will work inside of across().

My sense is that the easiest and most consistent thing to do is to implement this directly within parse_tidy(). Let's discuss and work on this together when we meet.

zhezhaozz commented 1 year ago

@kdpsingh parse_select has been removed, and parse_tidy now handles both negation operators (- and !). This should also work within across, for example:

@chain df begin
    @mutate(across(!(a:b), mean))
end

@chain df begin
    @mutate(across(!contains("a"), mean))
end

zhezhaozz commented 1 year ago

I turned this PR into a draft PR because the code implementing ! caused unexpected behavior and a failed build.

kdpsingh commented 1 year ago

This looks great @zzhaozheUM! Will take a look on my computer when I get a chance, but on my phone this looks great.

The only thing I need to check is how negative selection works with selection helpers like starts_with(), and whether negative selection works inside across. I may add some doctests to check this if they haven't been added yet. Thank you!

kdpsingh commented 1 year ago

I see that your example addresses my question. Let me review and will plan to merge. Thank you @zzhaozheUM!

zhezhaozz commented 1 year ago

> I see that your example addresses my question. Let me review and will plan to merge. Thank you @zzhaozheUM!

I'm not sure whether -() works inside across. I will look into this and work on it.