Closed by zhezhaozz 1 year ago.
I came across that same blog post a few weeks ago!
A couple of things we need to consider:
1. Should case_when() work outside of Tidier.jl, or should it be implemented as a pseudo-function?
2. While that blog post is good for a single case, we need to make it easy to vectorize case_when(), because I believe ternary operators require map, list comprehensions, or loops for vectorization.
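The vectorization point can be illustrated in Base Julia alone. This is a sketch with made-up data and labels, and Tidier.jl is not involved. A ternary needs a single Bool, so applying it to a whole vector means using map, a comprehension, a loop, or broadcasting a scalar function.

```julia
# Vectorizing a ternary expression over a vector (Base Julia only).
x = [3, -1, 0, 7]

# `(x .> 0) ? "pos" : "non-pos"` would throw a TypeError, because the ternary
# needs one Bool, not a BitVector. Each form below applies it per element:
labels_map  = map(v -> v > 0 ? "pos" : "non-pos", x)
labels_comp = [v > 0 ? "pos" : "non-pos" for v in x]

labels_loop = Vector{String}(undef, length(x))
for (i, v) in enumerate(x)
    labels_loop[i] = v > 0 ? "pos" : "non-pos"
end

# Wrapping the ternary in a scalar function and broadcasting it with `.`
label(v) = v > 0 ? "pos" : "non-pos"
labels_bcast = label.(x)
```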
On the first question: I think case_when() should work outside Tidier.jl, because we can take advantage of the fact that @mutate can already handle user-defined functions and auto-vectorization for a single column.
On the second question: yes, I believe we need vectorized ternary operators.
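A minimal sketch of the first idea is a case_when-style function that works as an ordinary function, outside any macro. case_when_sketch is a hypothetical name, and this is not Tidier.jl's implementation. It assumes each argument is a `condition => value` pair, where the condition is a Bool or a vector of Bools and the value is a scalar or a vector of the same length. Rows that match no condition get missing, which is an assumption modeled on dplyr's NA default.

```julia
# Hypothetical case_when-like function (not Tidier.jl code). The first pair
# whose condition is true for a row supplies that row's value.
function case_when_sketch(pairs::Pair...)
    n = maximum(length(first(p)) for p in pairs)    # a scalar Bool has length 1
    result = fill!(Vector{Any}(undef, n), missing)  # assumed default: missing
    filled = falses(n)
    for (cond, val) in pairs
        # Recycle scalar conditions/values to the full length.
        conds = cond isa AbstractVector ? cond : fill(cond, n)
        vals  = val  isa AbstractVector ? val  : fill(val, n)
        for i in 1:n
            if !filled[i] && conds[i]
                result[i] = vals[i]
                filled[i] = true
            end
        end
    end
    return result
end

x = [1, 5, 12]
case_when_sketch(x .< 3 => "small", x .< 10 => "medium", true => "large")
```

Every condition and value is computed before the function is called, so this design does not short-circuit. A value expression that throws on some rows makes the whole call throw.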
Resolved by PR #41.
It seems like ternary-style conditionals are still not supported. For example:
```julia
using DataFrames, Tidier

tbl = DataFrame(x = ["alpha", "beta", "charlie", "delta", "echo"])
helper(x) = contains(x, r"a.*a") ? getproperty(match(r"a.*a", x), :match) : "_"

@chain tbl begin
    @mutate(
        count_a = helper(x),   # works: helper is applied element by element
        count_b = case_when(   # errors: match() returns nothing for most rows
            contains(x, r"a.*a") => getproperty(match(r"a.*a", x), :match),
            true => "_"
        )
    )
end
```
The same thing happens with ifelse() or if_else(): neither one short-circuits, so errors are thrown.
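The short-circuiting difference can be shown in Base Julia alone. This is a sketch, and integer division by zero stands in for any expression that throws on some rows. ifelse is a regular function, so all of its arguments are evaluated before the call. The ternary operator evaluates only the branch it selects.

```julia
# `ifelse` is an ordinary function: Julia evaluates all of its arguments
# before the call, so div(a, b) runs even when b == 0.
eager_div(a, b) = ifelse(b == 0, 0, div(a, b))
# The ternary operator only evaluates the branch it selects.
lazy_div(a, b) = b == 0 ? 0 : div(a, b)

bs = [0, 2, 1]

# Vectorized if_else style: div.(7, bs) is computed for every element first,
# so the zero divisor throws a DivideError before ifelse can discard it.
try
    ifelse.(bs .== 0, 0, div.(7, bs))
catch err
    println("vectorized ifelse failed: ", typeof(err))
end

# A ternary inside a comprehension short-circuits per element.
safe = [b == 0 ? 0 : div(7, b) for b in bs]
```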
I think I know what's happening here and will take a look in the near future.
Should I open a new issue to track the progress?
Sorry, let me re-open this issue. I'm on vacation this week but will look at it next week.
Great! Thanks.
The short answer is that if you want to use case_when() or if_else(), the condition and all of the return values have to be valid when vectorized. The return values are computed for every row, not only for the rows where the condition is true. This isn't necessarily a problem with case_when() or if_else(), since R similarly produces an error if one of the underlying conditions or return values produces an error.
In this case, getproperty(match(r"a.*a", x), :match) isn't valid when match() finds no match and returns nothing (for example, when x is "beta"). That results in an error inside if_else() or case_when(). There may be some value in creating a keyword argument or helper function that replaces errors with missing values (essentially a vectorized try()/catch()), but it's not something we are ready to work on yet.
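A vectorized try/catch of the kind mentioned above might look roughly like this. try_or_missing is a hypothetical helper for illustration only, not part of Tidier.jl or Base. It applies a function to each element and returns missing wherever the function throws.

```julia
# Hypothetical helper (not part of Tidier.jl): apply `f` to each element of
# `xs`, returning `missing` wherever `f` throws.
function try_or_missing(f, xs)
    return map(xs) do x
        try
            f(x)
        catch
            missing
        end
    end
end

words = ["alpha", "beta", "banana"]
# match() returns `nothing` for "beta", so `.match` throws there.
first_to_last_a = try_or_missing(w -> match(r"a.*a", w).match, words)
```

Catching every exception also hides genuine bugs. For example, a misspelled field name would silently become missing. That is one argument for making such behavior opt-in, for instance through a keyword argument.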
Ternary operators work fine in Tidier, but to vectorize them you have to place them inside an array comprehension (or wrap them in a function). In the future, we may consider wrapping ternary operators in an array comprehension automatically, but this is very tricky to implement correctly and safely.
Both of these examples work okay:
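```julia
using DataFrames, Tidier

tbl = DataFrame(x = ["alpha", "beta", "charlie", "delta", "echo"])
helper(x) = contains(x, r"a.*a") ? getproperty(match(r"a.*a", x), :match) : "_"

# Option 1: wrap the ternary in a function; @mutate vectorizes the call.
@chain tbl begin
    @mutate(count_a = helper(x))
end

# Option 2: vectorize the ternary yourself with an array comprehension.
@chain tbl begin
    @mutate(count_a = [contains(x, r"a.*a") ? getproperty(match(r"a.*a", x), :match) : "_" for x in x])
end
```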
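One hypothetical way to automate the comprehension in the second example is a small macro. @over is an invented name, and nothing like it is claimed to exist in Tidier.jl. The sketch also shows part of why doing this automatically is tricky: the macro has to be told which variable to iterate over, whereas @mutate would have to work out on its own which names in the expression are columns.

```julia
# Hypothetical macro (not in Tidier.jl): rewrite `@over(v, expr)` into the
# comprehension `[expr for v in v]`, so a ternary in `expr` runs per element.
macro over(var, ex)
    # esc() makes `var` and the names inside `ex` refer to the caller's variables.
    return esc(:([$ex for $var in $var]))
end

x = ["alpha", "beta", "banana"]
labels = @over(x, contains(x, r"a.*a") ? match(r"a.*a", x).match : "_")
```

This only handles one vector. An expression that mixes several columns, or that uses a column name that is also a function, is where automatic rewriting gets hard.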
I'm going to re-close the issue, but feel free to reply if you have thoughts. I appreciate your using the package.
Based on this blog post, Julia can reproduce case_when by using the ⋅ ? ⋅ : ⋅ ternary operator.
For example, in R code:
In Julia code:
Possible solution
- A parse_case_when function to transform R's formula syntax into Julia's ternary operator.
- A case_when function that takes an expression and internally calls parse_case_when.
- case_when should enable auto-vectorization, so that it can be handled by the parse_autovec parsing function when called within @mutate.
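The proposal above can be sketched in plain Julia. This is a minimal, hypothetical illustration and not Tidier.jl's implementation. parse_case_when_sketch and @case_when_ternary are made-up names standing in for the proposed parse_case_when and case_when. Falling back to missing when no condition matches is an assumption modeled on dplyr's NA default. The parser folds `cond => value` pairs into one nested ternary, so only the selected branch is ever evaluated.

```julia
# Hypothetical sketch, not Tidier.jl code: turn `cond => value` pair
# expressions into one nested ternary, e.g.
#   x < 3 => "small", true => "large"
# becomes
#   x < 3 ? "small" : (true ? "large" : missing)
function parse_case_when_sketch(pairs::Expr...)
    result = :missing                 # assumed fallback when nothing matches
    for p in reverse(pairs)           # build from the last pair outward
        @assert p.head == :call && p.args[1] == Symbol("=>") "expected `cond => value`"
        cond, value = p.args[2], p.args[3]
        result = Expr(:if, cond, value, result)   # the Expr a ternary parses to
    end
    return result
end

# Hypothetical macro in the role of case_when: it calls the parser at
# macro-expansion time and splices the ternary into the caller's code.
macro case_when_ternary(pairs...)
    return esc(parse_case_when_sketch(pairs...))
end

size_label(x) = @case_when_ternary(x < 3 => "small", x < 10 => "medium", true => "large")

# A scalar function like this is what auto-vectorization would broadcast
# over a column; here the broadcast is written by hand.
size_label.([1, 5, 12])
```

Because the generated code is a ternary chain, a value such as getproperty(match(r"a.*a", x), :match) is only evaluated for rows whose condition is true. That avoids the error discussed in the thread, as long as the result is applied element by element.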