Closed by tobydriscoll 4 months ago.
This is easily fixable, and there is a workaround in the meantime. The explanation is straightforward.
This happens occasionally in TidierData.jl because the package tries to infer whether a function should be vectorized (i.e., applied separately to each element of a vector) or not (i.e., applied to the entire vector).
Since most functions and operators do require vectorization, TidierData vectorizes them by default unless it knows not to. It decides which ones not to vectorize by consulting a look-up table. This is called "auto-vectorization" and is part of the magic (for good and bad) of TidierData.
`mean()` happens to be in the look-up table, whereas `mad()` is not.
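To make the look-up idea concrete, here is a toy sketch in plain Julia. It does not use TidierData, and the names `NOT_VECTORIZED`, `apply_autovec`, and `spread` are hypothetical; TidierData's real table is internal and much larger. A function found in the table receives the whole vector. Any other function is broadcast over the elements, so a summary function that is missing from the table returns one value per element instead of one per group.

```julia
using Statistics

# Hypothetical stand-in for TidierData's internal do-not-vectorize table.
const NOT_VECTORIZED = Set([:mean, :median, :std])

# Toy auto-vectorization rule: functions in the table receive the whole vector;
# every other function is broadcast over the elements (like `f.(x)`).
function apply_autovec(fname::Symbol, f, x::AbstractVector)
    return fname in NOT_VECTORIZED ? f(x) : f.(x)
end

# Hypothetical summary function that is NOT in the table.
spread(v) = maximum(v) - minimum(v)

x = [1.0, 2.0, 3.0, 4.0]

m = apply_autovec(:mean, mean, x)           # whole-vector call: 2.5
s_bad = apply_autovec(:spread, spread, x)   # broadcast: one 0.0 per element

# Registering the function changes the outcome, analogous to adding mad to the real list.
push!(NOT_VECTORIZED, :spread)
s_good = apply_autovec(:spread, spread, x)  # whole-vector call: 3.0
```

Broadcasting `spread` over single numbers does not raise an error here; it silently returns zeros. That is why a missing table entry can surface either as an error or as a wrong result, depending on how the function handles scalars.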
In a future update, we will add `mad()` to that list. For now, the workaround is to prefix the call with a tilde (`~`), which tells TidierData not to vectorize that function:
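```julia
using TidierData, Statistics, StatsBase

df = DataFrame((; year=repeat(1982:1984, inner=4), val=rand(12)))

@chain df begin
    @group_by(year)
    # `~` marks mad() as not-to-be-vectorized, so it receives the whole `val` column per group
    @summarize(median=median(val), mad=~mad(val))
    @ungroup
end
```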
Alternatively, you can add `mad` to the do-not-vectorize list for your session.
The documentation explains this behavior, and how to modify the list, in more detail: https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/autovec/
That makes sense. Thanks and KUTGW! I'll leave the issue open since you intend to make a change.
Thanks! Yes, I'll close the issue after adding `mad()` to the do-not-vectorize list.
This is fixed in #107.
I tried to use `@summarize` with `median`/`mad` the same way as with `mean`/`std`, and it failed.
For example,
Brilliant! But:
Sadness. There should be no problem using `mad`, AFAICT.