JuliaData / DataFramesMeta.jl

Metaprogramming tools for DataFrames
https://juliadata.github.io/DataFramesMeta.jl/stable/

`@by` is not working #162

Closed xiaodaigh closed 4 years ago

xiaodaigh commented 4 years ago

This simple MWE throws an error. I have DataFrames 0.21.7 and DataFramesMeta 0.5.1.

using DataFrames
df = DataFrame(grp = rand(1:8, 100), val = rand(100))

using DataFramesMeta, Statistics
@by(df, :grp, mean(:val))
ERROR: ArgumentError: 'Float64' iterates 'Float64' values, which doesn't satisfy the Tables.jl `AbstractRow` interface
Stacktrace:
 [1] invalidtable(::Float64, ::Float64) at C:\Users\RTX2080\.julia\packages\Tables\Eti9i\src\tofromdatavalues.jl:42
 [2] iterate at C:\Users\RTX2080\.julia\packages\Tables\Eti9i\src\tofromdatavalues.jl:48 [inlined]
 [3] buildcolumns at C:\Users\RTX2080\.julia\packages\Tables\Eti9i\src\fallbacks.jl:185 [inlined]
 [4] columns at C:\Users\RTX2080\.julia\packages\Tables\Eti9i\src\fallbacks.jl:237 [inlined]   
 [5] DataFrame(::Float64; copycols::Bool) at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\other\tables.jl:43
 [6] DataFrame at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\other\tables.jl:34 [inlined]
 [7] (::var"##293#29")(::SubArray{Float64,1,Array{Float64,1},Tuple{Array{Int64,1}},false}) at C:\Users\RTX2080\.julia\packages\DataFramesMeta\c67UK\src\DataFramesMeta.jl:71
 [8] (::var"#27#28")(::SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}) at C:\Users\RTX2080\.julia\packages\DataFramesMeta\c67UK\src\DataFramesMeta.jl:73
 [9] _combine(::var"#27#28", ::GroupedDataFrame{DataFrame}, ::Nothing, ::Bool, ::Bool) at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\groupeddataframe\splitapplycombine.jl:1248    
 [10] combine_helper(::Function, ::GroupedDataFrame{DataFrame}, ::Nothing; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool) at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\groupeddataframe\splitapplycombine.jl:589
 [11] #combine#375 at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\groupeddataframe\splitapplycombine.jl:442 [inlined]
 [12] combine(::Function, ::GroupedDataFrame{DataFrame}) at C:\Users\RTX2080\.julia\packages\DataFrames\cdZCk\src\groupeddataframe\splitapplycombine.jl:442
 [13] top-level scope at REPL[17]:1
 [14] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088

pdeffebach commented 4 years ago

The problem is that `@by` currently needs an expression of the form `y = fun(:x)`, i.e. the result has to be given a name. Your example above doesn't have that form.

In the future, we hope to make this work by transforming `mean(:val)` into `:val => mean` and passing it to `DataFrames.combine(groupby(df, :grp), :val => mean)`. Initial work toward that goal has started in #163.
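For reference, here is a minimal sketch of the plain DataFrames call mentioned above, which should already work without DataFramesMeta on DataFrames 0.21. The output column name `m` in the second call is an arbitrary name chosen for illustration. On DataFramesMeta 0.5.1 the named form described earlier would presumably be written `@by(df, :grp, m = mean(:val))`, but that is an assumption about that release and is not run here.

```julia
using DataFrames, Statistics

# Same data shape as the MWE in the original report.
df = DataFrame(grp = rand(1:8, 100), val = rand(100))

# Pair syntax: apply `mean` to :val within each group.
# DataFrames auto-generates the output column name :val_mean.
res = combine(groupby(df, :grp), :val => mean)

# Optionally name the output column explicitly (`m` is an arbitrary name).
res_named = combine(groupby(df, :grp), :val => mean => :m)
```

Both calls return one row per distinct value of `grp`; they differ only in the name of the result column.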

Unfortunately, the error here is very difficult to understand because it's hard to tell exactly what expression DataFramesMeta ends up generating. This will become easier to reason about after #163. It looks like `@by` currently does call `DataFrames.combine`, constructs an intermediate NamedTuple, and calls `DataFrame` on that. Your expression produces a bare Float64 for each group, and, going by the stack trace, the `DataFrame` constructor doesn't know how to turn that into a table.

Rest assured this will get fixed in the future.

pdeffebach commented 4 years ago

Fixed in #163

bkamins commented 4 years ago

closing - right?