JuliaData / DataFramesMeta.jl

Metaprogramming tools for DataFrames
https://juliadata.github.io/DataFramesMeta.jl/stable/

Request - grouped by columns available as single values rather than vectors #361

Open Lincoln-Hannah opened 1 year ago

Lincoln-Hannah commented 1 year ago

Would it be possible, within a @by block, to make the grouped-by columns available as single values rather than vectors?

In the example below, I'd like to create a column of myCurve structs, but because the :name column comes through as a vector, it only works for the myCurve_name_vec struct. I could convert it, but that wouldn't be as clean.

More generally, if you are grouping by a column, any related calculations would likely use that column as a single value.


using DataFrames, DataFramesMeta, Parameters

@with_kw struct myCurve
    name::Symbol
    curve::Vector{Int64}
end

@with_kw struct myCurve_name_vec
    name::Vector{Symbol}
    curve::Vector{Int64}
end

d = DataFrame( name=[:a,:a,:a,:b,:b,:b], curve =[1,2,3,11,12,13] )

@by d :name   :x = myCurve_name_vec( AsTable(:)... )    # works: :name arrives as a vector
@by d :name   :x = myCurve( AsTable(:)... )             # doesn't work: myCurve expects a single Symbol

bkamins commented 1 year ago

@Lincoln-Hannah - indeed, I also often need it. I understand that this is a request for DataFramesMeta.jl.

The only issue is mixing grouping and non-grouping columns. Maybe something like @val(:name) inside @by could be better instead (to distinguish taking :name as a column and @val(:name) as a value).

The name @val is tentative.

What you can currently do is use first(:name) to get the value, so maybe that would be enough for you (and would only require documentation)?
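
A minimal sketch of the first(:name) workaround, assuming DataFrames.jl and DataFramesMeta.jl are installed. The struct name Curve is a hypothetical stand-in for myCurve, declared without @with_kw to keep the example dependency-free, and collect(:curve) is used only to make the copy into a plain Vector explicit:

```julia
using DataFrames, DataFramesMeta

# Hypothetical stand-in for myCurve from the original post.
struct Curve
    name::Symbol
    curve::Vector{Int}
end

d = DataFrame(name = [:a, :a, :a, :b, :b, :b], curve = [1, 2, 3, 11, 12, 13])

# Inside @by, :name is a vector holding the same value for every row of the
# group, so first(:name) recovers it as a single value.
res = @by d :name :x = Curve(first(:name), collect(:curve))
```

Each group produces one Curve, so res should have one row per distinct :name, with the struct stored in :x.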

pdeffebach commented 1 year ago

@Lincoln-Hannah Can I have more information on your use-case?

I also do this all the time, but first(:name) is enough for me.

Lincoln-Hannah commented 1 year ago

See related request: https://github.com/mauro3/Parameters.jl/issues/153

I'd like to move between DataFrames and arrays of structs as effortlessly as possible.

If I create a struct with field names matching a database query, I'd like to convert the query result into an array of structs in one line. Something like:

           @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  ) 

A struct derived from a grouped DataFrame will have single-value fields for the group-by columns and vector fields for the non-group-by columns.

pdeffebach commented 1 year ago

Okay so you would like

           @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  ) 

to not return a DataFrame? Rather, you want it to return a Vector?

I still need more information on what you want. What is the output you desire? Give it as a Julia object, not a description.

Lincoln-Hannah commented 1 year ago

Sorry Peter. My bad. I was trying to isolate the key line. To get to a vector there would be an additional line.

@chain begin
     @rtransform  df  :mystruct  =  mystruct(;  AsTable(:)...  ) 
       _.mystruct
end
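
For comparison, the row-to-struct conversion described above can be sketched with DataFrames.jl alone. This is an illustration under stated assumptions, not a DataFramesMeta feature: the struct Trade and its columns are hypothetical, and Base.@kwdef stands in for @with_kw to provide the keyword constructor:

```julia
using DataFrames

# Hypothetical struct whose field names match the column names.
Base.@kwdef struct Trade
    id::Int
    price::Float64
end

df = DataFrame(id = [1, 2, 3], price = [9.5, 10.0, 10.5])

# NamedTuple(row) converts each DataFrameRow into a named tuple, which is
# then splatted as keyword arguments into the struct's constructor.
trades = [Trade(; NamedTuple(row)...) for row in eachrow(df)]
```

This yields a Vector{Trade} directly, without first storing the structs in a column.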

Actually, more often I'd put the result in a Dictionary. For example:

using Dictionaries 

@with_kw struct myStruct
    a::Int64
    b::Int64
    c::Vector{Int64}
    d::Vector{Int64}
end

dict_of_structs = @chain begin
    DataFrame( a=[1,1,2,2], b=[11,11,12,12],  c=1:4,  d=11:14 )

    @by [:a,:b]   :x  = myStruct(; AsTable(:)... )

     Dictionary(  _.a,    _.x    )
end

The idea is that AsTable(:) would produce one named tuple per group, in which the group-by columns are single values and the other columns are vectors or subarrays (as usual).

[ (a=1,b=11,c=[1,2],d=[11,12]),
(a=2,b=12,c=[3,4],d=[13,14]) ]

Each row then becomes a myStruct, and the last line creates the dictionary.

Dictionary 
1         |          myStruct(a=1,b=11,c=[1,2],d=[11,12])  
2         |          myStruct(a=2,b=12,c=[3,4],d=[13,14])    

We can then apply a function to any element

myFunc(   dict_of_structs[1]  )

or broadcast over all

myFunc.(   dict_of_structs )
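
Until something like the requested behaviour exists, a similar result can be approximated with plain DataFrames.jl by iterating over the groups. This is a sketch under assumptions: GroupRecord is a hypothetical stand-in for myStruct, Base.@kwdef replaces @with_kw, and a Base Dict is swapped in for Dictionary from Dictionaries.jl to avoid the extra dependency:

```julia
using DataFrames

# Hypothetical stand-in for myStruct: scalar fields for the group-by
# columns, vector fields for the remaining columns.
Base.@kwdef struct GroupRecord
    a::Int
    b::Int
    c::Vector{Int}
    d::Vector{Int}
end

df = DataFrame(a = [1, 1, 2, 2], b = [11, 11, 12, 12], c = 1:4, d = 11:14)

# pairs(groupby(...)) yields (group key => sub-data-frame) pairs. The key
# carries the group-by values as single values; the sub-data-frame columns
# are views, which collect copies into plain vectors.
records = [GroupRecord(; a = key.a, b = key.b, c = collect(sdf.c), d = collect(sdf.d))
           for (key, sdf) in pairs(groupby(df, [:a, :b]))]

# Index the records by the value of :a (a Base Dict, not a Dictionary).
dict_of_records = Dict(r.a => r for r in records)
```

Unlike a Dictionary, a Base Dict does not support broadcasting, so applying a function to every element would use a comprehension or map over values(dict_of_records) instead of the dot syntax.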