Open Lincoln-Hannah opened 1 year ago
@Lincoln-Hannah - indeed I also often need it. I understand that this is a request for DataFramesMeta.jl.
The only issue is mixing grouping and non-grouping columns. Maybe something like @val(:name) inside @by could be better instead (to distinguish taking :name as a column from taking @val(:name) as a value). The @val name is tentative.
What you can currently do is use first(:name) to get it, so maybe that would be enough for you (and it would just require documentation)?
@Lincoln-Hannah Can I have more information on your use-case?
I also do this all the time, but first(:name) is enough for me.
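The first(:name) workaround mentioned above can be sketched as follows. This is a minimal illustration, not the proposed @val feature; the data frame, its column names, and the output column :label are made up for the example.

```julia
using DataFrames, DataFramesMeta

# Hypothetical data: :name is the grouping column, :v is a value column.
df = DataFrame(name = ["a", "a", "b"], v = [1, 2, 3])

# Inside @by, :name refers to the group's column as a vector,
# so first(:name) extracts the single value shared by the whole group.
result = @by(df, :name, :label = string(first(:name), "-", sum(:v)))
```

Each group yields one row, and first(:name) is safe here because every element of :name is identical within a group.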
See related request: https://github.com/mauro3/Parameters.jl/issues/153
I'd like to move between DataFrames and arrays of structs as effortlessly as possible.
If I create a struct with field names matching a database query, I'd like to convert the query result into an array of structs in one line. Something like:
@rtransform df :mystruct = mystruct(; AsTable(:)... )
A struct derived from a grouped DataFrame will have single-value fields for the group-by columns and vector fields for the non-group-by columns.
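For the ungrouped case, one way to get an array of structs today is to splat each row into a keyword constructor. The sketch below assumes a hypothetical Trade struct whose field names match the columns, and it uses Base.@kwdef instead of @with_kw from Parameters.jl only to keep the example dependency-free.

```julia
using DataFrames

# Hypothetical struct; its field names must match the column names.
Base.@kwdef struct Trade
    id::Int
    price::Float64
end

df = DataFrame(id = [1, 2, 3], price = [9.5, 10.0, 10.25])

# NamedTuple(row) turns a DataFrameRow into a named tuple,
# which is then splatted as keyword arguments into the constructor.
trades = [Trade(; NamedTuple(row)...) for row in eachrow(df)]
```

This gives a plain Vector{Trade} without adding a struct column to the DataFrame first.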
Okay, so you would like
@rtransform df :mystruct = mystruct(; AsTable(:)... )
to not return a DataFrame? Rather, you want it to return a Vector?
I still need more information on what you want. What is the output you desire? Give it as a Julia object, not a description.
Sorry Peter. My bad. I was trying to isolate the key line. To get to a vector there would be an additional line.
@chain begin
    @rtransform df :mystruct = mystruct(; AsTable(:)... )
    _.mystruct
end
Actually, more often I'd put the result in a Dictionary. Example.
using Chain, DataFrames, DataFramesMeta, Dictionaries, Parameters
@with_kw struct myStruct
    a::Int64
    b::Int64
    c::Vector{Int64}
    d::Vector{Int64}
end
dict_of_structs = @chain begin
    DataFrame( a=[1,1,2,2], b=[11,11,12,12], c=1:4, d=11:14 )
    @by [:a,:b] :x = myStruct(; AsTable(:)... )
    Dictionary( _.a, _.x )
end
AsTable(:) produces a named tuple per row, except that group-by columns are single numbers and other columns are vectors or subarrays (as per usual).
[ (a=1,b=11,c=[1,2],d=[11,12]),
(a=2,b=12,c=[3,4],d=[13,14]) ]
Each row becomes a myStruct. The last line creates a dictionary.
Dictionary
1 | myStruct(a=1,b=11,c=[1,2],d=[11,12])
2 | myStruct(a=2,b=12,c=[3,4],d=[13,14])
We can then apply a function to any element
myFunc( dict_of_structs[1] )
or broadcast over all
myFunc.( dict_of_structs )
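Until group-by columns are available as single values, a similar dictionary can be built with plain DataFrames.jl by iterating over the groups. This is only a sketch under assumptions: GroupRecord and total_c are hypothetical names, Base.@kwdef stands in for @with_kw, and a Base Dict stands in for Dictionary from Dictionaries.jl (so a comprehension replaces the broadcast).

```julia
using DataFrames

# Hypothetical struct mirroring myStruct from the example above.
Base.@kwdef struct GroupRecord
    a::Int
    b::Int
    c::Vector{Int}
    d::Vector{Int}
end

df = DataFrame(a = [1, 1, 2, 2], b = [11, 11, 12, 12], c = 1:4, d = 11:14)

# pairs(groupby(...)) yields (group key => sub-data-frame) pairs.
# The key carries the grouping columns as single values;
# the sub-data-frame carries the remaining columns as vectors.
records = Dict(
    key.a => GroupRecord(a = key.a, b = key.b, c = collect(sub.c), d = collect(sub.d))
    for (key, sub) in pairs(groupby(df, [:a, :b]))
)

# Hypothetical function applied to every struct in the dictionary.
total_c(r::GroupRecord) = sum(r.c)
totals = Dict(k => total_c(r) for (k, r) in records)
```

The explicit field assignments are more verbose than splatting AsTable(:), which is the gap the request is trying to close.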
Would it be possible, within a @by block, to make the grouped-by columns available as single values rather than vectors?
In the below, I'd like to create a column of myCurve structs, but because the :name column comes through as a vector, it only works for the myCurve_name_vec structs. I could convert it, it just wouldn't be so clean.
More generally, if you are grouping by a column, any related calculations would likely use that column as a single value.