Open schlichtanders opened 3 months ago
This is the intended way to do it:
julia> combine(df, [:a, :b] .=> myextrema .=> x -> x .* ["_min", "_max"])
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13
You can then even do just e.g.:
julia> combine(df, [:a, :b] .=> Ref∘extrema .=> x -> x .* ["_min", "_max"])
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13
Thank you very much - I couldn't find such an example in the documentation.
I still don't understand why your second version works :sweat_smile:.
This approach has the disadvantage that one needs to replicate which fields the transformation function has. It looks flexible and easy to understand, which is really great, but it also looks like duplication.
AsTable or a vector of column names.
> This approach has the disadvantage that one needs to replicate which fields the transformation function has.
Yes - this is a disadvantage. That is why I have commented that you do not have to pass these column names in the function (the example with Ref, which skips defining target column names).
We could allow for a function taking both "source column names" and "names returned by a function" and allowing combining them, but it seemed overly complex (i.e. the API would be hard for typical users to understand and learn). What I have given you was the most concise variant.
The variant that you want is available, and it avoids duplication, but the disadvantage is that the code is longer (so I thought that it is less interesting):
julia> using DataFrames
julia> df = DataFrame(a = 1:10, b = 4:13)
10×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
   4 │     4      7
   5 │     5      8
   6 │     6      9
   7 │     7     10
   8 │     8     11
   9 │     9     12
  10 │    10     13
julia> function myextrema(a)
           ex = extrema(a[1])
           n = propertynames(a)[1]
           (; Symbol(n, "_min") => ex[1], Symbol(n, "_max") => ex[2])
       end
myextrema (generic function with 1 method)
julia> combine(df, AsTable.([:a, :b]) .=> myextrema .=> AsTable)
1×4 DataFrame
 Row │ a_min  a_max  b_min  b_max
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      4     13
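To see why this variant needs no hard-coded column names, it may help to call myextrema outside of combine. The sketch below assumes, based on the function body above, that for AsTable(:a) the function receives a NamedTuple with a single field named after the source column. The hand-built `nt` is only a hypothetical stand-in for that input.

```julia
# Same definition as above, repeated so the snippet runs on its own.
function myextrema(a)
    ex = extrema(a[1])        # (min, max) of the only column
    n = propertynames(a)[1]   # name of that column, e.g. :a
    # build a NamedTuple whose field names are derived from the column name
    (; Symbol(n, "_min") => ex[1], Symbol(n, "_max") => ex[2])
end

# Hypothetical stand-in for the input the function gets for AsTable(:a)
nt = (a = 1:10,)
res = myextrema(nt)
```

Because the output names are computed from the input name, the same function gives distinct names for :a and :b, which is presumably what avoids a name clash when the results are combined via AsTable.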
2. It is documented that you can auto-generate the target column names using a function (to dynamically generate them). In this case the function takes source column names as input.
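The renaming function used in the examples above can be tried on its own, without DataFrames. This is a minimal sketch; `rename_minmax` is a hypothetical name introduced only for illustration, and the assumption is that combine calls the function with the source column name as a string.

```julia
# The renaming function from the examples above, given a name for illustration.
rename_minmax = x -> x .* ["_min", "_max"]

# One source column name in, two target column names out.
single = rename_minmax("a")

# Broadcasting over several source names gives one name vector per source.
targets = rename_minmax.(["a", "b"])
```

Here `.*` is broadcast string concatenation, so each source name is paired with both suffixes.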
Could an example be added to https://dataframes.juliadata.org/stable/man/working_with_dataframes/? This was my source of truth and there I couldn't find it.
There is an example in the docstring. https://dataframes.juliadata.org/stable/lib/functions/#DataFrames.combine. We could add also something in the intro manual. Could you propose something that you would find most useful?
I think just below .=> within the combine section would be nice
julia> combine(df, names(df) .=> sum, names(df) .=> prod)
1×4 DataFrame
 Row │ A_sum  B_sum    A_prod  B_prod
     │ Int64  Float64  Int64   Float64
─────┼─────────────────────────────────
   1 │    10     10.0      24     24.0
# this is new:
julia> combine(df, names(df) .=> Ref ∘ extrema .=> (c -> c .* ["_min", "_max"]))
Probably with a little extra explanation of what Ref is doing here (I haven't entirely understood its need yet).
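The base-Julia part of the Ref question can be checked without DataFrames. `Ref ∘ extrema` is a composed function that first calls extrema and then wraps the resulting (min, max) tuple in a Ref. Judging from the outputs earlier in the thread, combine then spreads the tuple held by the Ref across the two target columns. The exact rule DataFrames applies is not stated in this thread, so treat that part as an assumption and check it against the combine docstring.

```julia
# Composition: equivalent to x -> Ref(extrema(x))
f = Ref ∘ extrema

r = f(1:10)      # a Ref holding the tuple returned by extrema
inner = r[]      # unwrap the Ref to get the tuple back
```

The names `f`, `r`, and `inner` are illustrative only.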
See #3433 for an update of the manual. Of course please comment if something is not clear or should be improved.
looks especially good. Thank you for the detailed documentation improvement!
I am looking for a fix or workaround for how to use AsTable in combination with several columns which should be transformed, i.e. .=>. I always get
ERROR: ArgumentError: Duplicate column name(s) returned:
throws the following error
My ideal behaviour would be that AsTable prepends the column name, but of course this would be breaking. Maybe there could be a PrependColName(AsTable) wrapper or something similar?