Open xiaobaaaa opened 1 year ago
In DataFrames.jl for minimum you can just write:
transform!(groupby(df, :idcode), :union_year => minimum => :first_union)
However, indeed for other aggregations it is problematic. I should be resolved in Statistics.jl or StatsBase.jl, but I will keep it open here until we reach a decision what to do about it.
Just to make it clear, would you also expect minimum(Int[])
to return missing
or it should throw an error?
Ah - now I run your example. You need:
transform!(groupby(df, :idcode), :union_year => (x -> minimum(skipmissing(x), init=missing) => :first_union)
that is indeed a bit verbose (but I think better and more efficient than what you currently use).
Thank you very much for your response. The result I obtained when running the code you provided, transform!(groupby(df, :idcode), :union_year => (x -> minimum(skipmissing(x), init=missing)) => :first_union)
(The code you provided is missing half of a parenthesis, ")". Is my correction accurate?) was:
4×3 DataFrame
Row │ idcode union_year first_union
│ Int64 Int64? Missing
────┼─────────────────────────────────
1 │ 1 missing missing
2 │ 1 missing missing
3 │ 2 2001 missing
4 │ 2 2002 missing
but I was hoping to achieve the following result:
4×3 DataFrame
Row │ idcode union_year first_union
│ Int64 Int64? Missing
─────┼─────────────────────────────────
1 │ 1 missing missing
2 │ 1 missing missing
3 │ 2 2001 2001
4 │ 2 2002 2001
Ah - indeed. My code is incorrect (I was writing it from my head and I designed it wrongly).
I am already discussing with JuliaData/JuliaStats maintainers how to best resole your issue.
I recently encountered a situation where I needed to compute the minimum value of a variable grouped by another variable while ignoring missing values. In Stata, this can be achieved using the following simple code:
In Julia, using the DataFrames.jl package, the equivalent operation requires more verbose code:
I find this code to be more complicated and less readable than the equivalent Stata code. It would be great if DataFrames.jl could provide a simpler and more concise way to perform this operation.
Thank you for your consideration.