JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

Error for name generation for multiple anonymous functions #2492

Open pdeffebach opened 3 years ago

pdeffebach commented 3 years ago
julia> df = DataFrame(A = [1, 2, 3, missing], B = [2, 1, 2, 1]);

julia> select(df, :A => (t -> t .+ 1), :A => (t -> t .+ 2))
ERROR: ArgumentError: duplicate target column name A_function passed

Should be :A_fun and :A_fun1, right? Or something similar?

bkamins commented 3 years ago

I personally do not think this should happen by default, as it typically leads to error-prone code.

If you have select(df, :A => (t -> t .+ 1), :A => (t -> t .+ 2)) you should probably explicitly provide column names (this is the preferred style in DataFrames.jl).
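A minimal sketch of the explicit-naming style described above, using the `source => function => target` form of `select`. The target names `:A_plus1` and `:A_plus2` are illustrative choices, not names taken from this thread.

```julia
using DataFrames

df = DataFrame(A = [1, 2, 3, missing], B = [2, 1, 2, 1])

# Give each anonymous function its own target column name so the
# auto-generated name (A_function) is never used twice.
res = select(df,
             :A => (t -> t .+ 1) => :A_plus1,
             :A => (t -> t .+ 2) => :A_plus2)

names(res)  # ["A_plus1", "A_plus2"]
```

Because every target is spelled out, the call does not depend on how default names are generated for anonymous functions.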

We have already discussed this (I do not remember in which issue) and decided that in the future we can add a makeunique kwarg to select etc., but it was left for later.

What is your use case where you think that automatic makeunique=true makes sense?

pdeffebach commented 3 years ago

Ah right. Sorry for bringing it up again.

The scenario is something like

@where(df, :a .> 1, :a .< 4)

This creates new anonymous functions each time without a destination name.

I will add an option that gensym()s a new name in this situation.
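One way the `gensym()` idea could look, written against plain `select` rather than the macro internals. This is only a sketch: the variables `n1`, `n2` and `tmp` are hypothetical, and the actual DataFramesMeta.jl implementation may differ.

```julia
using DataFrames

df = DataFrame(a = 1:5)

# gensym() returns a fresh, unique Symbol on every call, so the two
# anonymous conditions can never collide on a target column name.
n1, n2 = gensym(), gensym()

tmp = select(df,
             :a => (a -> a .> 1) => n1,
             :a => (a -> a .< 4) => n2)

# Keep rows where both generated Boolean columns are true.
filtered = df[tmp[!, n1] .& tmp[!, n2], :]
```

The generated names are not meant to be read by the user; they exist only long enough to compute the row mask.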

bkamins commented 3 years ago

No problem. We now have a separate issue tracking this point :).

Yes, in @where it has to be handled in a special way (I was aware of this and planned to add a target name => :xi, where i is the condition index, to every transform).
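The indexed-name idea can be sketched as follows. This is an illustration, not the code that was eventually merged; `conds`, `named` and `keep` are hypothetical names.

```julia
using DataFrames

df = DataFrame(a = 1:5)

# Each condition is a (source column, predicate) pair without a target name.
conds = [(:a, a -> a .> 1), (:a, a -> a .< 4)]

# Attach the target name :xi to the i-th condition: col => fun => :x1, :x2, ...
named = [col => fun => Symbol("x", i) for (i, (col, fun)) in enumerate(conds)]

tmp = select(df, named...)      # columns x1 and x2, one per condition
keep = tmp.x1 .& tmp.x2         # a row is kept only if all conditions hold
filtered = df[keep, :]
```

Naming by condition index keeps the intermediate columns predictable while still guaranteeing that two conditions on the same source column do not clash.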

In @select and @transform it should not be a problem - right?

pdeffebach commented 3 years ago

No, it should not be a problem. This only shows up for @orderby and @where.