JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable

Join on columns of different name #78

Closed prcastro closed 6 years ago

prcastro commented 7 years ago

The docs say we have to join only on columns that have the same name on both DTs. What about using

join(a, b, on=(:a_name, :b_name))
join(a, b, on=[(:a_name1, :b_name1), (:a_name2, :b_name2)])

for joining on columns of different names? I think it is common to reuse a and b after the merge, so it's not nice to be forced to rename columns just to perform a merge.

davidanthoff commented 7 years ago

I've also run into this issue, so +1 to this idea.

You could also use Query.jl to do this right now. Its various join commands allow you to join on columns with different names (or rather, they join on whatever the expressions to the left and right of the equals keyword return, i.e. you can have arbitrarily complicated transformations there as long as they are valid Julia code). No promises about performance, though: I haven't spent much time tuning the join commands, which mainly means I just don't know how they perform.
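As a rough sketch of the Query.jl approach described above, the following joins two tables whose key columns are named differently. The tables, the column names (`a_name`, `b_name`, `x`, `y`) and the use of DataFrames.jl as the table type are assumptions made up for illustration; they are not taken from this thread.

```julia
using DataFrames, Query

# Hypothetical example tables; the key column has a different name in each.
a = DataFrame(a_name = [1, 2, 3], x = ["p", "q", "r"])
b = DataFrame(b_name = [2, 3, 4], y = [20.0, 30.0, 40.0])

# Join on whatever the expressions on either side of `equals` return.
joined = @from i in a begin
    @join j in b on i.a_name equals j.b_name
    @select {i.a_name, i.x, j.y}
    @collect DataFrame
end
```

Neither `a` nor `b` is renamed or modified, so both can be reused afterwards.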

nalimilan commented 7 years ago

Makes sense, though we are currently concentrating on stabilizing the basics, so without help it probably won't be implemented very soon.

cjprybol commented 6 years ago

Transferred to DataFrames, see https://github.com/JuliaData/DataFrames.jl/issues/1297

cjprybol commented 6 years ago

@prcastro This functionality was just merged into DataFrames.jl with https://github.com/JuliaData/DataFrames.jl/pull/1312. It should work on master and will be present in the next release. Thanks for posting the issue!
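A minimal sketch of joining on differently named columns in recent DataFrames.jl releases, where the mapping is written as a `left => right` pair passed to `on`. The syntax the linked pull request originally introduced may have differed from this, and the tables and column names below are hypothetical.

```julia
using DataFrames

# Hypothetical example tables; the key column has a different name in each.
a = DataFrame(a_name = [1, 2, 3], x = ["p", "q", "r"])
b = DataFrame(b_name = [2, 3, 4], y = [20.0, 30.0, 40.0])

# Single key: match a.a_name against b.b_name.
joined = innerjoin(a, b, on = :a_name => :b_name)

# Several keys are given as a vector of pairs.
a2 = DataFrame(a_name1 = [1, 1, 2], a_name2 = ["u", "v", "u"], x = [10, 11, 12])
b2 = DataFrame(b_name1 = [1, 2, 2], b_name2 = ["v", "u", "w"], y = [0.1, 0.2, 0.3])
joined2 = innerjoin(a2, b2, on = [:a_name1 => :b_name1, :a_name2 => :b_name2])
```

The key column in the result keeps the name from the left table, and `a` and `b` are left untouched for later reuse.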