SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

The order of the vectors when joining two data frames. #508

Open kojix2 opened 5 years ago

kojix2 commented 5 years ago

The join method is frequently used in data frame operations. Joining two data frames is easy in Daru, but the order of the vectors in the result is a bit strange.

Ruby Daru

require 'daru'
people = Daru::DataFrame.new(ID: [20, 40],  Name: ["John Doe", "Jane Doe"])
jobs = Daru::DataFrame.new(ID: [20, 40], Job: ["Lawyer", "Doctor"])

Daru::Core::Merge.join(people, jobs, on: [:ID], how: :inner)
# people.join(jobs, how: :inner, on: [:ID])

In the result, the Name vector and the Job vector are separated, one to the left and one to the right of the ID vector, so ID ends up in the middle.

Julia DataFrames.jl

https://juliadata.github.io/DataFrames.jl/stable/man/joins.html

using DataFrames
people = DataFrame(ID = [20, 40], Name = ["John Doe", "Jane Doe"])
jobs = DataFrame(ID = [20, 40], Job = ["Lawyer", "Doctor"])

# DataFrames.jl 0.21 and later; older releases used join(people, jobs, on = :ID)
innerjoin(people, jobs, on = :ID)

The ID column is at the far left. Even if you join multiple data frames, the position of the ID column does not change.

I think Julia's way is more practical.
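
The key-first ordering described here can be sketched in plain Ruby. This is only an illustration of the ordering rule, not Daru's implementation: the helper name `key_first_order` is hypothetical, and the input array simply restates the vector order reported above for the Daru join (Name, ID, Job).

```ruby
# Hypothetical helper, not part of Daru: computes a column order that puts
# the join keys first and keeps every other column in its existing order.
def key_first_order(columns, keys)
  missing = keys - columns
  raise ArgumentError, "unknown key(s): #{missing.inspect}" unless missing.empty?

  keys + (columns - keys)
end

# Vector order reported above for the Daru inner join on :ID.
joined_columns = [:Name, :ID, :Job]

p key_first_order(joined_columns, [:ID])
# prints [:ID, :Name, :Job]
```

As a workaround, the resulting array could presumably be assigned to the joined frame through Daru's `DataFrame#order=` setter, assuming the installed version provides it. That step is left out of the sketch so it runs without the gem.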