JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia

Concatenating two tables horizontally #69

Status: Open. fredcallaway opened this issue 3 years ago.

fredcallaway commented 3 years ago

In DataFrames, we can hcat two data frames with the same number of rows, so that the new data frame has all the columns from both sources. This is very intuitive to me, and I was surprised to find that it doesn't work with TypedTables, e.g.

hcat(Table(x=[1], y=[2]), Table(z=[3]))

throws ERROR: ArgumentError: Named tuple names do not match.

It looks like the meaning of hcat in this package is very different from what I expected. What is the best way to accomplish "column-wise" horizontal concatenation?

fredcallaway commented 3 years ago

One possible solution here is to define:

Base.merge(tables::Table...) = map(merge, tables...)

andyferris commented 3 years ago

Hi @fredcallaway

Firstly, I think you should be able to concatenate the columns of two tables via Table(table1, table2). You can add some additional columns to a single table via Table(table; newcol = [...], ...). Does that still work?
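A minimal sketch of the two constructor forms described above, assuming a recent TypedTables.jl release where `Table(table1, table2)` and `Table(table; newcol = ...)` behave as the comment says. The names `t1`, `t2`, `wide`, `extended` and `manual` are illustrative only. The last line writes the same column merge out by hand with `columns` (which returns a table's NamedTuple of column vectors) and Base's `merge` for NamedTuples, and should give the same columns.

```julia
using TypedTables

t1 = Table(x = [1, 2], y = [10, 20])
t2 = Table(z = [100, 200])

# Concatenate the columns of two tables that have the same number of rows
wide = Table(t1, t2)

# Add a new column to a single table via a keyword argument
extended = Table(t1; z = [100, 200])

# The same merge spelled out: combine the NamedTuples of columns,
# then wrap the result in a new Table
manual = Table(merge(columns(t1), columns(t2)))
```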

Secondly, a Table is an AbstractArray, so hcat is already defined. You would be able to create an n × 2 matrix of rows from two length-n tables (which are typically, but not always, vectors). In this case the tables are expected to have the same schema (column names and types), which is why you saw the "Named tuple names do not match" error.
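To illustrate the array semantics described above, here is a small sketch assuming Base's generic `hcat` applies to two tables that share a schema. Each table is treated as a vector of rows (NamedTuples), so the result has one column per input table and each element is a whole row rather than a column of data.

```julia
using TypedTables

a = Table(x = [1, 2], y = [10, 20])
b = Table(x = [3, 4], y = [30, 40])

# Same column names and types, so the rows can be stored together:
# column 1 of `m` holds the rows of `a`, column 2 the rows of `b`
m = hcat(a, b)
```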

andyferris commented 3 years ago

Also, there is already an overload for map(merge, table1, table2), so that should work out of the box too. :)

https://github.com/JuliaData/TypedTables.jl/blob/main/src/columnops.jl#L8-L10

Note: that particular code creates a copy of the data, since map usually creates unaliased containers.
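Applied to the tables from the original report, the two-table `map(merge, ...)` overload should give back a `Table` whose rows are the merged NamedTuples. This is a sketch based on the comment above; the copying behaviour is as described there and is not checked here.

```julia
using TypedTables

t1 = Table(x = [1], y = [2])
t2 = Table(z = [3])

# Merge the tables row by row; TypedTables specializes this call for
# two Tables so that the result is again a Table (with copied columns)
t3 = map(merge, t1, t2)
```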

fredcallaway commented 3 years ago

Ah, that's convenient! Is this documented anywhere? I looked in the section on joining tables and didn't find it. I think it would also help to add a note to the docstring of hcat, since other people might try that first, as I did.

By the way, map(merge, table1, table2, table3) returns an Array (not a Table). Might make sense to define a variadic version of that function you linked?
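One way a variadic version could look, as a sketch only: `hcat_columns` is a hypothetical helper name, not part of TypedTables. It merges the NamedTuples of columns directly with Base's variadic `merge` rather than going through `map`, so it returns a `Table` for any number of inputs. Using a separate function also avoids adding a method to `Base.merge` for a type the user's code does not own. Unlike the `map(merge, ...)` overload, it should share the input column arrays instead of copying them, assuming the `Table(::NamedTuple)` constructor wraps its columns without copying.

```julia
using TypedTables

# Hypothetical helper (not TypedTables API): concatenate the columns
# of one or more tables that all have the same number of rows.
function hcat_columns(t1::Table, ts::Table...)
    n = length(t1)
    for t in ts
        length(t) == n ||
            throw(DimensionMismatch("all tables must have $n rows, got $(length(t))"))
    end
    # `columns` gives each table's NamedTuple of column vectors; Base's
    # variadic `merge` combines them (a later duplicate name wins).
    return Table(merge(columns(t1), map(columns, ts)...))
end

t = hcat_columns(Table(x = [1], y = [2]), Table(z = [3]), Table(w = [4]))
```

Checking the lengths up front gives a clear error message when the row counts differ, before any column merging happens.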

andyferris commented 3 years ago

Unfortunately the documentation is too sparse; it seems this isn't documented yet. Good suggestions!