JuliaData / TableOperations.jl

Common table operations on Tables.jl interface implementations

TableOperations

Common table operations on Tables.jl compatible sources


Installation: at the Julia REPL, run import Pkg; Pkg.add("TableOperations")

Maintenance: TableOperations is maintained collectively by the JuliaData collaborators. Responsiveness to pull requests and issues can vary, depending on the availability of key collaborators.

Documentation

TableOperations.select

The TableOperations.select function allows specifying a custom subset and order of columns from a Tables.jl source, like:

using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

table_subset = ctable |> TableOperations.select(:C, :A) |> Tables.columntable

This "selects" the C and A columns from the original table, and re-orders them with C first. The column names can be provided as Strings, Symbols, or Integers.
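As a minimal sketch of the three ways of naming columns mentioned above, the following selects the same two columns by Symbol, by String, and by integer position. It assumes Tables and TableOperations are installed; the variable names by_symbol, by_string, and by_index are illustrative only.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# The same selection expressed three ways (variable names are illustrative).
by_symbol = ctable |> TableOperations.select(:C, :A) |> Tables.columntable
by_string = ctable |> TableOperations.select("C", "A") |> Tables.columntable
by_index  = ctable |> TableOperations.select(3, 1) |> Tables.columntable  # C is column 3, A is column 1

# Column A contains `missing`, so compare with isequal rather than ==.
same = isequal(by_symbol.A, by_string.A) && isequal(by_symbol.A, by_index.A)
```

In each case column B is dropped and C comes before A in the result.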

TableOperations.transform

The TableOperations.transform function allows specifying a "transform" function per column that will be applied per element. This is handy when a simple transformation is needed for a specific column (or columns). Note that this doesn't allow the creation of new columns; it only applies the transform function to the specified column, replacing the original column. Usage is like:

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

table = ctable |> TableOperations.transform(C=x->Symbol(x)) |> Tables.columntable

Here, we're providing the transform function x->Symbol(x), which turns an argument into a Symbol, and saying we should apply it to the C column. Multiple transform functions can be provided for multiple columns, and the mapping from column to transform function can also be provided as a Dict keyed by column names as Strings or Symbols, or even by Ints (referring to the column index).
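The multi-column and Dict forms described above might look like the following sketch. It assumes Tables and TableOperations are installed; the names t1, t2, and funcs are illustrative, and only columns without missing values are transformed so the functions need no special handling.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# Several columns at once via keyword arguments (one function per column).
t1 = ctable |> TableOperations.transform(B=(x -> x * 2), C=uppercase) |> Tables.columntable

# The same column-to-function mapping as a Dict with String keys.
funcs = Dict("B" => (x -> x * 2), "C" => uppercase)
t2 = ctable |> TableOperations.transform(funcs) |> Tables.columntable
```

Column A has no entry in either mapping, so it is passed through unchanged.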

TableOperations.filter

The TableOperations.filter function allows applying a "filter" function to each row in the input table source, keeping rows for which f(row) is true. Usage is like:

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

table = ctable |> TableOperations.filter(x->Tables.getcolumn(x, :B) > 2.0) |> Tables.columntable
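One thing to watch for, sketched here under the assumption that the predicate's result is used directly in a boolean test: a comparison against a column that contains missing returns missing rather than true or false. Wrapping the comparison in coalesce gives an explicit fallback. The name kept is illustrative.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# `missing > 1` evaluates to `missing`, so fall back to `false` for such rows.
keep_large_A = row -> coalesce(Tables.getcolumn(row, :A) > 1, false)

kept = ctable |> TableOperations.filter(keep_large_A) |> Tables.columntable
```

Here only the third row passes the predicate, since A is 1 in the first row and missing in the second.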

TableOperations.map

The TableOperations.map function allows applying a "mapping" function to each row in the input table source; the function f should take and return a Tables.jl Row-compatible object. Usage is like:

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

table = ctable |> TableOperations.map(x->(A=Tables.getcolumn(x, :A), C=Tables.getcolumn(x, :C), B=Tables.getcolumn(x, :B) * 2)) |> Tables.columntable
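Because the mapping function returns a whole row, the output columns are whatever that row contains. The sketch below assumes a NamedTuple is an acceptable row object (as in the example above) and returns a row with a derived column; the column name Bsq and the variable squared are hypothetical.

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# Return a new row per input row: keep B and add a derived column Bsq.
add_square = row -> begin
    b = Tables.getcolumn(row, :B)
    (B=b, Bsq=b^2)
end

squared = ctable |> TableOperations.map(add_square) |> Tables.columntable
```

Columns A and C are absent from the result because the returned rows do not include them.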

TableOperations.narrowtypes

The TableOperations.narrowtypes function infers narrower column element types that better fit the stored data. Usage is like:

ctable_type_any = (A=Any[1, missing, 3], B=Any[1.0, 2.0, 3.0], C=Any["hey", "there", "sailor"])

table = TableOperations.narrowtypes(ctable_type_any) |> Tables.columntable
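To see the effect, one option is to compare the schema before and after narrowing. This is a sketch that assumes the narrowed table reports its inferred types through Tables.schema; the names before and after are illustrative.

```julia
using Tables, TableOperations

ctable_type_any = (A=Any[1, missing, 3], B=Any[1.0, 2.0, 3.0], C=Any["hey", "there", "sailor"])

# Element types as declared: every column is typed Any.
before = Tables.schema(ctable_type_any).types

# Element types after narrowing to fit the values actually stored.
after = Tables.schema(TableOperations.narrowtypes(ctable_type_any)).types
```

Column A keeps Missing in its element type because one of its stored values is missing.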

TableOperations.dropmissing

The TableOperations.dropmissing function lazily removes every row that contains a missing value. Usage is like:

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

table = ctable |> TableOperations.dropmissing |> Tables.columntable
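For the sample table, only the second row has a missing value (in column A), so the result should hold the first and third rows. A small sketch that checks this, with the variable name complete chosen for illustration:

```julia
using Tables, TableOperations

ctable = (A=[1, missing, 3], B=[1.0, 2.0, 3.0], C=["hey", "there", "sailor"])

# Rows are dropped as a whole: B and C lose their second entry too.
complete = ctable |> TableOperations.dropmissing |> Tables.columntable

remaining_rows = length(complete.A)
```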

TableOperations.joinpartitions

The TableOperations.joinpartitions function allows you to lazily chain (or "join") multiple tables into a single long table. Usage is like:

ctables = Tables.partitioner(i -> (A=fill(i, 10), B=rand(10) * i), 1:3)

table = ctables |> TableOperations.joinpartitions |> Tables.columntable
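The partitions do not have to come from a generating function. As a sketch, assuming Tables.partitioner also accepts an iterable of tables, two small tables with the same columns can be chained like this; part1, part2, and joined are illustrative names.

```julia
using Tables, TableOperations

# Two tables with identical column names, of different lengths.
part1 = (A=[1, 2], B=["a", "b"])
part2 = (A=[3], B=["c"])

# Wrap them as partitions, then chain them into one long table.
parts = Tables.partitioner([part1, part2])
joined = parts |> TableOperations.joinpartitions |> Tables.columntable
```

The rows appear in partition order, so the joined table has three rows with part1's rows first.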

Contributing and Questions

Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or would just like to ask a question.