Data table structure in Go, now developed at https://github.com/cogentcore/core/tree/main/tensor
BSD 3-Clause "New" or "Revised" License

etable: data table structure in Go

IMPORTANT UPDATE: Cogent Core now has an improved version of etable in its tensor package and associated sub-packages. This version will not be further maintained or developed. The v1 version is still needed for the v1 version of emergent.

etable (or eTable) provides a DataTable / DataFrame structure in Go (golang), similar to pandas and xarray in Python, and Apache Arrow Table, using etensor n-dimensional columns aligned by common outermost row dimension.

The e-name derives from the emergent neural network simulation framework, but e is also extra-dimensional, extended, electric, easy-to-use -- all good stuff.. :)

See examples/dataproc for a full demo of how to use this system for data analysis, paralleling the example in Python Data Science using pandas, to see directly how that translates into this framework.

See Wiki for how-to documentation, etc. and Cheat Sheet below for quick reference.

As a general convention, it is safest, clearest, and quite fast to access columns by name instead of index (a map caches the column indexes), so the base access methods generally take a column name argument, and those that take a column index have an Index suffix. In addition, methods with a Try suffix return an error instead of failing silently. This is a bit painful for the writer of these methods but very convenient for users.
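To make the naming convention concrete, here is a minimal, standard-library-only sketch. The `Table` type, its fields, and the choice to return 0 on error are assumptions made for illustration; this is not etable's implementation, only the same pattern of a cached name-to-index map with name-based, `Index`-suffixed, and `Try`-suffixed accessors.

```go
package main

import "fmt"

// Table is a hypothetical stand-in for a table of named float64 columns.
// It only illustrates the naming convention; it is not etable's code.
type Table struct {
	Names   []string
	Cols    [][]float64
	nameMap map[string]int // caches column name -> column index
}

// NewTable builds the table and fills the name-to-index cache once.
func NewTable(names []string, cols [][]float64) *Table {
	t := &Table{Names: names, Cols: cols, nameMap: make(map[string]int, len(names))}
	for i, nm := range names {
		t.nameMap[nm] = i
	}
	return t
}

// CellFloatIndex takes a column index (hence the Index suffix).
func (t *Table) CellFloatIndex(col, row int) float64 {
	return t.Cols[col][row]
}

// CellFloatTry takes a column name and reports problems as an error.
func (t *Table) CellFloatTry(name string, row int) (float64, error) {
	col, ok := t.nameMap[name]
	if !ok {
		return 0, fmt.Errorf("column %q not found", name)
	}
	if row < 0 || row >= len(t.Cols[col]) {
		return 0, fmt.Errorf("row %d out of range for column %q", row, name)
	}
	return t.Cols[col][row], nil
}

// CellFloat is the base name-based accessor.
// In this sketch it returns 0 on any error (an assumption of the sketch).
func (t *Table) CellFloat(name string, row int) float64 {
	val, _ := t.CellFloatTry(name, row)
	return val
}

func main() {
	tbl := NewTable([]string{"Data", "Other"}, [][]float64{{1.5, 2.5}, {10, 20}})
	fmt.Println(tbl.CellFloat("Data", 1)) // 2.5
	fmt.Println(tbl.CellFloatIndex(1, 0)) // 10
	_, err := tbl.CellFloatTry("Missing", 0)
	fmt.Println(err != nil) // true
}
```

The name lookup costs one map access per call, which is why name-based access stays fast while being harder to get wrong than a hard-coded index.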

The following packages are included:

Cheat Sheet

et is the etable pointer variable for examples below:

Table Access

Scalar columns:

val := et.CellFloat("ColName", row)
str := et.CellString("ColName", row)

Tensor (higher-dimensional) columns:

tsr := et.CellTensor("ColName", row) // entire tensor at cell (a row-level SubSpace of column tensor)
val := et.CellTensorFloat1D("ColName", row, cellidx) // cellidx is 1D index into cell tensor
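The relationship between a row, its cell tensor, and the 1D cell index can be sketched with a flat slice. The `TensorCol` type below is hypothetical and assumes row-major storage with the row as the outermost dimension; it is meant only to show why a cell is a row-level sub-space of the column and how a 1D cell index maps to a storage offset.

```go
package main

import "fmt"

// TensorCol is a hypothetical column holding one tensor of shape CellShape
// per row, stored flat with the row as the outermost dimension.
type TensorCol struct {
	CellShape []int
	Values    []float64
}

// CellSize is the number of values in one cell (product of CellShape).
func (c *TensorCol) CellSize() int {
	n := 1
	for _, d := range c.CellShape {
		n *= d
	}
	return n
}

// Cell returns a view (not a copy) of the values for one row.
func (c *TensorCol) Cell(row int) []float64 {
	sz := c.CellSize()
	return c.Values[row*sz : (row+1)*sz]
}

// CellFloat1D reads one value using a 1D index into the row's cell.
func (c *TensorCol) CellFloat1D(row, cellidx int) float64 {
	return c.Values[row*c.CellSize()+cellidx]
}

// SetCellFloat1D writes one value using a 1D index into the row's cell.
func (c *TensorCol) SetCellFloat1D(row, cellidx int, val float64) {
	c.Values[row*c.CellSize()+cellidx] = val
}

func main() {
	// 3 rows, each cell a 2x2 tensor, filled with 0..11.
	col := &TensorCol{CellShape: []int{2, 2}, Values: make([]float64, 3*4)}
	for i := range col.Values {
		col.Values[i] = float64(i)
	}
	fmt.Println(col.Cell(1))           // [4 5 6 7]
	fmt.Println(col.CellFloat1D(1, 3)) // 7
	col.SetCellFloat1D(2, 0, 99)
	fmt.Println(col.Cell(2)) // [99 9 10 11]
}
```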

Set Table Value

et.SetCellFloat("ColName", row, val)
et.SetCellString("ColName", row, str)

Tensor (higher-dimensional) columns:

et.SetCellTensor("ColName", row, tsr) // set entire tensor at cell
et.SetCellTensorFloat1D("ColName", row, cellidx, val) // cellidx is 1D index into cell tensor

Find Value(s) in Column

Returns all rows where the column value matches the given value, compared in string form (any number is converted to a string):

rows := et.RowsByString("ColName", "value", etable.Contains, etable.IgnoreCase)

Other options are etable.Equals instead of Contains to search for an exact full string, and etable.UseCase if case should be used instead of ignored.
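The four combinations of these options can be illustrated with a small standard-library sketch. The `rowsByString` function and the boolean constants below are hypothetical names local to this example (modeling the options as booleans is an assumption of the sketch); they mimic the matching behavior described above on a plain string slice.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical named arguments for this sketch only.
const (
	Contains   = true
	Equals     = false
	IgnoreCase = true
	UseCase    = false
)

// rowsByString returns the indexes of all entries in col that match target,
// either as a substring (contains) or as a full string, with optional
// case-insensitive comparison.
func rowsByString(col []string, target string, contains, ignoreCase bool) []int {
	var rows []int
	for i, s := range col {
		var match bool
		switch {
		case contains && ignoreCase:
			match = strings.Contains(strings.ToLower(s), strings.ToLower(target))
		case contains:
			match = strings.Contains(s, target)
		case ignoreCase:
			match = strings.EqualFold(s, target)
		default:
			match = s == target
		}
		if match {
			rows = append(rows, i)
		}
	}
	return rows
}

func main() {
	col := []string{"Alpha", "beta", "ALPHABET", "gamma"}
	fmt.Println(rowsByString(col, "alpha", Contains, IgnoreCase)) // [0 2]
	fmt.Println(rowsByString(col, "alpha", Equals, IgnoreCase))   // [0]
	fmt.Println(rowsByString(col, "beta", Contains, UseCase))     // [1]
}
```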

Index Views (Sort, Filter, etc)

The IndexView provides a list of row-wise indexes into a table, and Sorting, Filtering and Splitting all operate on this index view without changing the underlying table data, for maximum efficiency and flexibility.

ix := etable.NewIndexView(et) // new view with all rows

Sort

ix.SortColName("Name", etable.Ascending) // etable.Ascending or etable.Descending
SortedTable := ix.NewTable() // turn an IndexView back into a new Table organized in order of indexes

or:

nmcl := et.ColByName("Name") // nmcl is an etensor of the Name column, cached
ix.Sort(func(t *etable.Table, i, j int) bool {
    return nmcl.StringValue1D(i) < nmcl.StringValue1D(j)
})

Filter

nmcl := et.ColByName("Name") // column we're filtering on
ix.Filter(func(t *etable.Table, row int) bool {
    // filter return value is for what to *keep* (=true), not exclude
    // here we keep any row with a name that contains the string "in"
    return strings.Contains(nmcl.StringValue1D(row), "in")
})

Splits ("pivot tables" etc), Aggregation

Create a table of mean values of the "Data" column, grouped by unique entries in the "Name" column. The resulting table will be called "DataMean":

byNm := split.GroupBy(ix, []string{"Name"}) // column name(s) to group by
split.Agg(byNm, "Data", agg.AggMean) // compute the mean of "Data" within each group
gps := byNm.AggsToTable(etable.AddAggName) // etable.AddAggName or etable.ColNameOnly for naming cols
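For readers unfamiliar with group-by aggregation, the computation performed by the snippet above is conceptually the following. This is a standard-library sketch over two parallel slices; the function name `groupMeans` and the choice to return groups in sorted order are assumptions of this example and say nothing about how the split and agg packages are implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// groupMeans returns the unique names in sorted order, together with the
// mean of the data values belonging to each name.
func groupMeans(names []string, data []float64) ([]string, []float64) {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for i, nm := range names {
		sums[nm] += data[i]
		counts[nm]++
	}
	groups := make([]string, 0, len(sums))
	for nm := range sums {
		groups = append(groups, nm)
	}
	sort.Strings(groups) // deterministic output order
	means := make([]float64, len(groups))
	for i, nm := range groups {
		means[i] = sums[nm] / float64(counts[nm])
	}
	return groups, means
}

func main() {
	names := []string{"a", "b", "a", "b", "c"}
	data := []float64{1, 2, 3, 6, 5}
	groups, means := groupMeans(names, data)
	fmt.Println(groups) // [a b c]
	fmt.Println(means)  // [2 4 5]
}
```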

Describe (basic stats) all columns in a table:

ix := etable.NewIndexView(et) // new view with all rows
desc := agg.DescAll(ix) // summary stats of all columns
// get value at given column name (from original table), row "Mean"
mean := desc.CellFloat("ColNm", desc.RowsByString("Agg", "Mean", etable.Equals, etable.UseCase)[0])

Developer info

The visualization tools use the GoGi GUI and the struct fields use the desc tag for documentation. Use the modified goimports tool to auto-update standard comments based on these tags: https://cogentcore.org/core/docs/general/structfieldcomments/