go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

Vector arithmetic #152

Closed: danielpcox closed this pull request 3 years ago

danielpcox commented 3 years ago

Adds a new Math method to dataframe.DataFrame that computes n-ary arithmetic functions over entire selected columns, storing the result in a new column (or replacing an existing one). It supports int and float64 types. The operator can be given either as a string (e.g., "+", "/") or as a unary, binary, or ternary int or float64 function (e.g., a float64 function from Go's math package). For example:

/*  `input` is a 5x4 DataFrame:

   Strings  Floats   Primes Naturals
0: e        2.718000 1      1
1: Pi       3.142000 3      2
2: Phi      1.618000 5      3
3: Sqrt2    1.414000 7      4
4: Ln2      0.693000 11     5
   <string> <float>  <int>  <int>
*/
df := New(
    series.New([]string{"e", "Pi", "Phi", "Sqrt2", "Ln2"}, series.String, "Strings"),
    series.New([]float64{2.718, 3.142, 1.618, 1.414, 0.693}, series.Float, "Floats"),
    series.New([]int{1, 3, 5, 7, 11}, series.Int, "Primes"),
    series.New([]int{1, 2, 3, 4, 5}, series.Int, "Naturals"),
)

// The new `Math` method takes a result column name, an operator (string or func), and one or more source column names
withNewDiffColumn := df.Math("Diff", "-", "Floats", "Primes")

fmt.Println(withNewDiffColumn)

/* New DataFrame now has a column named "Diff" which is
    the result of subtracting Primes from Floats.

    Strings  Floats   Primes Naturals Diff
 0: e        2.718000 1      1        1.718000  
 1: Pi       3.142000 3      2        0.142000  
 2: Phi      1.618000 5      3        -3.382000 
 3: Sqrt2    1.414000 7      4        -5.586000 
 4: Ln2      0.693000 11     5        -10.307000
    <string> <float>  <int>  <int>    <float> 
*/

See tests for further examples.

This PR also adds a new FindElem method to dataframe.DataFrame, which lets a user pull a particular series.Element out of a DataFrame: one column and a value select the row (the match is assumed to be unique), and a second column picks the value within that row. For example, the following call searches the "Metric" column for the value "envoy_cluster_upstream_rq_active" and returns the series.Element from the "Value" column of that row:

df.FindElem("Metric", "envoy_cluster_upstream_rq_active", "Value")
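The row-then-column lookup can be sketched in plain Go without gota. This is a hypothetical stand-in, not the PR's code: `table` and `findElem` are illustrative names, cells are plain strings rather than series.Element values, and the metric values are made up. As with FindElem, the key is assumed unique, so the sketch returns the first matching row.

```go
package main

import (
	"fmt"
)

// table is a hypothetical, column-oriented stand-in for a DataFrame:
// each column name maps to its cell values, and all columns share one length.
type table map[string][]string

// findElem mimics the lookup described for FindElem: it scans keyCol for the
// first cell equal to key and returns the cell from valueCol in that same row.
// The key is assumed unique, so later matches are never examined.
func findElem(t table, keyCol, key, valueCol string) (string, error) {
	keys, ok := t[keyCol]
	if !ok {
		return "", fmt.Errorf("no column %q", keyCol)
	}
	vals, ok := t[valueCol]
	if !ok {
		return "", fmt.Errorf("no column %q", valueCol)
	}
	if len(keys) != len(vals) {
		return "", fmt.Errorf("columns %q and %q differ in length", keyCol, valueCol)
	}
	for i, k := range keys {
		if k == key {
			return vals[i], nil
		}
	}
	return "", fmt.Errorf("no row with %s == %q", keyCol, key)
}

func main() {
	// Illustrative data only.
	metrics := table{
		"Metric": {"envoy_cluster_upstream_rq_total", "envoy_cluster_upstream_rq_active"},
		"Value":  {"120", "7"},
	}
	v, err := findElem(metrics, "Metric", "envoy_cluster_upstream_rq_active", "Value")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // prints 7
}
```

Returning an error for a missing key or column is a choice made in this sketch; the PR description does not say how FindElem reports a row that is not found.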

This PR also introduces Go modules support. Until it is merged into github.com/go-gota/gota, you will need to add a replace directive to the go.mod of any dependent code by running:

go mod edit -replace github.com/go-gota/gota=github.com/greymatter-io/gota@vector-arithmetic

danielpcox commented 3 years ago

Blast. I just noticed that a dev branch exists, and I branched off master. I'll close this PR in favor of https://github.com/go-gota/gota/pull/153, which targets dev.