go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

DataFrame ToMatrix function #101

Closed: vitasiku closed this issue 4 years ago

vitasiku commented 4 years ago

Could you please implement the DataFrame ToMatrix (mat.Matrix) function that is hinted at in the README?

I am new to Go and am trying to replicate a Python pipeline (as part of transitioning to Go) that uses StandardScaler, but compilation fails because a DataFrame is not a mat.Matrix, as the errors below show.

Note that I am using the "github.com/pa-m/sklearn/preprocessing" package.

# command-line-arguments
./go_csv.go:18:10: m.columns undefined (type matrix has no field or method columns)
./go_csv.go:21:21: undefined: mat64
./go_csv.go:22:9: undefined: mat64
./go_csv.go:123:12: cannot use selDf1 (type dataframe.DataFrame) as type mat.Matrix in argument to scaler.Fit: dataframe.DataFrame does not implement mat.Matrix (missing At method)
./go_csv.go:125:27: cannot use selDf1 (type dataframe.DataFrame) as type mat.Matrix in argument to scaler.Transform: dataframe.DataFrame does not implement mat.Matrix (missing At method)

kniren commented 4 years ago

Hi, as described in the README, you can add something like this to your code (I just realized the README has a typo: it should be mat, not mat64):

type matrix struct {
    DataFrame
}

func (m matrix) At(i, j int) float64 {
    return m.columns[j].Elem(i).Float()
}

func (m matrix) T() mat.Matrix {
    return mat.Transpose{m}
}

Now, if you want to use a dataframe.DataFrame as a matrix in other libraries, you would need to transform it into this new type:

yourMatrix := matrix{yourDataFrame}
funcThatTakesAMatrixTypeObject(yourMatrix)

I'm typing this off the top of my head, so let me know whether it works for you.
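The wrapper relies on struct embedding: because matrix embeds the DataFrame, it picks up the DataFrame's exported methods, including Dims, and only needs At and T to complete gonum's mat.Matrix method set (Dims, At, T). That is also why the compiler reported "missing At method". Below is a standard-library-only sketch of the same pattern. `Matrix` is a local mirror of mat.Matrix, `table` is a hypothetical stand-in for dataframe.DataFrame whose Elem returns a float64 directly, `transpose` plays the role of mat.Transpose, and `sumAll` is a hypothetical consumer in place of funcThatTakesAMatrixTypeObject.

```go
package main

import "fmt"

// Matrix mirrors the method set of gonum's mat.Matrix. It is declared
// locally (an assumption of this sketch) so no external module is needed.
type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
	T() Matrix
}

// table is a hypothetical stand-in for dataframe.DataFrame: column-oriented
// storage with exported Dims and Elem methods, but no At or T.
type table struct {
	columns [][]float64 // columns[j][i] is row i of column j
}

func (t table) Dims() (int, int) {
	if len(t.columns) == 0 {
		return 0, 0
	}
	return len(t.columns[0]), len(t.columns)
}

func (t table) Elem(i, j int) float64 { return t.columns[j][i] }

// matrix embeds table, inheriting Dims and Elem, and adds At and T.
type matrix struct {
	table
}

// Compile-time check that matrix satisfies Matrix.
var _ Matrix = matrix{}

func (m matrix) At(i, j int) float64 { return m.Elem(i, j) }

func (m matrix) T() Matrix { return transpose{m} }

// transpose is a lazy transposed view, analogous to mat.Transpose.
type transpose struct{ m Matrix }

func (t transpose) Dims() (int, int) {
	r, c := t.m.Dims()
	return c, r
}

func (t transpose) At(i, j int) float64 { return t.m.At(j, i) }

func (t transpose) T() Matrix { return t.m }

// sumAll is a hypothetical consumer that accepts any Matrix.
func sumAll(a Matrix) float64 {
	r, c := a.Dims()
	var s float64
	for i := 0; i < r; i++ {
		for j := 0; j < c; j++ {
			s += a.At(i, j)
		}
	}
	return s
}

func main() {
	t := table{columns: [][]float64{{4, 3}, {1, 2}}} // 2 rows x 2 columns
	m := matrix{t}                                   // like matrix{yourDataFrame}
	fmt.Println(sumAll(m), m.T().At(0, 1))           // prints 10 3
}
```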

vitasiku commented 4 years ago

Thanks for your response, @kniren

After following the instructions above, I get the following error from the At method: m.columns undefined (type matrix has no field or method columns)

...
f, err := os.Open("./go_read_csv_test.csv")
if err != nil {
    panic(err)
}
defer f.Close()

var r io.Reader
r = f

df := dataframe.ReadCSV(r)
dfMat := matrix{df.Subset([]int{0})}
...
kniren commented 4 years ago

It's a bit difficult to tell from just the code you have provided. Is the error happening on the line dfMat := matrix{...}?

This works for me with the code from my first post (go version go1.13.4 linux/amd64):

type matrix struct {
    DataFrame
}

func (m matrix) At(i, j int) float64 {
    return m.columns[j].Elem(i).Float()
}

func (m matrix) T() mat.Matrix {
    return mat.Transpose{m}
}

func TestMatrixStuff() {
    df := LoadRecords(
        [][]string{
            {"A", "B", "C", "D"},
            {"4", "1", "1", "0"},
            {"3", "2", "2", "0.5"},
        },
    )
    dfMat := matrix{df.Subset([]int{0})}
    fmt.Println(df.String())
    fmt.Println(dfMat.String())
    fmt.Printf("\nFirst index: %f", dfMat.At(0, 0))
}

Does the error happen after passing the matrix struct to a different function?

vitasiku commented 4 years ago

Maybe it's a platform or version issue (go version go1.12.7 darwin/amd64).

Even with your snippet above, I still get ./go_mat.go:15:10: m.columns undefined (type matrix has no field or method columns)

package main

import (
    "fmt"

    "github.com/kniren/gota/dataframe"
    "gonum.org/v1/gonum/mat"
)

type matrix struct {
    dataframe.DataFrame
}

func (m matrix) At(i, j int) float64 {
    return m.columns[j].Elem(i).Float()
}

func (m matrix) T() mat.Matrix {
    return mat.Transpose{m}
}

func TestMatrixStuff() {
    df := dataframe.LoadRecords(
        [][]string{
            {"A", "B", "C", "D"},
            {"4", "1", "1", "0"},
            {"3", "2", "2", "0.5"},
        },
    )
    dfMat := matrix{df.Subset([]int{0})}
    fmt.Println(df.String())
    fmt.Println(dfMat.String())
    fmt.Printf("\nFirst index: %f", dfMat.At(0, 0))
}

func main() {
    TestMatrixStuff()
}

I will try updating Go and report back.

vitasiku commented 4 years ago

I still get the same error after updating to go version go1.13.4 darwin/amd64:

./go_mat.go:15:10: m.columns undefined (type matrix has no field or method columns)

kniren commented 4 years ago

OK, I found the issue: the columns field is not exported, so code outside the dataframe package cannot access it. It worked for me only because I was running the snippet from the dataframe_test.go file, which belongs to the dataframe package itself.

Use the following code instead for the matrix set-up; it reads values through the exported Elem method:

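// matrix wraps a DataFrame so it satisfies gonum's mat.Matrix.
// Dims comes from the embedded DataFrame.
type matrix struct {
    dataframe.DataFrame
}

// At returns the element at row i, column j as a float64, using the
// exported Elem method instead of the unexported columns field.
func (m matrix) At(i, j int) float64 {
    return m.Elem(i, j).Float()
}

// T returns a lazy transposed view of the matrix.
func (m matrix) T() mat.Matrix {
    return mat.Transpose{Matrix: m}
}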

Additionally, you might want to switch your import of gota to:

"github.com/go-gota/gota/dataframe"