go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

colnames get lost after calling Rapply() #42

Closed dtynn closed 6 years ago

dtynn commented 6 years ago

Hi Alex, thank you for your great work.

While testing, I noticed that calling Rapply loses the column names, and the detected column types as well.

Test code:

package main

import (
    "log"

    "github.com/kniren/gota/dataframe"
    "github.com/kniren/gota/series"
)

func main() {
    df := dataframe.LoadRecords(
        [][]string{
            []string{"A", "B", "C", "D"},
            []string{"a", "4", "5.1", "true"},
            []string{"k", "5", "7.0", "true"},
            []string{"k", "4", "6.0", "true"},
            []string{"a", "2", "7.1", "false"},
        },
    )

    // Apply the identity function to every row; the cell values are unchanged.
    applied := df.Rapply(func(s series.Series) series.Series {
        return s
    })

    log.Println(df)
    log.Println(applied)
}

Output:

2017/11/01 17:38:32 [4x4] DataFrame

    A        B     C        D
 0: a        4     5.100000 true
 1: k        5     7.000000 true
 2: k        4     6.000000 true
 3: a        2     7.100000 false
    <string> <int> <float>  <bool>

2017/11/01 17:38:32 [4x4] DataFrame

    X0       X1       X2       X3
 0: a        4        5.100000 true
 1: k        5        7.000000 true
 2: k        4        6.000000 true
 3: a        2        7.100000 false
    <string> <string> <string> <string>

dtynn commented 6 years ago

Hi Alex, I read through the other issues and found this:

We want to be able to apply functions to both rows and columns over a DataFrame. The dimension of the returned Series should be compatible with each other. Additionally, when applying functions over rows, since we can't expect the columns to be all of the same type, we will have to cast the types.

So maybe this output is expected? If so, please close the issue.

kniren commented 6 years ago

Yeah, when using Rapply you cannot expect the aggregate function to rename the columns for you. The type casting is also a necessity and is working as intended.

If your aggregate function returns rows with the same number of columns as the input and you want to keep the column names, you should rename the resulting DataFrame accordingly.

Thank you for the comment! If you disagree with the current behaviour, feel free to continue the discussion here; for the time being, I'm closing this issue.

Best, Alex