go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)

panic: runtime error: invalid memory address or nil pointer dereference #167

Closed cryptowww closed 2 years ago

cryptowww commented 2 years ago

When I apply an aggregation to a DataFrame, it panics with this error:

panic: runtime error: invalid memory address or nil pointer dereference

The full stack trace is as follows:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0x38c064]

goroutine 1 [running]:
github.com/go-gota/gota/series.Series.Len(...)
        D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.11.0/series/series.go:560
github.com/go-gota/gota/dataframe.Groups.Aggregation(0xc005d41b30, 0xc000270f00, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.11.0/dataframe/dataframe.go:504 +0x904

And my code looks like this:

agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"countn"})

How can I solve it? Thank you.

cryptowww commented 2 years ago

BTW: my data is in Chinese, and I don't know whether gota supports it.

chrmang commented 2 years ago

Hi @jameschuh, which versions of gota and Go do you use?

cryptowww commented 2 years ago

Thank you for your reply.

go version go1.16.6 windows/amd64
gota: v0.12.0

My code:

gdf := dfs.GroupBy("key1","key2")
agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"countn"})

I read the API docs and found that GroupBy() returns *Groups and that Aggregation is a method of Groups, so I modified the code to:

gdf := dfs.GroupBy("key1","key2")
gdfv := *gdf
agg := gdfv.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"countn"})

But I get the same error.
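Dereferencing the pointer cannot change anything here. In Go, a method with a value receiver can be called directly through a pointer, and the compiler inserts the *p for you. The stack trace shows dataframe.Groups.Aggregation (a pointer receiver would appear as (*Groups).Aggregation), so gdf.Aggregation(...) and (*gdf).Aggregation(...) are the same call. A minimal stdlib-only sketch, using a hypothetical Groups type that is not gota's:

```go
package main

import "fmt"

// Groups is a hypothetical stand-in for a grouped result; it is NOT gota's type.
type Groups struct {
	keys []string
}

// NumKeys has a value receiver, like Groups.Aggregation in the stack trace.
func (g Groups) NumKeys() int {
	return len(g.keys)
}

// groupBy returns a pointer, like GroupBy in the issue.
func groupBy(keys ...string) *Groups {
	return &Groups{keys: keys}
}

func main() {
	gdf := groupBy("key1", "key2")

	// Calling through the pointer: Go rewrites this as (*gdf).NumKeys().
	viaPointer := gdf.NumKeys()

	// Dereferencing first, as in the modified code above.
	gdfv := *gdf
	viaValue := gdfv.NumKeys()

	fmt.Println(viaPointer, viaValue) // 2 2
}
```

So the dereference is harmless but also irrelevant; the trace shows the panic originating inside series.Series.Len, not at the pointer.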

cryptowww commented 2 years ago

When the problem first happened I was using v0.11; after updating to v0.12 I get the same error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0x109c144]

goroutine 1 [running]:
github.com/go-gota/gota/series.Series.Len(...)
        D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.12.0/series/series.go:562
github.com/go-gota/gota/dataframe.Groups.Aggregation(0xc0045b7530, 0xc0003cd300, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.12.0/dataframe/dataframe.go:497 +0x904

chrmang commented 2 years ago

I'm almost sure that's a bug in gota, but I don't know how to reproduce it. Can you provide a minimal code snippet that reproduces it, including some test data?

Maybe it's easier to modify one of the existing tests to show the error: TestDataFrame_GroupBy or TestDataFrame_Aggregation.

Open a pull request or post your code snippet here, whichever you prefer.

cryptowww commented 2 years ago

My test code is as follows:


package main

import (

    "fmt"
    "github.com/go-gota/gota/dataframe"
    //"github.com/go-gota/gota/series"

)

func main(){
    df := dataframe.LoadRecords([][]string{
        {"机构","姓名","性别"},
        {"北京","张三","男"},
        {"上海","李四","女"},
        {"天津","王老五","男"},
    })

    fmt.Println(df)

    dfg := df.GroupBy("机构", "性别")

    agg := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT},[]string{"countn"})
    fmt.Println(agg)
}

And the shell output:

% go run main.go

[3x3] DataFrame

    机构       姓名       性别
 0: 北京       张三       男
 1: 上海       李四       女
 2: 天津       王老五      男
    <string> <string> <string>

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x10be3c4]

goroutine 1 [running]:
github.com/go-gota/gota/series.Series.Len(...)
    /Users/james/go/pkg/mod/github.com/go-gota/gota@v0.12.0/series/series.go:562
github.com/go-gota/gota/dataframe.Groups.Aggregation(0xc00010ca20, 0xc00017a000, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/james/go/pkg/mod/github.com/go-gota/gota@v0.12.0/dataframe/dataframe.go:497 +0x904
main.main()
    /Users/james/gospace/main.go:24 +0x52b
exit status 2

I ran this code on my MacBook, whereas the original issue happened on my work computer at the office, but the error is the same.

Thank you @chrmang

chrmang commented 2 years ago

Ok, I can reproduce the bug. Thank you for providing the example code. I'll dig into the code to find the root cause.

BTW, Go strings hold UTF-8 text, so Chinese characters should not cause an issue.
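To illustrate that point with a stdlib-only sketch (plain Go, not gota): string equality and map lookup compare the underlying bytes, so Chinese values group exactly like ASCII ones. Only character counting needs unicode/utf8. The rows below are the data rows from the test program above; countBy is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBy (hypothetical helper) counts rows per distinct value in column col.
// Map keys are compared byte by byte, so UTF-8 Chinese text needs no special handling.
func countBy(rows [][]string, col int) map[string]int {
	counts := make(map[string]int)
	for _, r := range rows {
		counts[r[col]]++
	}
	return counts
}

func main() {
	// Data rows from the test program (header row omitted).
	rows := [][]string{
		{"北京", "张三", "男"},
		{"上海", "李四", "女"},
		{"天津", "王老五", "男"},
	}
	counts := countBy(rows, 2) // column 性别
	fmt.Println(counts["男"], counts["女"]) // 2 1

	// len counts bytes; utf8.RuneCountInString counts characters.
	name := "王老五"
	fmt.Println(len(name), utf8.RuneCountInString(name)) // 9 3
}
```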

chrmang commented 2 years ago

Hi @jameschuh, in your code agg := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT},[]string{"countn"}) you are aggregating the column 'countn'. This column doesn't exist, and that causes the panic (which is the bug: it should return an error instead). Have a look at TestDataFrame_Aggregation to see how the column names are used.
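Until gota reports a missing column as an error, one defensive option is to check the arguments yourself before calling Aggregation. Below is a minimal stdlib-only sketch; checkAggregation is a hypothetical helper, not part of gota. In real code you would pass df.Names() as names. The length check reflects the fact that gota already returns an error when the aggregation types and column names differ in length.

```go
package main

import "fmt"

// checkAggregation is a hypothetical pre-flight check, NOT a gota function.
// names: the DataFrame's column names (e.g. df.Names() in gota).
// nTypes: number of aggregation types; colnames: columns to aggregate.
func checkAggregation(names []string, nTypes int, colnames []string) error {
	if nTypes != len(colnames) {
		return fmt.Errorf("got %d aggregation types but %d column names", nTypes, len(colnames))
	}
	known := make(map[string]bool, len(names))
	for _, n := range names {
		known[n] = true
	}
	for _, c := range colnames {
		if !known[c] {
			return fmt.Errorf("column %q does not exist", c)
		}
	}
	return nil
}

func main() {
	names := []string{"机构", "姓名", "性别"}

	// The call from the issue: "countn" is not a column.
	fmt.Println(checkAggregation(names, 1, []string{"countn"}))

	// A valid call: one aggregation type for one existing column.
	fmt.Println(checkAggregation(names, 1, []string{"姓名"}))
}
```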

cryptowww commented 2 years ago

Sorry for my carelessness.

I modified the code:

agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"机构","性别"})

and got another error:

DataFrame error: Aggregation: len(typs) != len(colanmes)

Then I looked at the dataframe source code: it requires the []AggregationType slice and colnames to have the same length. I thought I could aggregate over two dimensions and get the count, sum, or another result.

Following the error message, I changed my code to:

agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT,dataframe.Aggregation_COUNT}, []string{"分公司","合同类型"})

It returns duplicate values, like this:

    机构   机构_COUNT  性别  性别_COUNT
 0: 山东   6.000000    男    6.000000
 1: 四川   5.000000    男    5.000000
 2: 山东   2.000000    女    2.000000
 3: 河北   19.000000   女    19.000000
 ....

But I just want:

    机构   性别  COUNT
 0: 山东   男    6.000000
 1: 四川   男    5.000000
 2: 山东   女    2.000000
 3: 河北   女    19.000000
 ....

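The desired table is one row per distinct (机构, 性别) combination plus the number of rows in that group. As a plain-Go illustration of that computation (not how gota implements it), here is a sketch with a hypothetical countGroups helper and a small made-up record set. It assumes rectangular records with the header in the first row.

```go
package main

import (
	"fmt"
	"strings"
)

// groupCount is one output row: the key values and how many rows share them.
type groupCount struct {
	Keys  []string
	Count int
}

// countGroups (hypothetical) groups records (header row first) by the named
// key columns and counts rows per group, keeping groups in first-seen order.
func countGroups(records [][]string, keyCols ...string) ([]groupCount, error) {
	if len(records) == 0 {
		return nil, fmt.Errorf("no header row")
	}
	idx := make([]int, len(keyCols))
	for i, name := range keyCols {
		idx[i] = -1
		for j, h := range records[0] {
			if h == name {
				idx[i] = j
				break
			}
		}
		if idx[i] < 0 {
			return nil, fmt.Errorf("column %q does not exist", name)
		}
	}

	pos := map[string]int{} // composite key -> index into out
	var out []groupCount
	for _, row := range records[1:] {
		keys := make([]string, len(idx))
		for i, j := range idx {
			keys[i] = row[j]
		}
		// Join with the ASCII unit separator to build a single map key.
		k := strings.Join(keys, "\x1f")
		if p, ok := pos[k]; ok {
			out[p].Count++
			continue
		}
		pos[k] = len(out)
		out = append(out, groupCount{Keys: keys, Count: 1})
	}
	return out, nil
}

func main() {
	records := [][]string{
		{"机构", "姓名", "性别"},
		{"北京", "张三", "男"},
		{"上海", "李四", "女"},
		{"北京", "王老五", "男"},
	}
	groups, err := countGroups(records, "机构", "性别")
	if err != nil {
		panic(err)
	}
	for _, g := range groups {
		fmt.Println(g.Keys, g.Count) // [北京 男] 2, then [上海 女] 1
	}
}
```

The key columns come out once and the count once, which is the shape asked for; with gota itself, the approach below (aggregating a single column) gives the same shape.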
chrmang commented 2 years ago

I rewrote your example using Latin letters, since it's difficult for me to tell Chinese characters apart.

package main

import (
    "fmt"
    "github.com/go-gota/gota/dataframe"
)

func main() {
    df := dataframe.LoadRecords([][]string{
        {"AA", "BB", "CC"},
        {"AB", "BB", "X"},
        {"AC", "BC", "Y"},
        {"AD", "BD", "X"},
    })
    dfg := df.GroupBy("AA", "CC")
    agg2 := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"AA"})
    fmt.Println(agg2)
}

The result is:

[3x3] DataFrame

    AA       AA_COUNT CC
 0: AD       1.000000 X
 1: AB       1.000000 X
 2: AC       1.000000 Y
    <string> <float>  <string>

Is this what you expect?

I also copied this example to the Go Playground. You may need to run the code more than once for the dependencies to download.

cryptowww commented 2 years ago

Yes, it's just what I want.

Thank you @chrmang

Maybe I hadn't understood how to use it. I'm using it together with excelize to process data in Excel. Thank you for your great work! 👍🏻

chrmang commented 2 years ago

You are welcome.

I'm closing this issue because the bug is fixed and no other questions are open.

If new questions come up, feel free to open a new issue or ask on Slack (see the README).