Closed cryptowww closed 2 years ago
BTW: my data is in Chinese, and I don't know whether gota supports it.
Hi @jameschuh, which versions of gota and Go do you use?
Thank you for your reply.
go version go1.16.6 windows/amd64 gota : v0.12.0
My code:
gdf := dfs.GroupBy("key1","key2")
agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"countn"})
I read the API docs and found that GroupBy() returns *Groups and that Aggregation is called on Groups, so I modified the code to:
gdf := dfs.GroupBy("key1","key2")
gdfv := *gdf
agg := gdfv.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"countn"})
but I get the same error.
When the problem first happened I was using v0.11. I then updated to v0.12 and got the same error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x20 pc=0x109c144]
goroutine 1 [running]:
github.com/go-gota/gota/series.Series.Len(...)
D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.12.0/series/series.go:562
github.com/go-gota/gota/dataframe.Groups.Aggregation(0xc0045b7530, 0xc0003cd300, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
D:/ProgramFiles/goplus/pkg/mod/github.com/go-gota/gota@v0.12.0/dataframe/dataframe.go:497 +0x904
I'm almost sure that's a bug in gota. But I don't know how to reproduce it. Can you provide a minimum code snippet to reproduce? Including some test data?
Maybe it's easier to modify an existing test to show the error, for example TestDataFrame_GroupBy or TestDataFrame_Aggregation.
Open a pull-request or post your code snippet here - just as you like.
My test code is as follows:
package main
import (
"fmt"
"github.com/go-gota/gota/dataframe"
//"github.com/go-gota/gota/series"
)
func main(){
df := dataframe.LoadRecords([][]string{
{"机构","姓名","性别"},
{"北京","张三","男"},
{"上海","李四","女"},
{"天津","王老五","男"},
})
fmt.Println(df)
dfg := df.GroupBy("机构", "性别")
agg := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT},[]string{"countn"})
fmt.Println(agg)
}
and the shell output:
% go run main.go
[3x3] DataFrame
机构 姓名 性别
0: 北京 张三 男
1: 上海 李四 女
2: 天津 王老五 男
<string> <string> <string>
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x10be3c4]
goroutine 1 [running]:
github.com/go-gota/gota/series.Series.Len(...)
/Users/james/go/pkg/mod/github.com/go-gota/gota@v0.12.0/series/series.go:562
github.com/go-gota/gota/dataframe.Groups.Aggregation(0xc00010ca20, 0xc00017a000, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/Users/james/go/pkg/mod/github.com/go-gota/gota@v0.12.0/dataframe/dataframe.go:497 +0x904
main.main()
/Users/james/gospace/main.go:24 +0x52b
exit status 2
This code was run on my MacBook, while the issue originally happened on my work computer at the office, but the error is the same.
Thank you, @chrmang.
Ok, I can reproduce the bug. Thank you for providing the example code. I'll dig into the code to find the root cause.
BTW Go uses UTF-8 in strings, so Chinese letters should not cause an issue.
Hi @jameschuh ,
In your code
agg := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT},[]string{"countn"})
you are aggregating the column 'countn'. This column doesn't exist, which causes the panic (that is the bug: it should return an error instead).
Maybe have a look at TestDataFrame_Aggregation to see the usage of the column names.
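Until the library reports this as an error, one defensive option is to check the column names yourself before aggregating. The following is a minimal sketch, not gota code: missingColumns is a hypothetical helper, and the `have` slice is assumed to come from the dataframe's column names (in gota, df.Names()).

```go
package main

import "fmt"

// missingColumns returns the entries of want that do not appear in have.
// Hypothetical helper; not part of gota.
func missingColumns(have, want []string) []string {
	known := make(map[string]bool, len(have))
	for _, name := range have {
		known[name] = true
	}
	var missing []string
	for _, name := range want {
		if !known[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumption: with gota this slice would be df.Names().
	have := []string{"机构", "姓名", "性别"}
	want := []string{"countn"}

	if missing := missingColumns(have, want); len(missing) > 0 {
		fmt.Println("unknown columns:", missing)
		return
	}
	fmt.Println("all columns exist")
}
```

Running the check before calling Aggregation turns the nil pointer panic into an ordinary, readable message.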
Sorry for my carelessness.
I modified the code:
agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"机构","性别"})
and another error occurred:
DataFrame error: Aggregation: len(typs) != len(colanmes)
Then I looked at the dataframe source code: it requires the []AggregationType slice and the column-name slice to have the same length. I had thought I could aggregate over two dimensions and get the count, sum, or another result.
Following the error message, I changed my code:
agg := gdf.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT,dataframe.Aggregation_COUNT}, []string{"分公司","合同类型"})
It returns duplicated values, like this:
机构 机构_COUNT 性别 性别_COUNT
0: 山东 6.000000 男 6.000000
1: 四川 5.000000 男 5.000000
2: 山东 2.000000 女 2.000000
3: 河北 19.000000 女 19.000000
....
but I just want:
机构 性别 COUNT
0: 山东 男 6.000000
1: 四川 男 5.000000
2: 山东 女 2.000000
3: 河北 女 19.000000
....
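To make the desired result concrete, here is a plain-Go sketch (standard library only, no gota) that counts rows per combination of key columns. The function name groupCount and the sample rows are assumptions for illustration; the rows are adapted from the test data earlier in the thread, with one city changed so that a group has more than one member.

```go
package main

import (
	"fmt"
	"strings"
)

// groupCount counts rows per distinct combination of the given key columns.
// records[0] is the header; all rows are assumed to have the same length.
// Group keys (values joined by a tab) are returned in first-seen order.
func groupCount(records [][]string, keys ...string) ([]string, map[string]int, error) {
	if len(records) == 0 {
		return nil, nil, fmt.Errorf("no header row")
	}
	// Resolve each key column to its position, failing on unknown names.
	pos := make([]int, 0, len(keys))
	for _, k := range keys {
		found := -1
		for i, name := range records[0] {
			if name == k {
				found = i
				break
			}
		}
		if found < 0 {
			return nil, nil, fmt.Errorf("unknown column %q", k)
		}
		pos = append(pos, found)
	}
	counts := map[string]int{}
	var order []string
	for _, row := range records[1:] {
		parts := make([]string, len(pos))
		for i, p := range pos {
			parts[i] = row[p]
		}
		key := strings.Join(parts, "\t")
		if _, seen := counts[key]; !seen {
			order = append(order, key)
		}
		counts[key]++
	}
	return order, counts, nil
}

func main() {
	records := [][]string{
		{"机构", "姓名", "性别"},
		{"北京", "张三", "男"},
		{"上海", "李四", "女"},
		{"北京", "王老五", "男"},
	}
	order, counts, err := groupCount(records, "机构", "性别")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, key := range order {
		fmt.Println(key, counts[key])
	}
}
```

This yields one row per (机构, 性别) pair with a single count, which is the shape asked for above.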
I rewrote your example in Latin letters. It's difficult for me to see differences between Chinese characters.
package main
import (
"fmt"
"github.com/go-gota/gota/dataframe"
)
func main() {
df := dataframe.LoadRecords([][]string{
{"AA", "BB", "CC"},
{"AB", "BB", "X"},
{"AC", "BC", "Y"},
{"AD", "BD", "X"},
})
dfg := df.GroupBy("AA", "CC")
agg2 := dfg.Aggregation([]dataframe.AggregationType{dataframe.Aggregation_COUNT}, []string{"AA"})
fmt.Println(agg2)
}
The result is:
[3x3] DataFrame
AA AA_COUNT CC
0: AD 1.000000 X
1: AB 1.000000 X
2: AC 1.000000 Y
<string> <float> <string>
Is this what you expect?
I also copied this example to the Go Playground. You may need to run the code more than once for the dependencies to be downloaded.
Yes, it's just what I want.
Thank you, @chrmang.
Maybe I hadn't fully understood how to use it, but I'm using it together with excelize to process data in Excel. Thank you for your great work!! 👍🏻
When I apply an aggregation to a dataframe, it panics like this:
The full error stack is as follows:
And my code looks like this:
How can I solve it? Thank you.