kyle-hamlin closed this issue 6 years ago.
I agree that this is a very interesting feature to have and it is on the roadmap.
Unfortunately, I haven't had much time lately to implement big improvements in Gota. If you come up with a good solution, we can discuss it here or in a PR. Otherwise, I will get to it when I have more free time available.
In the meantime, you can use RApply and CApply to apply functions to rows and columns, and you could split the data into groups yourself and then join the results back together.
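As a rough illustration of that manual workaround, here is a minimal sketch in plain Go that does not use Gota at all. The `Row` type and the `splitBy`, `demean` and `concat` helpers are hypothetical stand-ins for a DataFrame and its operations, chosen only so the example runs on its own. It splits rows by a key, applies a function to each group (here, subtracting the group mean from every value), and joins the groups back together.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one DataFrame row:
// a grouping key and a numeric value.
type Row struct {
	Key   string
	Value float64
}

// splitBy partitions rows into groups by Key, returning the groups
// in the order their keys first appear.
func splitBy(rows []Row) [][]Row {
	pos := map[string]int{} // key -> index into groups
	var groups [][]Row
	for _, r := range rows {
		i, ok := pos[r.Key]
		if !ok {
			i = len(groups)
			pos[r.Key] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], r)
	}
	return groups
}

// demean returns a copy of the group with the group mean subtracted
// from every value.
func demean(group []Row) []Row {
	sum := 0.0
	for _, r := range group {
		sum += r.Value
	}
	mean := sum / float64(len(group))
	out := make([]Row, len(group))
	for i, r := range group {
		out[i] = Row{Key: r.Key, Value: r.Value - mean}
	}
	return out
}

// concat joins the processed groups back into a single slice of rows.
func concat(groups [][]Row) []Row {
	var out []Row
	for _, g := range groups {
		out = append(out, g...)
	}
	return out
}

func main() {
	rows := []Row{{"a", 1}, {"b", 10}, {"a", 3}, {"b", 20}}
	var processed [][]Row
	for _, g := range splitBy(rows) {
		processed = append(processed, demean(g))
	}
	fmt.Println(concat(processed)) // [{a -1} {a 1} {b -5} {b 5}]
}
```

Note that the result is ordered by group rather than by the original row order; restoring the original order would require carrying each row's position through the split and sorting on it after the join.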
Make sure to check and comment on issue #13 for future developments regarding GroupBy, etc.
Best, Alex
I've actually never written any Go, but I'm an avid pandas user, and I was thinking this could be a good project for me to get my feet wet. If you have any initial design ideas or pointers that could help guide me, I would love to hear them. I will go over your code, think about how to implement this GroupBy functionality, and share my thoughts here as I work.
Awesome, I would love to get this implemented for sure. For a start, check issue #13, where I talk about this concept.
Go is a wonderful and sensible language, best of luck getting into it!
Essentially, GroupBy should create an internal index of the groups of rows that belong together. From there, we could extend existing functions to handle these groups (for example, so that sorting or function application is done on a per-group basis).
I encourage you to start with small contributions, since that also makes reviewing the code much easier for me. So, as a first step, submit just the creation of the group index as a PR.
To contribute to the project, make sure to work on the `dev` branch and submit your PRs against it. All code for major features should have a sensible amount of unit tests using Go's testing capabilities. Furthermore, `go test`, the linter `golint`, and `go vet` should not report any errors; this will also enforce the preferred documentation practices for exported functions. Running `gofmt` is mandatory as well, so you should probably just run it automatically on save.
I urge you to comment on issue #13 instead of this one, which I closed to avoid duplicate issues.
Thanks for the interest and let me know your thoughts!
A fundamental feature of dataframes is grouping by one or more columns and summarizing other columns (mean, median, max, min, etc.). Are you thinking about implementing this functionality?
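The group-and-summarize operation described in this question can be sketched in plain Go without Gota: group one column by the values of another, then reduce each group to its mean, median, max, and min. The `Summary` type and the `summarize` function are hypothetical, and the median follows the common convention of averaging the two middle values when a group has an even number of elements.

```go
package main

import (
	"fmt"
	"sort"
)

// Summary holds hypothetical per-group statistics.
type Summary struct {
	Mean, Median, Max, Min float64
}

// summarize groups values by the matching entry in keys and computes
// mean, median, max and min for each group. keys and values are
// assumed to have the same length.
func summarize(keys []string, values []float64) map[string]Summary {
	groups := map[string][]float64{}
	for i, k := range keys {
		groups[k] = append(groups[k], values[i])
	}
	out := make(map[string]Summary, len(groups))
	for k, vs := range groups {
		sorted := append([]float64(nil), vs...) // sort a copy
		sort.Float64s(sorted)
		sum := 0.0
		for _, v := range sorted {
			sum += v
		}
		n := len(sorted)
		median := sorted[n/2]
		if n%2 == 0 {
			median = (sorted[n/2-1] + sorted[n/2]) / 2
		}
		out[k] = Summary{
			Mean:   sum / float64(n),
			Median: median,
			Max:    sorted[n-1],
			Min:    sorted[0],
		}
	}
	return out
}

func main() {
	keys := []string{"a", "b", "a", "a", "b"}
	values := []float64{4, 10, 1, 7, 20}
	stats := summarize(keys, values)
	// Iterate in a fixed order, since Go map iteration order is random.
	for _, k := range []string{"a", "b"} {
		fmt.Printf("%s: %+v\n", k, stats[k])
	}
}
```

Sorting each group once serves the median, max, and min at the same time; a real implementation would more likely compute these from a shared group index like the one discussed above rather than regrouping the data for every summary.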