go-gota / gota

Gota: DataFrames and data wrangling in Go (Golang)
Other
3.04k stars 281 forks source link

Added Split() methods for both pkgs with tests included #24

Closed shixzie closed 7 years ago

shixzie commented 7 years ago
shixzie commented 7 years ago

the Split() methods takes a percent (value between 0 and 1), calling Series.Split(0.5) would split the series in two equal parts (50%) the Series on which the func was called (let's call it S) will contain the first 50% and the returned Series S' would have the other 50%. by calling Series.Split(0.3) S would have the first 30% and S' the other 70%

shixzie commented 7 years ago

I want to be a contributor :)

shixzie commented 7 years ago

Thanks @svanharmelen for your review and opinion :+1:

kniren commented 7 years ago

Thank you very much for you contribution @Shixzie, and thank you too @svanharmelen for your review. Before getting into the actual review, I have to say that there are way too many things happening on this PR which make it very difficult to review:

For future contributions I would ask you to keep each major change as a separate PR. Ideally you would branch dev for every feature and then you will target that specific branch to be merged back into dev. This is a good read: http://nvie.com/posts/a-successful-git-branching-model/

Also before performing any PR, make sure you have your code formatted with gofmt, you have test coverage for at least the happy path, pass golint and go vet linters, and check that your code follows the style of the existing codebase.

Now I'm gonna put here my thoughts about each of the changes.

Refactoring of errors

Although I feel that a better error trace would be interesting, as referenced in #7, I don't think that the way it was implemented here was ideal. First of all, I feel that the creation of a helper function for createErr feels a bit out of place, being that all it does is calling errors.Wrap. Before merging something like this, I feel it's better if we would discuss how this is to be done in the #7 thread.

Adding extra nil checks on some instances

In some cases this might be useful, like on the DataFrame.Set method, but I saw a number of times that you check both len(x) == 0 and x == nil. This is not necessary as @svanharmelen pointed out, since len(x) will already return 0 on a nil slice.

Adding a new time.Time type

This is great, I also have been working on this, so I guess it's my bad for not marking the issue as assigned to me, but wasn't expecting any contributions. In any case, I'm going to share what I had so far, you can put it together with your code and we can review it on a separate PR: https://gist.github.com/kniren/f961e32a811b305b5442bae303f5d8b9

Regarding the time parsing when converting from string, this is a tricky matter, but I feel we should settle for a given format for automatic conversion (I favour const timeformat string = time.RFC3339) and encourage users to either use that or pass an existing time.Time.

Adding the DataFrame.Split() and Series.Split() methods

This is a good idea, although I'm not sure it's the right implementation for the Split method. Essentially for this use case (Splitting by percentage), you can just create a []int index with randomized elements and then make the splitting from there. When talking about splitting on #13, the expected behaviour is to split a DataFrame to N DataFrames by a number of factors. This is consistent with what a user from pandas or R would expect. In any case, I still don't know which should be the best behaviour for #13, so I encourage you to discuss the pros and cons on that issue thread before jumping to code. I do see the need and the benefit of having a percentage split on Gota but I'm just not sure this is the API that I would like to expose, furthermore we can continue the discussion on this matter.

With all that said, I would like to thank you once again for your contribution. I'm going to close this issue, but I encourage you to continue the discussion and to submit individual PRs for the appropriate topics after considering my feedback.