SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.04k stars 139 forks source link

Add new functions to Vector: max, min, index_of_max, index_of_min, max_by, min_by, index_of_max_by, index_of_min_by #303

Closed athityakumar closed 7 years ago

athityakumar commented 7 years ago

This Pull Request aims to add functions into Daru::Vector and Daru::DataFrame classes.

Do test the implemented functions. And suggest more functions to add.

lokeshh commented 7 years ago

You can add support for blocks to index_of_max and index_of_min.

For example

[1] pry(main)> dv = Daru::Vector.new ['a', 'bb', 'ccc'], index: [:a, :b, :c]
=> #<Daru::Vector(3)>
   a   a
   b  bb
   c ccc
[2] pry(main)> dv.index_of_max { |i| i.size }
# This should return :c
v0dro commented 7 years ago

Write tests for all new functions.

athityakumar commented 7 years ago

@v0dro : Sure, I will include the tests for these functions first in a day or two, before adding other new functions - and let you know for another review of this PR.

athityakumar commented 7 years ago

@v0dro @lokeshh : Thanks for your suggestions - the PR is ready for another review. :smile:

Updates

(1) Functions index_of_min , index_of_max ,max , min are all placed inside blocks - hence, enabling features like

dv = Daru::Vector.new ['a', 'bb', 'ccc'], index: [:a, :b, :c]
=> #<Daru::Vector(3)>
   a   a
   b  bb
   c ccc
dv.index_of_max { |i| i.size }
=> :c

(2) Tests have been added to all the 5 functions.

Help required

After writing the max function, the tests of the following function failed - due to incompatibility between Daru::Vector#max and Dataframe that expects a vector output..

rspec './spec/maths/statistics/dataframe_spec.rb > context "max"

I've temporarily commented that test. But how do we proceed with this? Also, are there any other functions that need to be added?

Also, build is failing due to this reason : File (/home/travis/build/SciRuby/daru/lib/daru/maths/statistics/vector.rb) is only (93.29%) covered. This is below the expected minimum coverage per file of (95.00%).. Any tips on how to fix this?

athityakumar commented 7 years ago

@zverok : Any other changes to be done / functions to be added? If not, can I start working on the from_html Nokogiri module? :smile:

athityakumar commented 7 years ago

@zverok : Thanks for mentioning about the YARD resource. This is the first time I came across YARD, and I must say that it is just simplistically awesome. :smile:

Update

(1) Added better & shorter tests, with one test per it. (2) Added documentation.

Edit

The commit after this comment fixes a minor documentation issue. PR is ready for review now. :smile:

Any other changes to be made?

athityakumar commented 7 years ago

@zverok @v0dro : I've pushed the changes as suggested above. :smile:

athityakumar commented 7 years ago

I still have one more commit to push (reverting the changes that I earlier did in spec/vector_spec.rb) - however, the Travis CI build will anyway fail unless PR #306 is merged. Hence, I'll wait before pushing this commit. :smile:

athityakumar commented 7 years ago

@zverok @v0dro : PR is ready for yet another review. :smile:

zverok commented 7 years ago

Well, now when the code has decent quality, I have more conceptual question. Most of the time, we are trying to mimic/be close to Ruby's standard classes. Now, if you'll look at Enumerable's docs, you'll find

@v0dro, don't you think we should repeat this behavior? It looks somewhat reasonable for me.

athityakumar commented 7 years ago

Sure. I have my mid-semester exams going on currently (till 25th Feb) - I'll push the changes after that, and let you know for the next PR review. :smile:

v0dro commented 7 years ago

@athityakumar I've been monitoring your PRs and you are consistently requiring a lot of spoon feeding on part of @zverok to push you in the right direction.

If you want to apply as a GSOC student, we expect you to be an independent programmer and not make mistakes like renaming variables to something unreadable because of a Rubocop offense.

GSOC is for students who can work independently and turn for guidance only when they cannot figure out how to solve a problem. Helping you write better code after every other commit you push to a PR is not a mentor's responsibility. Just because @zverok is helpful does not mean he has a lot of time. He is not your professor. All of us are volunteers and do this in our spare time. We don't have time to push you in the right direction after every other commit.

Carefully review your code yourself before you push it to a PR next time, or if its a WIP, mention so in a comment.

athityakumar commented 7 years ago

If you want to apply as a GSOC student, we expect you to be an independent programmer and not make mistakes like renaming variables to something unreadable because of a Rubocop offense. Carefully review your code yourself before you push it to a PR next time, or if its a WIP, mention so in a comment.

Sure, I usually @mention all the reviewers when the PR is a ready for a review. That commit (in which renaming variables to something unreadable because of a Rubocop offense) was an intermediate one - which I pushed to check for build reports on different ruby versions. And I accept that it was a bad call on my part to make the code unreadable - but this is something that happens very rarely.

All of us are volunteers and do this in our spare time. We don't have time to push you in the right direction after every other commit.

Acknowledged.

GSOC is for students who can work independently and turn for guidance only when they cannot figure out how to solve a problem. Helping you write better code after every other commit you push to a PR is not a mentor's responsibility. Just because @zverok is helpful does not mean he has a lot of time.

I thought I was building a better rapport with the mentor, but I guess I didn't see it from the mentor's perspective - sorry for taking your time and thanks for helping, @zverok.

I'll make sure to review the code before pushing to the PR next time on wards. Thanks for sharing your opinion, @v0dro. :+1:

v0dro commented 7 years ago

Great! I appreciate that you're taking feedback positively.

zverok commented 7 years ago

@athityakumar well I acknowledge I am forcing myself into "intermediary" commits; though, I somehow feel obliged to, just to make sure things go the right way. For me, the problem I can identify is your PRs is that you sometimes don't spend enough time to investigate "how the similar things are already done in daru" or "how this kind of things are typically done in good Ruby code". I don't think it is a fatal problem though, yet I still suggest to approach things with "something like that was probably done several times, how they are usually doing it?" mindset. It is the only way to became an author of state-of-the-art solutions once ;)

athityakumar commented 7 years ago

@v0dro @zverok - I require a small clarification. The ruby doc specifies that max and min take a comparative block (like |a,b| a <=> b) whereas max_by and min_by take a single object in block (like |x| x.size).

Currently in the PR, min and max functions support the single object block rather than comparative block. That is, our min and max currently work like the inbuilt min_by and max_by. Shouldn't this be changed accordingly to make these functions in Daru::Vector similar to that of Ruby's inbuilt enum?

That is, Daru::Vector#max should be similar to Array#max, and Daru::Vector#max_by should be similar to Array#max_by right?

zverok commented 7 years ago

The ruby doc specifies that max and min take a comparative block (like |a,b| a <=> b) whereas max_by and min_by take a single object in block (like |x| x.size).

Well, I pointed at that fact above. Yet I have somewhat ambivalent feelings about this. As far as I can remember, Ruby's max vs max_by is "historical" (first it was only max, then somebody invented max_by). I have no clue if anybody uses max with block, as max_by always more superior/convenient.

So, I can find good arguments for both ideas:

  1. make the interface exactly as Array's
  2. cleanup it a bit, having only max with 1-arg block (and probably max_by as just its alias, for those who will expect it to exist).

Currently, I like (2) a bit more (that's why I haven't forced max/max_by distinction, though pointed on it).

@v0dro @lokeshh WDYT?

lokeshh commented 7 years ago

It's a hard choice. I would choose first option.

On Mar 6, 2017 10:47 PM, "Victor Shepelev" notifications@github.com wrote:

The ruby doc specifies that max and min take a comparative block (like |a,b| a <=> b) whereas max_by and min_by take a single object in block (like |x| x.size).

Well, I pointed at that fact above https://github.com/SciRuby/daru/pull/303#issuecomment-280871341. Yet I have somewhat ambivalent feelings about this. As far as I can remember, Ruby's max vs max_by is "historical" (first it was only max, then somebody invented max_by). I have no clue if anybody uses max with block, as max_by always more superior/convenient.

So, I can find good arguments for both ideas:

  1. make the interface exactly as Array's
  2. cleanup it a bit, having only max with 1-arg block (and probably max_by as just its alias, for those who will expect it to exist).

Currently, I like (2) a bit more (that's why I haven't forced max/max_by distinction, though pointed on it).

@v0dro https://github.com/v0dro @lokeshh https://github.com/lokeshh WDYT?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SciRuby/daru/pull/303#issuecomment-284465729, or mute the thread https://github.com/notifications/unsubscribe-auth/AHUmVCUH7e5PREHCJTYEgFw7KPd29rZ-ks5rjD-sgaJpZM4L4VrC .

athityakumar commented 7 years ago

I also slightly prefer the option (1) over (2), as it eliminates any possible confusions to the user, about what the functions max and max_by do.

But I was also thinking if we could use max to be compatible with both types of blocks by knowing the number of variables passed on (with block.parameters). Source : http://stackoverflow.com/questions/31812486/check-how-many-parameters-block-being-passed-has-and-then-invoke-yield-with-dif

This way, max will support both {|x| x.size} (by calling Array#max_by) and {|a,b| b.size <=> a.size} (by calling Array#max) - and max_by can be aliased to max.

zverok commented 7 years ago

I was thinking if we could use max to be compatible with both types of blocks by knowing the number of variables passed

OK, if you are sure, let's go this way. May be the most reasonable, while not the most easy one.

athityakumar commented 7 years ago

@zverok - I've amended the changes as suggested above. Functions Daru:Vector#max, Daru:Vector#min, Daru:Vector#index_of_max and Daru:Vector#index_of_min now can take both comparative as well as object blocks. Functions Daru:Vector#max_by, Daru:Vector#min_by, Daru:Vector#index_of_max_by, Daru:Vector#index_of_min_by have been aliased to the above mentioned functions, respectively.

athityakumar commented 7 years ago

@zverok @v0dro - I've re-wrote the spec descriptions, as suggested above.

zverok commented 7 years ago

@v0dro I think we can merge it, WDYT?

athityakumar commented 7 years ago

Ping @v0dro