SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.04k stars 139 forks source link

Deprecate use of reportbuilder in DataFrame summary #305

Closed ananyo2012 closed 7 years ago

ananyo2012 commented 7 years ago

Fixes #221

This replaces the use of reportbuilder for DataFrame summary and uses Daru DataFrame and Vector methods in place.

v0dro commented 7 years ago

ReportBuilder does much more than just the things that you've specified in your summary. Please install the reportbuilder gem, read the docs and see what it is capable of. Your PR should encapsulate all of that.

Here's a previous PR about this with some useful comments: https://github.com/SciRuby/daru/pull/288

ananyo2012 commented 7 years ago

@v0dro Can you explain what is an object type in a Daru vector with an example? It is used in this line.

ananyo2012 commented 7 years ago

Ok I see any non-numeric element is treated as an object. But if I create an vector with a non-numeric element like Daru::Vector.new([1, 2, "xyz"]) it shows an error if I try to get the summary. The error says

NoMethodError: undefined method `quo' for nil:NilClass
v0dro commented 7 years ago

Object is any kind of data that is not numeric or categorical. Basically all sorts of Ruby objects like strings, hashes, numbers, etc. are object.

You will not understand the full capabilities of reportbuilder if you use it through daru. You'll need to use it as a standalone gem first and pass it data in a format that is supported by reportbuilder.

See this: https://github.com/clbustos/reportbuilder

Specifically look at ReportBuilder::Table in the README. The idea is to mimic that functionality in daru.

ananyo2012 commented 7 years ago

@v0dro I read through the reportbuilder code and implemented a table method that works for this case. Can you have a look and give your reviews?

v0dro commented 7 years ago

Give me a day.

zverok commented 7 years ago

@ananyo2012 The big problem of taking this task is that current state (with use of reportbuilder) has very loose tests, they literally just check that "well, #summary outputs some string".

What I assume could be reasonable approach to the task is:

Currently, to tell whether what you did is a right thing, each of the reviewers should just checkout your branch and check the output with a bare eyes... Which is not the best approach possible.

I'll also review your actual code shortly.

ananyo2012 commented 7 years ago

@zverok Thanks for the review. I am working on the changes requested. Till now what I did was just replace the usage of reportbuilder with a reportbuilder-less approach without changing the output code itself following strictly the reportbuilder output conventions. Will work on restucturing the code now but I think I will follow the reportbuilder output format as its pretty reasonable.

And yes I will write detailed test for the same since this is a manual approach now and needs to have adequate tests.

ananyo2012 commented 7 years ago

@zverok @v0dro I have made the requested changes and added thorough tests. Have a look.

ananyo2012 commented 7 years ago

@zverok I have solved the rubocop offenses. Please have a look.

ananyo2012 commented 7 years ago

@zverok Ready for one more review.

ananyo2012 commented 7 years ago

@zverok I left some comments. Can you have a look?

ananyo2012 commented 7 years ago

@zverok Done with the table_formatters. Ready for another review!

ananyo2012 commented 7 years ago

@v0dro done!

zverok commented 7 years ago

(And don't forget to rebase from master, it has a conflict now.)

ananyo2012 commented 7 years ago

@zverok Made the changes and rebased. Please have a look.