SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License

Inheriting from Daru::DataFrame? #352

Closed: info-rchitect closed this issue 7 years ago

info-rchitect commented 7 years ago

Hi,

I want to inherit from Daru::DataFrame to create a generic Dataset class. The limitation (quite possibly just a gap in my knowledge) is that I can't figure out how to pass options via super so that DataFrame instantiates itself through the 'from_csv' or 'rows' methods.

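```ruby
require 'daru'

class Dataset < Daru::DataFrame
  def initialize(file, options = {})
    options = {
      name: 'daru'
    }.update(options)
    # Desired call: DataFrame#initialize has no :init_method or :file option,
    # so super cannot be asked to read the CSV this way.
    super({}, init_method: :from_csv, file: file, name: options[:name])
  end
end
```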

I can work around this by forcing users to instantiate Datasets via a wrapper method, but having this ability via super would make the code much cleaner.

thx

v0dro commented 7 years ago

Can't you call the class method from_csv directly inside initialize? It is not possible to add CSV reading to DataFrame#initialize, since that method already does too many things, and adding more functionality like this would make the API confusing.

zverok commented 7 years ago

To be honest, I am not sure that you are doing the "right" thing from an architectural point of view. Daru::DataFrame will hardly make a good base class for anything except a "dataframe with a few more features"; it was never designed for that.

info-rchitect commented 7 years ago

Hi,

Thanks for the input. Regarding using Daru::DataFrame as a base class: it is needed so that "business logic" and methods can be applied in a standardized manner. Simple examples are creating a JMP-like 'split' method using the pivot_table method, or doing proprietary calculations on the dataframe. I ended up allowing users to pass in a source argument that can be any supported file type, a hash, an array of arrays, etc. Loving the library and I hope to contribute soon.

regards

zverok commented 7 years ago

> Regarding using Daru::DataFrame as a base class, it is needed so "business logic" and methods can be applied in a standardized manner.

I am still not 100% sure about the use case, but one option is to hold the DataFrame in an instance variable of your object and use the Forwardable module to delegate some methods to it.

DataFrame itself is not ready to be a base class because, for example, any of its methods that return a DataFrame will still return a plain DataFrame, not an instance of the child class (and likewise a Vector, not some descendant class).

So, may I close the issue?

info-rchitect commented 7 years ago

Yes, you may close the issue; thanks for the discussion. My app still uses DataFrame as a base class: I just intercept the results and decide what to do with the resulting DataFrame. There is some performance overhead, but most of the time I am doing lots of transforms (pivots, concats, joins, etc.) and only need to convert the last DataFrame into my object.