SciRuby / daru-io

daru-io is a plugin gem to the existing daru gem, which aims to add support to Importing DataFrames from / Exporting DataFrames to multiple formats.
http://www.rubydoc.info/github/athityakumar/daru-io/master/
MIT License
25 stars 9 forks source link

Metaprogramming to automate things #19

Closed athityakumar closed 7 years ago

athityakumar commented 7 years ago

After having gone through a bit of meta-programming with Ruby, I feel that we can use this concept for atleast 2 purposes -

#! Should this class be called Base?
module Daru
  module IO
    module Importers
      class Importer
        def initialize(**args)
          args.each do |k,v|
            instance_variable_set("@#{k}", v)
            define_singleton_method(k) { instance_variable_get("@#{k}") }
          end
        end
      end
    end
  end
end

#! lib/daru/io/importers/format.rb
module Daru
  module IO
    module Importers
      class Format < Importer
        def call
          # do importer specific stuff here
        end
      end
    end
  end
end
#! Use case (note that ALL arguments have to be keyword arguments)
df = Daru::IO::Importers::Format.new(path: '/path/to/format' or connection: 'connection', other keyword arguments).call
#! daru/io/importers/linkages.rb
module Daru
  class DataFrame
    class << self
      def register_importer(function, instance)
        define_singleton_method(function) { |*args| instance.new(*args).call }
      end

      def register_all_importers
        importers = Daru::IO::Importers
        klasses   = importers.constants.select {|c| importers.const_get(c).is_a? Class}
        klasses.each do |klass|
          method_name = "from_#{klass.downcase}".to_sym
          register_importer method_name, Object.const_get("Daru::IO::Importers::#{klass}")
        end     
      end
    end
  end
end

Daru::DataFrame.register_all_importers
# Use Daru::DataFrame.register_importer for partial requires, yay!
# Note that for libraries like rcsv, the call changes to Daru::DataFrame.from_rcsv(...)

I'm positive about both of these changes. I'd like to know if I've left out any other place(s) where metaprogramming can be used, or if there's any problem with this methodology.

athityakumar commented 7 years ago

@lokeshh @zverok @v0dro - Please have a look at this and share your thoughts. 😄

athityakumar commented 7 years ago

Regarding (1), if we don't want to force the user to send keyword arguments even for variables like path/ connection , we can either have something like

module Daru
  module IO
    module Importers
      class CSV < Importer
        def initialize(path, col_sep: ',', **args)
          super(path: path, col_sep: col_sep, **args)
        end
        def call
           # do importer specific stuff here
        end
      end
    end
  end
end

OR

module Daru
  module IO
    module Importers
      class CSV < Importer
        def initialize(path, **args)
          super(path: path, **args)
          check
        end
        def check
          @col_sep ||= ','
        end
        def call
          # do importer specific stuff here
        end
      end
    end
  end
end
lokeshh commented 7 years ago

I think its always a good idea to automate things. So I really love this idea. But at the same time I would caution you that if its get too complicated in the future and you find the complexities outweigh the benefits feel free to fall back.

athityakumar commented 7 years ago

Sure, thanks @lokeshh. I've updated PR #16 with metaprogramming, please have a look. 😄

zverok commented 7 years ago

I have several concerns.

  1. About the params for initialize: I believe that it is a case of "premature abstraction", which could lead to unnecessary complexity. I believe that either (or both) of two simple approaches could work: either assing instance vars manually, or obtain an **options and store it in @options. Automatic assingment of any passed variables is evil-ish.
  2. Here metaprogramming is justified (e.g. the fact that existance of CSV importer should automatically lead to DataFrame#from_csv method existance), yet approach to achieve the goal is not optimal. Code becomes too coupled, and relies on a fact that all importer classes defined at some arbitrary point in exectuion (otherwise your register_all_importers would not find the importer). This should be much better:
class Importers::Base
  def self.inherited(child_class)
    Daru::DataFrame.register_importer(guessed_importer_name, child_class)
  end
end

This way, you can have any third-party gem defining any Some::Obcure::Namespace::MyImporter, and still having dicsovered by the mechanism.

The second part of the suggestion is that Daru::DataFrame.register_importer (or better Daru::Importers.register) could be public AND accept either instance of Importer::Base, or anything with call method, or block. This way you can do also this kind of things:

Daru::Importers.register :from_very_simple_format do |path|
  raw = File.read(...).split(...).map(...)
  DataFrame.new(raw)
end

WDYT?

athityakumar commented 7 years ago

Thanks for sharing your ideas, @zverok. 😄

(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with super(binding) seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?

(2) The register_io_module method is used to link each IO module to corresponding Daru::DataFrame method. As daru-io is to support partial requires too, only register_io_module is used so far by each IO module and register_all_io_modules method IS NOT USED currently. However, if we're to add daru-io as a dependency to daru at a later point (without partial requires), then register_all_io_modules would come in handy.

Passing block to the register method seems like a good idea. 👍

zverok commented 7 years ago

(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with super(binding) seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?

Well, metaprogramming is like kungfu. Real masters use it really rarely ;) The concerns current solution produces are:

In total, this is kind of smart solution that "masks" architectural problems. If you see that all importers repeat some code, you can: a) do wiser base class(es), with more "defaults" (like "ok, it can import from file or string, so it has heuristics to guess which is the case"), or b) you can "mask" it with metaprogramming, and forget to think

(a) is typically superior approach ;)

(2) The register_io_module method is used to link each IO module to corresponding Daru::DataFrame method. As daru-io is to support partial requires too, only register_io_module is used so far by each IO module and register_all_io_modules method IS NOT USED currently. However, if we're to add daru-io as a dependency to daru at a later point (without partial requires), then register_all_io_modules would come in handy.

I believe that register_all would not be necessary if you'll take my def self.inherited advice.

athityakumar commented 7 years ago

Acknowledged. auto-init although smart probably seems like an overkill at this point of time, when number of arguments aren't that huge. So, we can definitely have manual init for now. I'll soon revert the auto-init part.

However, we probably might require auto-init in the future though, when all IO modules have dozens of arguments. Let me just share this below code snippet for future references. 😄

#! lib/daru/io/base.rb
#! Base class common to importers & exporters to auto-init arguments
module Daru
  module IO
    class Base
      def initialize(binding)
        args = method(__method__).parameters.map do |_, name|
          [name, binding.local_variable_get(name.to_s)]
        end.to_h

        args.each { |k, v| instance_variable_set("@#{k}", v) }
      end
    end
  end
end

#! lib/daru/io/importers/base.rb
#! Importer base class with generic helper functions,
#! that inherits from the above base class.
module Daru
  module IO
    module Importers
      class Base < Base
        def guess_parse
          ...
        end
      end
    end
  end
end

#! lib/daru/io/importers/format.rb
#! Specific Importer class
module Daru
  module IO
    module Importers
      #! Inherits from importer base class
      class Format < Base
        def initialize(connection={}, *keys, match: nil, count: nil)
          super(binding)
        end

        def call
          keys = ...
          vals = ...
          guess_parse keys, vals
        end
      end
    end
  end
end