Closed athityakumar closed 7 years ago
@lokeshh @zverok @v0dro - Please have a look at this and share your thoughts. 😄
Regarding (1), if we don't want to force the user to send keyword arguments even for variables like path
/ connection
, we can either have something like
module Daru
module IO
module Importers
class CSV < Importer
def initialize(path, col_sep: ',', **args)
super(path: path, col_sep: col_sep, **args)
end
def call
# do importer specific stuff here
end
end
end
end
end
OR
module Daru
module IO
module Importers
class CSV < Importer
def initialize(path, **args)
super(path: path, **args)
check
end
def check
@col_sep ||= ','
end
def call
# do importer specific stuff here
end
end
end
end
end
I think its always a good idea to automate things. So I really love this idea. But at the same time I would caution you that if its get too complicated in the future and you find the complexities outweigh the benefits feel free to fall back.
Sure, thanks @lokeshh. I've updated PR #16 with metaprogramming, please have a look. 😄
I have several concerns.
**options
and store it in @options
. Automatic assingment of any passed variables is evil-ish.CSV
importer should automatically lead to DataFrame#from_csv
method existance), yet approach to achieve the goal is not optimal. Code becomes too coupled, and relies on a fact that all importer classes defined at some arbitrary point in exectuion (otherwise your register_all_importers
would not find the importer). This should be much better:class Importers::Base
def self.inherited(child_class)
Daru::DataFrame.register_importer(guessed_importer_name, child_class)
end
end
This way, you can have any third-party gem defining any Some::Obcure::Namespace::MyImporter
, and still having dicsovered by the mechanism.
The second part of the suggestion is that Daru::DataFrame.register_importer
(or better Daru::Importers.register
) could be public AND accept either instance of Importer::Base
, or anything with call
method, or block. This way you can do also this kind of things:
Daru::Importers.register :from_very_simple_format do |path|
raw = File.read(...).split(...).map(...)
DataFrame.new(raw)
end
WDYT?
Thanks for sharing your ideas, @zverok. 😄
(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with super(binding)
seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?
(2) The register_io_module
method is used to link each IO module to corresponding Daru::DataFrame
method. As daru-io
is to support partial requires too, only register_io_module
is used so far by each IO module and register_all_io_modules
method IS NOT USED currently. However, if we're to add daru-io
as a dependency to daru
at a later point (without partial requires), then register_all_io_modules
would come in handy.
Passing block to the register
method seems like a good idea. 👍
(1) The instance variables were indeed initialized manually till last week, but it seemed like repetition. Rather, auto-initializing with
super(binding)
seems to simplify this process for subsequent IO modules too. I don't understand (yet) why this might turn out to be evil-ish. Am I missing something?
Well, metaprogramming is like kungfu. Real masters use it really rarely ;) The concerns current solution produces are:
binding
should be used only in special cases (to be honest, I don't understand why you need it at all), and "I am tired of initializing local vars" is NOT that case definitelyOpenStruct
and call it a day, but I don't believe it brings any value at all.In total, this is kind of smart solution that "masks" architectural problems. If you see that all importers repeat some code, you can: a) do wiser base class(es), with more "defaults" (like "ok, it can import from file or string, so it has heuristics to guess which is the case"), or b) you can "mask" it with metaprogramming, and forget to think
(a) is typically superior approach ;)
(2) The
register_io_module
method is used to link each IO module to correspondingDaru::DataFrame
method. Asdaru-io
is to support partial requires too, onlyregister_io_module
is used so far by each IO module andregister_all_io_modules
method IS NOT USED currently. However, if we're to adddaru-io
as a dependency todaru
at a later point (without partial requires), thenregister_all_io_modules
would come in handy.
I believe that register_all
would not be necessary if you'll take my def self.inherited
advice.
Acknowledged. auto-init
although smart probably seems like an overkill at this point of time, when number of arguments aren't that huge. So, we can definitely have manual init for now. I'll soon revert the auto-init part.
However, we probably might require auto-init
in the future though, when all IO modules have dozens of arguments. Let me just share this below code snippet for future references. 😄
#! lib/daru/io/base.rb
#! Base class common to importers & exporters to auto-init arguments
module Daru
module IO
class Base
def initialize(binding)
args = method(__method__).parameters.map do |_, name|
[name, binding.local_variable_get(name.to_s)]
end.to_h
args.each { |k, v| instance_variable_set("@#{k}", v) }
end
end
end
end
#! lib/daru/io/importers/base.rb
#! Importer base class with generic helper functions,
#! that inherits from the above base class.
module Daru
module IO
module Importers
class Base < Base
def guess_parse
...
end
end
end
end
end
#! lib/daru/io/importers/format.rb
#! Specific Importer class
module Daru
module IO
module Importers
#! Inherits from importer base class
class Format < Base
def initialize(connection={}, *keys, match: nil, count: nil)
super(binding)
end
def call
keys = ...
vals = ...
guess_parse keys, vals
end
end
end
end
end
After having gone through a bit of meta-programming with Ruby, I feel that we can use this concept for atleast 2 purposes -
I'm positive about both of these changes. I'd like to know if I've left out any other place(s) where metaprogramming can be used, or if there's any problem with this methodology.