SciRuby / daru-io

daru-io is a plugin gem to the existing daru gem, which aims to add support to Importing DataFrames from / Exporting DataFrames to multiple formats.
http://www.rubydoc.info/github/athityakumar/daru-io/master/
MIT License
24 stars 9 forks source link

Calling read-write methods from daru to daru-io #29

Closed athityakumar closed 7 years ago

athityakumar commented 7 years ago

Currently, daru-io supports only from_format and to_format methods, with Daru::DataFrame.from_format(...) redirecting to Daru::IO::Importers::Format.new(...).call. Similar case with exporters.

As per discussion on https://github.com/SciRuby/daru/issues/280, from_format, read_format, to_format and write_format methods are to be supported by all IO modules. Please check whether this would be good enough. Of course, feel free to suggest better calling methods (we may have to re-factor).

Daru::DataFrame.from_format(...) -> Daru::IO::Importers::Format.new(...).from Daru::DataFrame.read_format(...) -> Daru::IO::Importers::Format.new(...).read

Ping @zverok @v0dro @lokeshh

v0dro commented 7 years ago

Works for me.

zverok commented 7 years ago

Looks good for me too :+1:

athityakumar commented 7 years ago

@zverok @v0dro - Some clarifications are required regarding the addition of to_s method that returns a string that can directly be written to a file, into daru-io exporters.

athityakumar commented 7 years ago

Ping @zverok @v0dro - One final clarification please. 😅 ^

v0dro commented 7 years ago

The user won't really call Daru::IO::Exporters::Format#initialize directly right? I think kwargs should work fine.

Should specific to_s methods really be added to each exporter? Or, would adding a to_s method to Exporters::Base that does write_format on a tempfile and do File.read(tempfile.path) provide a more generic and DRYier solution?

Is string conversion generic enough for each converter that you can put it inside Base? If yes, you can go ahead with that.

zverok commented 7 years ago

As the to_format and the to_s methods don't take path argument and yet have to be incorporated into the same Daru::IO::Exporters::Format#initialize method, would it be better to have ALL arguments as keyword arguments?

I believe that it rather means that path is a good candidate for call argument. Like this:

exporter = Daru::IO::Exporter::JSON.new(df)
exporter.to_s
exporter.write(path)

...or something like that, WDYT?

Or, would adding a to_s method to Exporters::Base that does write_format on a tempfile and do File.read(tempfile.path) provide a more generic and DRYier solution?

It sounds a bit head-over-heels for me. I'd rather say that default implementation of write_format should use File.write(to_s)

athityakumar commented 7 years ago

Thanks for sharing your thoughts, @v0dro & @zverok.

I'd rather say that default implementation of write_format should use File.write(to_s)

Acknowledged. I was thinking that the usage of tempfiles might "get the job done" with lesser modifications to existing codebase. But, having write_format to use File.write(path, to_s) definitely seems more cleaner. :+1:

Edit: The RData & RDS Exporters have no way of wrapping their to_s via StringIO objects and directly writes into a .rds / .rdata file with saveRDS() / save() R function calls. I think that computing to_s by reading write(tempfile.path) could again handle this better. 😄

I believe that it rather means that path is a good candidate for call argument.

I thought the same too, but wouldn't this make it difficult to automatically monkey-patch daru? Like, linking df.write_excel(path, args) to Daru::IO::Exporters::Excel.new(args).write(path) is kind of discrete and not pretty much generic. For example, write_sql writes to DBI and not to a file. So, all arguments passed by daru to daru-io can't really be distinguished as first argument is path or not, right? This is why I was thinking of maybe keyword arguments (in both daru & daru-io) would do the job.

zverok commented 7 years ago

Well, about the second question... Maybe (I am absolutely not sure) we can look at it at this angle:

exporter = Daru::IO::Exporters::Whatever.new(df) # reusable one!
exporter.write(all, other, args)
exporter.write(absolutely, different, args)
exporter.write() # uses default args

This way df.write_format(any, args) just will do Exporter.new(self).write(args).

Or just proceed your way :)

athityakumar commented 7 years ago

@zverok - Can you please re-read the first point? I added an edit regarding how R Exporters pose a difficulty regarding the proposed approach (write = File.write(path, to_s)) and how the alternative approach (to = File.read(tempfile.path)) seems more of a generic solution.

Regarding the second point, I was re-thinking about whether we can have an auto-link logic like this:

case function.to_s
when /\Ato_.*_string\Z/
  define_method(function) { |*args, &io_block| instance.new(self, *args, &io_block).to_s }
when /\Ato_/
  define_method(function) { |*args, &io_block| instance.new(self, *args, &io_block).to }  
when /\Awrite_/
  define_method(function) { |*args, &io_block| instance.new(self, *args[1..-1], &io_block).write(*args[0]) }
when /\Afrom_/
  define_singleton_method(function) { |*args, &io_block| instance.new(*args[1..-1], &io_block).from(*args[0]) }
when /\Aread_/
  define_singleton_method(function) { |*args, &io_block| instance.new(*args[1..-1], &io_block).read(*args[0]) }
else
  raise ArgumentError, 'Invalid function name given to monkey-patch into Daru::DataFrame.'
end

This way, we have normal Daru::DataFrame method calls and also daru-io calls like:

importer = Daru::IO::Importers::Format.new(opts, &block)
exporter = Daru::IO::Exporters::Format.new(df, opts, &block)

exporter.write(path) #! Linked to df.write(path, opts, &block)
exporter.to_s #! Linked to df.to_s(opts, &block)
exporter.to #! Linked to df.to_format(opts, &block)

importer.read(path) #! Linked to Daru::DataFrame.read_format(path, opts, &block)
importer.from(instance) #! Linked to Daru::DataFrame.from_format(instance, opts, &block)

I think we may have to discuss (not now, but when PR review happens for this) about discrete cases like whether database related importers should have a read_format method and more of such discrete details.

athityakumar commented 7 years ago

Ping @zverok - Please have a look at the above, in case you missed it. 😄

zverok commented 7 years ago

@athityakumar I feel like being guilty for overdiscussing it :) Just implement it the way that makes the most sense for you -- we always can refactor later, you know ;)

athityakumar commented 7 years ago

Acknowledged, opened PR #52 (WIP) for this.

athityakumar commented 7 years ago

Merged PR #52 as per this discussion and opened #53 for documenting possible enhancements.