Closed athityakumar closed 7 years ago
Works for me.
Looks good for me too :+1:
@zverok @v0dro - Some clarifications are required regarding the addition of to_s
method that returns a string that can directly be written to a file, into daru-io exporters.
As the to_format
and the to_s
methods don't take path
argument and yet have to be incorporated into the same Daru::IO::Exporters::Format#initialize
method, would it be better to have ALL arguments as keyword arguments?
Should specific to_s
methods really be added to each exporter? Or, would adding a to_s
method to Exporters::Base
that does write_format
on a tempfile and do File.read(tempfile.path)
provide a more generic and DRYier solution?
Ping @zverok @v0dro - One final clarification please. 😅 ^
The user won't really call Daru::IO::Exporters::Format#initialize
directly right? I think kwargs should work fine.
Should specific to_s methods really be added to each exporter? Or, would adding a to_s method to Exporters::Base that does write_format on a tempfile and do File.read(tempfile.path) provide a more generic and DRYier solution?
Is string conversion generic enough for each converter that you can put it inside Base
? If yes, you can go ahead with that.
As the to_format and the to_s methods don't take path argument and yet have to be incorporated into the same Daru::IO::Exporters::Format#initialize method, would it be better to have ALL arguments as keyword arguments?
I believe that it rather means that path
is a good candidate for call
argument. Like this:
exporter = Daru::IO::Exporter::JSON.new(df)
exporter.to_s
exporter.write(path)
...or something like that, WDYT?
Or, would adding a to_s method to Exporters::Base that does write_format on a tempfile and do File.read(tempfile.path) provide a more generic and DRYier solution?
It sounds a bit head-over-heels for me. I'd rather say that default implementation of write_format
should use File.write(to_s)
Thanks for sharing your thoughts, @v0dro & @zverok.
I'd rather say that default implementation of write_format should use File.write(to_s)
Acknowledged. I was thinking that the usage of tempfiles might "get the job done" with lesser modifications to existing codebase. But, having write_format
to use File.write(path, to_s)
definitely seems more cleaner. :+1:
Edit: The RData & RDS Exporters have no way of wrapping their to_s
via StringIO
objects and directly writes into a .rds / .rdata file with saveRDS()
/ save()
R function calls. I think that computing to_s
by reading write(tempfile.path)
could again handle this better. 😄
I believe that it rather means that path is a good candidate for call argument.
I thought the same too, but wouldn't this make it difficult to automatically monkey-patch daru? Like, linking df.write_excel(path, args)
to Daru::IO::Exporters::Excel.new(args).write(path)
is kind of discrete and not pretty much generic. For example, write_sql
writes to DBI and not to a file. So, all arguments passed by daru to daru-io can't really be distinguished as first argument is path or not, right? This is why I was thinking of maybe keyword arguments (in both daru & daru-io) would do the job.
Well, about the second question... Maybe (I am absolutely not sure) we can look at it at this angle:
exporter = Daru::IO::Exporters::Whatever.new(df) # reusable one!
exporter.write(all, other, args)
exporter.write(absolutely, different, args)
exporter.write() # uses default args
This way df.write_format(any, args)
just will do Exporter.new(self).write(args)
.
Or just proceed your way :)
@zverok - Can you please re-read the first point? I added an edit regarding how R Exporters pose a difficulty regarding the proposed approach (write = File.write(path, to_s)
) and how the alternative approach (to = File.read(tempfile.path)
) seems more of a generic solution.
Regarding the second point, I was re-thinking about whether we can have an auto-link logic like this:
case function.to_s
when /\Ato_.*_string\Z/
define_method(function) { |*args, &io_block| instance.new(self, *args, &io_block).to_s }
when /\Ato_/
define_method(function) { |*args, &io_block| instance.new(self, *args, &io_block).to }
when /\Awrite_/
define_method(function) { |*args, &io_block| instance.new(self, *args[1..-1], &io_block).write(*args[0]) }
when /\Afrom_/
define_singleton_method(function) { |*args, &io_block| instance.new(*args[1..-1], &io_block).from(*args[0]) }
when /\Aread_/
define_singleton_method(function) { |*args, &io_block| instance.new(*args[1..-1], &io_block).read(*args[0]) }
else
raise ArgumentError, 'Invalid function name given to monkey-patch into Daru::DataFrame.'
end
This way, we have normal Daru::DataFrame method calls and also daru-io calls like:
importer = Daru::IO::Importers::Format.new(opts, &block)
exporter = Daru::IO::Exporters::Format.new(df, opts, &block)
exporter.write(path) #! Linked to df.write(path, opts, &block)
exporter.to_s #! Linked to df.to_s(opts, &block)
exporter.to #! Linked to df.to_format(opts, &block)
importer.read(path) #! Linked to Daru::DataFrame.read_format(path, opts, &block)
importer.from(instance) #! Linked to Daru::DataFrame.from_format(instance, opts, &block)
I think we may have to discuss (not now, but when PR review happens for this) about discrete cases like whether database related importers should have a read_format method and more of such discrete details.
Ping @zverok - Please have a look at the above, in case you missed it. 😄
@athityakumar I feel like being guilty for overdiscussing it :) Just implement it the way that makes the most sense for you -- we always can refactor later, you know ;)
Acknowledged, opened PR #52 (WIP) for this.
Merged PR #52 as per this discussion and opened #53 for documenting possible enhancements.
Currently, daru-io supports only
from_format
andto_format
methods, withDaru::DataFrame.from_format(...)
redirecting toDaru::IO::Importers::Format.new(...).call
. Similar case with exporters.As per discussion on https://github.com/SciRuby/daru/issues/280,
from_format
,read_format
,to_format
andwrite_format
methods are to be supported by all IO modules. Please check whether this would be good enough. Of course, feel free to suggest better calling methods (we may have to re-factor).Daru::DataFrame.from_format(...)
->Daru::IO::Importers::Format.new(...).from
Daru::DataFrame.read_format(...)
->Daru::IO::Importers::Format.new(...).read
Ping @zverok @v0dro @lokeshh