SciRuby / daru-io

daru-io is a plugin gem to the existing daru gem, which aims to add support to Importing DataFrames from / Exporting DataFrames to multiple formats.
http://www.rubydoc.info/github/athityakumar/daru-io/master/
MIT License
25 stars 9 forks source link

Adds from_redis importer #18

Closed athityakumar closed 7 years ago

athityakumar commented 7 years ago

This currently is set to follow modular approach as per proposal (with the Helper method), and hasn't yet been updated to the IO class inheritance. A couple of spec tweaks are yet to be done, and YARD Doc is remaining to be done by tomorrow.

athityakumar commented 7 years ago

@zverok - I think it'd be better to make meta-programming related changes separately & not on this PR (maybe after #16 gets merged?). Please review the changes when you're free. 😄

athityakumar commented 7 years ago

I agree that offset doesn't provide much feature in a conventional sense due to the random order of Redis, but the intent behind offset & count was to provide the user with dataframes that contain a specific chunk of data. Hence, for a use-case of (say) 10000+ keys, the user can query for dataframes for chunks of 1000, with something like -

chunked_df1 = Daru::IO::Importers::Redis.new(..., offset: 0, count: 1000).call
chunked_df2 = Daru::IO::Importers::Redis.new(..., offset: 1000, count: 1000).call
...

And now that I think of it, I feel it would be better to have an option chunk: true / false. When true, we can return an array of chunked dataframes that the user can probably use with Parallel for parallel similar computations on similar dfs. Something like,

all_dfs = Daru::IO::Importers::Redis.new(..., count: -1, chunk: true, chunk_size: 500).call
# count & chunk_size kept separate as count refers to total number of records (-1 mens ALL) required,
# and chunk_size is number of records required / dataframe (if chunk is set as true)
=> Returns Array of DataFrames, each containing 500 elements 

What do you think about having this feature of parallel?

athityakumar commented 7 years ago

@zverok - Just wanted to remind you about the above discussion. No issues with removing offset, but should chunk feature be provided? 😄

zverok commented 7 years ago

No issues with removing offset, but should chunk feature be provided? :smile:

Sorry for long waiting for the response, but let's just YAGNI that. I don't believe the feature is really necessary, to be honest. And trying to guess all probable future usecases is less important than having solid and simple foundation for Daru users to experiment with (and understand what they really DO need)

athityakumar commented 7 years ago

@zverok - Acknowledged. I've reverted the offset part. 👍