Closed athityakumar closed 7 years ago
@zverok - I think it'd be better to make meta-programming related changes separately & not on this PR (maybe after #16 gets merged?). Please review the changes when you're free. 😄
I agree that offset
doesn't provide much feature in a conventional sense due to the random order of Redis, but the intent behind offset
& count
was to provide the user with dataframes that contain a specific chunk of data. Hence, for a use-case of (say) 10000+ keys, the user can query for dataframes for chunks of 1000, with something like -
chunked_df1 = Daru::IO::Importers::Redis.new(..., offset: 0, count: 1000).call
chunked_df2 = Daru::IO::Importers::Redis.new(..., offset: 1000, count: 1000).call
...
And now that I think of it, I feel it would be better to have an option chunk: true / false
. When true, we can return an array of chunked dataframes that the user can probably use with Parallel
for parallel similar computations on similar dfs. Something like,
all_dfs = Daru::IO::Importers::Redis.new(..., count: -1, chunk: true, chunk_size: 500).call
# count & chunk_size kept separate as count refers to total number of records (-1 mens ALL) required,
# and chunk_size is number of records required / dataframe (if chunk is set as true)
=> Returns Array of DataFrames, each containing 500 elements
What do you think about having this feature of parallel?
@zverok - Just wanted to remind you about the above discussion. No issues with removing offset
, but should chunk
feature be provided? 😄
No issues with removing offset, but should chunk feature be provided? :smile:
Sorry for long waiting for the response, but let's just YAGNI that. I don't believe the feature is really necessary, to be honest. And trying to guess all probable future usecases is less important than having solid and simple foundation for Daru users to experiment with (and understand what they really DO need)
@zverok - Acknowledged. I've reverted the offset part. 👍
This currently is set to follow modular approach as per proposal (with the Helper method), and hasn't yet been updated to the IO class inheritance. A couple of spec tweaks are yet to be done, and YARD Doc is remaining to be done by tomorrow.