Closed asfimport closed 5 years ago
Lee June Woo: Hello,
May I ask a simple question about this improvement? I think it would be more efficient to split the pandas DataFrame on the "dt" column before converting it to an Arrow table.
Do you have any plans to implement a group-by operation on Arrow tables, or to improve the write_to_dataset function?
Wes McKinney / @wesm: We do plan to implement group-by operations on Arrow tables eventually. If you would like to propose some improvements in the meantime, please go right ahead.
Joris Van den Bossche / @jorisvandenbossche: This seems to be a duplicate of ARROW-2628, so I am closing this issue (both are about the memory performance problems caused by the use of pandas' groupby functionality). I will update the other issue with some of the discussion from the closed PR.
Wes McKinney / @wesm: Thanks. I hope to see the group-splitting implemented natively against Arrow tables at some point.
Hello,
Posting this from GitHub (master @wesm asked for it :) )
https://github.com/apache/arrow/issues/2138
This works but is inefficient memory-wise. The Arrow table is a copy of the large pandas DataFrame and quickly saturates the RAM.
Thanks!
Reporter: Olaf / @randomgambit
Note: This issue was originally created as ARROW-2709. Please see the migration documentation for further details.