asfimport opened 4 years ago
Amol Umbarkar / @mindhash: Response from Wes: "Thanks for pointing that out. Such a heuristic (observing compression ratios of stream messages) could be implemented at some point so that compression could be toggled off mid-stream if it doesn't seem to be helping. Feel free to open a JIRA issue about this."
I just opened https://issues.apache.org/jira/browse/ARROW-8823 since we don't track "what the uncompressed size would have been" without compression turned on.
Antoine Pitrou / @pitrou: One limitation is that compression is enabled for entire record batches, but it's quite conceivable that some fields or even individual buffers would compress very well, but others not.
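A minimal sketch of the point above, using stdlib `zlib` as a stand-in codec and two hypothetical column buffers (the names and sizes are illustrative, not from Arrow): within a single record batch, one buffer may compress dramatically while another barely shrinks, so a whole-batch on/off decision is a blunt instrument.

```python
import os
import zlib

# Two hypothetical column buffers from the same record batch:
# one highly repetitive (compresses well), one random (does not).
buffers = {
    "timestamps": b"\x00" * 65_536,  # run-length-like data, very compressible
    "uuids": os.urandom(65_536),     # effectively incompressible
}

for name, buf in buffers.items():
    ratio = len(zlib.compress(buf)) / len(buf)
    print(f"{name}: compressed to {ratio:.2%} of original size")
```

Per-buffer ratios like these suggest why a finer-grained toggle (per field or per buffer) could outperform an all-or-nothing batch setting.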
cc @emkornfield @lidavidm
Dask seems to apply compression selectively, only when it is found to be useful. It takes roughly a 10 kB sample upfront, measures how well that sample compresses, and compresses the whole batch only if the results are good. This also saves decompression effort on the receiver side. Please take a look at https://blog.dask.org/2016/04/14/dask-distributed-optimizing-protocol#problem-3-unwanted-compression Thought this could be relevant to arrow batch transfers as well.
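The sampling heuristic described above can be sketched roughly as follows. This is not Dask's actual code; the sample size, the `zlib` codec, and the ratio threshold are all assumptions chosen for illustration.

```python
import zlib

SAMPLE_SIZE = 10_000  # ~10 kB upfront sample, as in the Dask blog post
MIN_RATIO = 0.9       # hypothetical threshold: compress only if sample shrinks >10%

def maybe_compress(payload: bytes) -> tuple[str, bytes]:
    """Compress payload only if a small upfront sample compresses well.

    Returns a (codec, data) pair; codec is "none" when compression is
    skipped, so the receiver can avoid pointless decompression work.
    """
    if len(payload) < SAMPLE_SIZE:
        return "none", payload  # too small to be worth compressing
    sample = payload[:SAMPLE_SIZE]
    ratio = len(zlib.compress(sample)) / len(sample)
    if ratio > MIN_RATIO:
        return "none", payload  # sample barely shrank; skip the whole batch
    return "zlib", zlib.compress(payload)
```

For example, `maybe_compress(b"a" * 100_000)` returns a `"zlib"`-tagged payload, while random bytes come back untouched with codec `"none"`.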
Reporter: Amol Umbarkar / @mindhash
Note: This issue was originally created as ARROW-8845. Please see the migration documentation for further details.