apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[R] data.table syntax/API for arrow (R package) #39822

Open lucasmation opened 9 months ago

lucasmation commented 9 months ago

Describe the enhancement requested

Could arrow support R data.table syntax (something like DT[filter, action, grouping] on an arrow table)?

Maybe this is too niche, but I find myself paying a high cost for collecting large amounts of data via arrow in exchange for the expressiveness of data.table. Is there any chance someone could develop this API/syntax? Perhaps it would be a separate package, arrowDT. Has anyone worked on this?

Component(s)

R

thisisnic commented 8 months ago

It should be possible to do this: the dplyr functionality in the package works by transforming dplyr syntax into Arrow Expressions, so the same would need to be done for data.table syntax. I don't think this is something that would make sense to have in the arrow R package, but a separate package would be cool. I'm not aware of any existing attempts to make this happen, though it'd be nice to see it from an interoperability point of view!

Happy to advise as needed if someone does take it on.
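
For anyone curious what "transforming syntax into expressions" could look like, here is a rough, language-neutral sketch in Python. It is not the arrow package's API or implementation: `Expr`, `col`, `LazyTable`, and `render` are all hypothetical names made up for illustration. The idea it shows is that a `DT[i, j, by]` call can be captured without running anything, then turned into an ordered plan (filter, group, aggregate) that a backend such as Arrow could execute later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical deferred expression: records an operation instead of running it."""
    op: str
    args: tuple

    def __gt__(self, other):
        # `col("cyl") > 4` builds a tree node rather than returning True/False.
        return Expr("greater", (self, other))


def col(name):
    """Hypothetical helper: refer to a column by name."""
    return Expr("field", (name,))


class LazyTable:
    """Hypothetical stand-in for a table whose queries are planned, not executed."""

    def __getitem__(self, key):
        # DT[i, j, by]: Python hands the three slots over as one tuple.
        i, j, by = key
        plan = []
        if i is not None:
            plan.append(("filter", i))      # i  -> row filter
        if by is not None:
            plan.append(("group_by", by))   # by -> grouping key
        if j is not None:
            plan.append(("aggregate", j))   # j  -> action on each group
        return plan


def render(expr):
    """Show a captured expression tree as a nested call."""
    if not isinstance(expr, Expr):
        return repr(expr)
    if expr.op == "field":
        return f"field({expr.args[0]!r})"
    return f"{expr.op}({', '.join(render(a) for a in expr.args)})"


DT = LazyTable()
plan = DT[col("cyl") > 4, {"mean_mpg": ("mean", "mpg")}, "gear"]
print([step for step, _ in plan])
print(render(plan[0][1]))
```

A real arrowDT-style package would do the capture step in R (by inspecting the unevaluated `i`, `j`, and `by` arguments) and emit actual Arrow Expressions instead of these toy tuples, but the overall shape of the translation would plausibly be similar.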