Open matthewmturner opened 2 years ago
I think it might be possible; a good portion of the module doesn't require PyArrow. The only things that do are UDFs, UDAFs, and the parts of the DataFrame API that return PyArrow data structures (like collect() and schema()). Does a datafusion-python without those features sound appealing?
Cool - that was what it looked like to me as well from my scan of the code. IMHO, in the medium term it would be nice to have pyarrow as an optional feature. I think datafusion should have some improvements on the IO front before enabling this, though (I'm looking into / working on writing capabilities: https://github.com/apache/arrow-datafusion/issues/1777). Right now I think pyarrow has more functionality there, which is useful.
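To make the "optional feature" idea a bit more concrete, here is a rough sketch of the usual optional-dependency pattern in Python. This is not how datafusion-python is implemented; `optional_import` and `collect_as_arrow` are hypothetical names made up for illustration, and the only real pyarrow call used is `pyarrow.table`.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Resolved once at import time; None when pyarrow is not installed.
pa = optional_import("pyarrow")


def collect_as_arrow(columns: dict):
    """Hypothetical stand-in for an API that returns PyArrow structures."""
    if pa is None:
        # Fail only when an Arrow-returning feature is actually used.
        raise ImportError(
            "this feature needs pyarrow; install it to use Arrow-returning APIs"
        )
    return pa.table(columns)
```

With something along these lines, the package itself would import fine without pyarrow, and only the Arrow-returning calls (the collect() / schema() style ones mentioned above) would raise if it is missing.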
I feel odd even asking this, but is it possible to make enhancements so that datafusion-python can be used without pyarrow? pyarrow is fantastic and I already use it, but it is fairly large, which makes it somewhat painful to deploy for some serverless use cases (such as on AWS Lambda). If I am able to do everything I need in datafusion, is there a need for pyarrow? I confess I'm not very familiar with the interface between Rust / datafusion and Python / arrow, so hopefully this isn't too stupid of a question. Thx!