apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0
355 stars 70 forks

Add ability to consume query results incrementally #607

Open judahrand opened 7 months ago

judahrand commented 7 months ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, DataFrame.collect() returns a list of all of the buffered RecordBatches. This is often undesirable: a user may, for example, want to write the result out to disk as it is materialized in order to save memory.

Describe the solution you'd like

It would be great to have a to_arrow_batches() method that returns a RecordBatchReader and defers execution, so that each batch is only computed when it is requested from the RecordBatchReader.
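
To make the requested behaviour concrete, here is a minimal pure-Python sketch of the difference between buffering everything and consuming batches on demand. It does not use DataFusion or Arrow at all: execute, collect, write_incrementally and the Batch type are hypothetical stand-ins invented for this illustration, and a generator plays the role that a RecordBatchReader would play.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a RecordBatch: a small list of rows.
Batch = List[Tuple[int, int]]

# Records the index of every batch that has actually been computed.
produced: List[int] = []


def execute(num_batches: int, rows_per_batch: int) -> Iterator[Batch]:
    """Hypothetical query execution: a batch is computed only when requested."""
    for i in range(num_batches):
        produced.append(i)
        start = i * rows_per_batch
        yield [(n, n * n) for n in range(start, start + rows_per_batch)]


def collect(num_batches: int, rows_per_batch: int) -> List[Batch]:
    """Eager variant: every batch is buffered in a list before returning."""
    return list(execute(num_batches, rows_per_batch))


def write_incrementally(batches: Iterable[Batch], sink) -> int:
    """Write each batch to the sink as it arrives; return the row count."""
    writer = csv.writer(sink)
    rows = 0
    for batch in batches:
        writer.writerows(batch)  # only the current batch is held here
        rows += len(batch)
    return rows


# Creating the lazy stream does no work; batches are computed while writing.
stream = execute(num_batches=3, rows_per_batch=2)
work_done_before_reading = list(produced)  # still empty at this point
sink = io.StringIO()
rows_written = write_incrementally(stream, sink)
```

With the eager collect() stand-in, all three batches exist in memory before the caller sees any of them. With the lazy stream, the writer only ever holds the batch it is currently writing, which is the memory saving the request is after.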

judahrand commented 7 months ago

This may actually be a duplicate of #12