apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Real-time streaming support #10895

Open maronavenue opened 2 weeks ago

maronavenue commented 2 weeks ago

Is your feature request related to a problem or challenge?

Good day. Are there any plans to support real-time streaming of Arrow record batches? The use case I have in mind is to set up an Arrow Flight client on our side that receives and processes the stream from the SessionContext as the query executes. If not, are there any recommendations for enabling this workflow?

Thanks as always.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 2 weeks ago

Hi @maronavenue --

I think several people use DataFusion for this; see https://docs.rs/datafusion/latest/datafusion/datasource/streaming/struct.StreamingTable.html

@matthewmturner recently added an example, https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/file_stream_provider.rs, in https://github.com/apache/datafusion/pull/10600

I think the documentation could still be improved, so if you have any time to help out that would be most appreciated 🙏