Open ianmcook opened 7 months ago
@CurtHagenlocher asked:
Any thoughts on the "best" canonical way to return multiple record batches? Would that also be multipart/mixed or would it be better to avoid the delimiter problem and e.g. use an alternate Content-Type indicating that the response contains multiple streams?
By "multiple record batches" do you mean record batches with different schemas? (Or maybe they have the same schema but it's important to keep them logically separated in separate IPC streams?) If that's what you mean, then I think the two-step / indirect approach described here is probably what we should generally recommend.
Yes, sorry, separate result sets with potentially-different schemas. I think the scenario here is visualization in the browser where the JS-based UI sends a single request for multiple results, each of which is relatively small. Having to do this as described here would mean having to maintain state on the server across multiple requests.
Ah, I see.
IIUC we do not have any facilities in IPC, Flight, Flight SQL, or ADBC that encapsulate multiple different-schema streams into one logical unit. And I don't think we're super eager to create anything like that. So using whatever facilities HTTP provides seems like the way to go.
So I think a multipart/mixed response (as described in #40598) seems like probably the best way to do this if you can't maintain state on the server side. I think if you choose a sufficiently obscure delimiter, the delimiter problem is exceedingly unlikely to be a real problem in practice, but we should research this more to better understand the risks.
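One way to sidestep the delimiter concern entirely is to check a randomly generated boundary against the payloads before sending. The following is a minimal sketch, not the approach from #40598: `build_multipart_mixed` and `split_multipart_mixed` are hypothetical helper names, the parts are assumed to be Arrow IPC streams that have already been serialized to bytes, and the splitter only handles bodies produced by the builder (it is not a general multipart parser).

```python
import secrets


def build_multipart_mixed(parts: list[bytes]) -> tuple[str, bytes]:
    """Wrap already-serialized Arrow IPC streams in one multipart/mixed body.

    Returns the Content-Type header value and the response body.
    """
    # Draw random boundaries until one does not occur in any payload.
    while True:
        boundary = secrets.token_hex(16)
        delimiter = b"--" + boundary.encode("ascii")
        if not any(delimiter in part for part in parts):
            break

    body = bytearray()
    for part in parts:
        body += delimiter + b"\r\n"
        body += b"Content-Type: application/vnd.apache.arrow.stream\r\n"
        body += b"\r\n"
        body += part + b"\r\n"
    body += delimiter + b"--\r\n"  # closing delimiter

    content_type = f'multipart/mixed; boundary="{boundary}"'
    return content_type, bytes(body)


def split_multipart_mixed(content_type: str, body: bytes) -> list[bytes]:
    """Recover the payloads from a body produced by build_multipart_mixed."""
    boundary = content_type.split("boundary=", 1)[1].strip('"')
    delimiter = b"--" + boundary.encode("ascii")
    payloads = []
    # The first chunk is the empty preamble; the last is the closing "--\r\n".
    for chunk in body.split(delimiter)[1:-1]:
        _headers, _, payload = chunk.partition(b"\r\n\r\n")
        payloads.append(payload[:-2])  # drop the CRLF before the next delimiter
    return payloads
```

The pre-check costs one scan of each payload, so it suits the small result sets discussed here. For large or streamed results the server cannot scan ahead, and the choice of a sufficiently random boundary is what has to carry the weight.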
"we do not have any facilities in IPC, Flight, Flight SQL, or ADBC that encapsulate multiple different-schema streams into one logical unit"
But there are some issues requesting this in ADBC: https://github.com/apache/arrow-adbc/issues/1447 and https://github.com/apache/arrow-adbc/issues/1358
@CurtHagenlocher you should take a look at https://github.com/apache/arrow-experiments/pull/33, which specifies how Arrow streams can be served in multipart/mixed responses.
Describe the enhancement requested
Contribute Python client and server examples to the indirect HTTP GET examples in the arrow-experiments repo. This should demonstrate how to use a two-step sequence to retrieve Arrow data:
Component(s)
Python
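As a rough illustration of what a two-step retrieval could look like, here is a minimal standard-library sketch. It rests on an assumption: the first GET returns a JSON document that points at the Arrow data, and the second GET fetches it. The paths, the `arrow_stream_url` key, and the names `IndirectHandler`, `fetch_indirect`, and `run_demo` are all hypothetical, and `ARROW_BYTES` is placeholder data rather than a real Arrow IPC stream.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder bytes standing in for a serialized Arrow IPC stream.
ARROW_BYTES = b"placeholder for Arrow IPC stream bytes"


class IndirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            # Step 1: return a JSON document that points at the Arrow data.
            host, port = self.server.server_address[:2]
            doc = {"arrow_stream_url": f"http://{host}:{port}/data.arrows"}
            self._send(200, "application/json", json.dumps(doc).encode("utf-8"))
        elif self.path == "/data.arrows":
            # Step 2: return the Arrow data itself.
            self._send(200, "application/vnd.apache.arrow.stream", ARROW_BYTES)
        else:
            self._send(404, "text/plain", b"not found")

    def _send(self, status, content_type, payload):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_indirect(start_url: str) -> bytes:
    """Follow the JSON document from the first GET to the Arrow data."""
    # An empty ProxyHandler keeps local requests from going through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(start_url) as response:
        doc = json.load(response)
    with opener.open(doc["arrow_stream_url"]) as response:
        return response.read()


def run_demo() -> bytes:
    """Start the server on a free local port, run both steps, shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), IndirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return fetch_indirect(f"http://{host}:{port}/")
    finally:
        server.shutdown()
        server.server_close()
```

In a real client the bytes returned by `fetch_indirect` would be handed to an Arrow IPC stream reader. That step is left out so the sketch runs without third-party packages.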