apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Docs] Document conventions for sending and receiving Arrow data over HTTP APIs #40465

Open ianmcook opened 3 months ago

ianmcook commented 3 months ago

Describe the enhancement requested

The Arrow developer community intends to publish a set of conventions in the Arrow docs for how to send and receive Arrow-format data over HTTP APIs. There is a related discussion on the Arrow developer mailing list at https://lists.apache.org/thread/886cnx6ytjst3smmytz4r4ddcbv95191.

Tasks

This issue is an umbrella for tasks that are a part of this effort.


Simple HTTP GET client and server examples

:file_folder: arrow-experiments/tree/main/http/get_simple


HTTP GET client and server examples demonstrating range requests

:file_folder: arrow-experiments/tree/main/http/get_range


Indirect response HTTP GET client and server examples

:file_folder: arrow-experiments/tree/main/http/get_indirect


Multipart/mixed response HTTP GET client and server examples

:file_folder: arrow-experiments/tree/main/http/get_multipart


HTTP GET examples to test different compression options

:file_folder: arrow-experiments/tree/main/http/get_compressed


Simple HTTP PUT / POST client and server examples

:file_folder: arrow-experiments/tree/main/http/post_simple


Multipart/form-data request HTTP PUT / POST client and server examples

:file_folder: arrow-experiments/tree/main/http/post_multipart


General issues and questions


Component(s)

Documentation

felipecrv commented 3 months ago

Is this the main issue where I can track your HTTP+Arrow initiative @ianmcook?

ianmcook commented 3 months ago

> Is this the main issue where I can track your HTTP+Arrow initiative @ianmcook?

Yes

CurtHagenlocher commented 3 months ago

Any thoughts on the "best" canonical way to return multiple record batches? Would that also be multipart/mixed or would it be better to avoid the delimiter problem and e.g. use an alternate Content-Type indicating that the response contains multiple streams?
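For readers unfamiliar with the "delimiter problem" mentioned above, here is a minimal sketch of one reading of it: in a multipart/mixed body the boundary string must not occur inside any part, so the sender has to check binary payloads against it and the receiver has to scan the body for it. The helper names (`choose_boundary`, `build_multipart_mixed`, `split_multipart_mixed`) are hypothetical and not part of any Arrow library, the payloads are placeholder bytes rather than real Arrow IPC streams, and the parser handles only the bodies this sketch produces.

```python
import secrets

# Media type for the Arrow IPC stream format.
ARROW_STREAM_TYPE = b"application/vnd.apache.arrow.stream"


def choose_boundary(parts):
    """Pick a random boundary whose delimiter does not occur in any payload."""
    while True:
        boundary = secrets.token_hex(16)
        delimiter = b"--" + boundary.encode("ascii")
        if not any(delimiter in part for part in parts):
            return boundary


def build_multipart_mixed(parts):
    """Return (Content-Type header value, body bytes) for a list of payloads."""
    boundary = choose_boundary(parts)
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for payload in parts:
        chunks.append(delimiter + b"\r\n")
        chunks.append(b"Content-Type: " + ARROW_STREAM_TYPE + b"\r\n\r\n")
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(delimiter + b"--\r\n")  # closing delimiter
    return f'multipart/mixed; boundary="{boundary}"', b"".join(chunks)


def split_multipart_mixed(boundary, body):
    """Recover the payloads from a body produced by build_multipart_mixed."""
    delimiter = b"--" + boundary.encode("ascii")
    payloads = []
    # The first element is the (empty) preamble before the first delimiter.
    for section in body.split(delimiter)[1:]:
        if section.startswith(b"--"):  # reached the closing delimiter
            break
        # Part headers end at the first blank line; the payload follows.
        _headers, _, payload = section.partition(b"\r\n\r\n")
        payloads.append(payload[:-2])  # drop the CRLF before the next delimiter
    return payloads
```

The receiver cannot know where a part ends without searching the whole payload for the delimiter, which is the cost a length-prefixed or single-stream alternative would avoid.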

ianmcook commented 3 months ago

> Any thoughts on the "best" canonical way to return multiple record batches? Would that also be multipart/mixed or would it be better to avoid the delimiter problem and e.g. use an alternate Content-Type indicating that the response contains multiple streams?

@CurtHagenlocher Let's discuss this at https://github.com/apache/arrow/issues/40596#issuecomment-2035198898