apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Format][Docs] Move data type (parameter) descriptions from Schema.fbs to Columnar.rst format docs #42011

Open jorisvandenbossche opened 5 months ago

jorisvandenbossche commented 5 months ago

Currently, the physical layouts of the Arrow Columnar Format specification are explained in https://arrow.apache.org/docs/dev/format/Columnar.html. However, for how those layouts are used in practice by the different data types, and how each data type's data and parameters should be interpreted, we refer to Schema.fbs.

This ensures there is currently a single source of truth for this information, but having it "hidden" in that file also has several downsides:

Therefore, I propose moving the bulk of the explanations about the different data types and their parameters to Columnar.rst, to get a cleaner separation between the core columnar format and what is specific to the IPC specification.

The question then is how to handle the duplication with the .fbs file: I don't think we want two places to keep in sync, so would it be fine to largely cut down the content in the .fbs file?

jorisvandenbossche commented 5 months ago

To keep the Columnar.rst page from becoming unwieldy, we should perhaps also split the IPC specification into its own file at the same time: https://github.com/apache/arrow/issues/41671