Currently, the physical layouts of the Arrow Columnar Format specification are explained in https://arrow.apache.org/docs/dev/format/Columnar.html. But for how those layouts are used in practice for the different data types, and how the different data types' data and parameters should be interpreted, we refer to Schema.fbs.
This ensures there is currently one source of truth for this information, but having it "hidden" in that file also has a number of downsides:
The Schema.fbs file is actually for IPC serialization, so it contains some content that is not relevant for just the in-memory columnar format.
To fully understand the format spec, you need to read both the docs about the layouts and the fbs file for the data types. It would be easier to follow if that content lived together in a single document instead of being split across two places.
Referring readers to an fbs file in the repo just to find prose documentation is not a pleasant reading experience (e.g. the content would render better in the docs, where we can also use links, etc.).
Therefore, I would propose to move the bulk of the explanations about the different data types and parameters to Columnar.rst, to have a cleaner separation of what is the core columnar format, and what is specific about the IPC spec.
Then the question is how to deal with the duplication with the fbs file: I think we don't want two places to keep in sync, but would it be fine to substantially cut down the content in the fbs file?
To keep the Columnar.rst page from becoming unwieldy, we should maybe at the same time separate the IPC specification into its own file: https://github.com/apache/arrow/issues/41671