apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++][Parquet] `parquet::arrow::FileWriter` does not propagate schema-level metadata when `ArrowWriterProperties::store_schema` is false #41766

Open TheNeuralBit opened 4 months ago

TheNeuralBit commented 4 months ago

Describe the bug, including details regarding any error messages, version, and platform.

When `store_schema` is true, the FileWriter first copies any existing schema metadata and then stores the serialized schema: https://github.com/apache/arrow/blob/8169d6e719453acd0e7ca1b6f784d800cca4f113/cpp/src/parquet/arrow/writer.cc#L537-L542

But when `store_schema` is false, the FileWriter returns empty metadata, and custom metadata is not copied: https://github.com/apache/arrow/blob/8169d6e719453acd0e7ca1b6f784d800cca4f113/cpp/src/parquet/arrow/writer.cc#L531-L534

Could someone confirm whether this is intentional? It looks like an oversight to me, and I have a patch ready to address it.
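
To make the two code paths concrete, here is a simplified, standard-library-only model of the behavior described above. It is a sketch, not the Arrow source: `Metadata`, `CurrentBehavior`, and `PropagatingBehavior` are hypothetical names, a `std::map` stands in for Arrow's key-value metadata, and the second function is only one possible shape for a fix, not necessarily the patch mentioned here.

```cpp
#include <map>
#include <string>

// Assumption: a plain ordered map stands in for Arrow's key-value metadata.
using Metadata = std::map<std::string, std::string>;

// Key under which the serialized Arrow schema is stored in the file metadata.
const std::string kArrowSchemaKey = "ARROW:schema";

// Models the behavior reported in this issue: existing schema metadata is
// copied only on the store_schema == true path.
Metadata CurrentBehavior(const Metadata& schema_metadata, bool store_schema,
                         const std::string& serialized_schema) {
  if (!store_schema) {
    return {};  // early return: custom metadata is dropped
  }
  Metadata result = schema_metadata;            // copy existing metadata first
  result[kArrowSchemaKey] = serialized_schema;  // then add the serialized schema
  return result;
}

// Hypothetical alternative: always copy the custom metadata and add the
// serialized schema only when store_schema is requested.
Metadata PropagatingBehavior(const Metadata& schema_metadata, bool store_schema,
                             const std::string& serialized_schema) {
  Metadata result = schema_metadata;
  if (store_schema) {
    result[kArrowSchemaKey] = serialized_schema;
  }
  return result;
}
```

In this model, calling `CurrentBehavior({{"owner", "data-team"}}, false, schema)` yields an empty map, while `PropagatingBehavior` with the same arguments keeps the `owner` entry and merely omits `ARROW:schema`.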

Component(s)

Parquet

jorisvandenbossche commented 4 months ago

See also the related discussion in https://github.com/apache/arrow/issues/31723 (it is more specifically about the reading side, but it also notes this strange inconsistency of not writing the metadata when writing `ARROW:schema` is disabled)