marvinlanhenke closed this 2 weeks ago
I updated the title of this ticket to reflect the end behavior I think it is addressing.
Specifically, I think the gap identified by @marvinlanhenke above is that trying to write an IntervalMonthDayNano array to parquet via https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html will not work.
To proceed with this ticket, the first thing would probably be to make a small test case to verify that IntervalMonthDayNanoArray cannot be written to parquet.
I don't believe the parquet specification allows for supporting nanosecond intervals - https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval
I am therefore not sure this ticket is actionable...
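To make the mismatch concrete, here is a minimal sketch (plain Rust, no arrow or parquet crates) of what fitting a MonthDayNano value into the Parquet INTERVAL layout would involve. Per the linked spec, INTERVAL is a 12-byte fixed-length value holding three little-endian unsigned 32-bit integers: months, days, milliseconds. The `MonthDayNano` struct, the `IntervalError` enum, and `to_parquet_interval` are hypothetical names for illustration, not arrow-rs or parquet-rs APIs.

```rust
use std::convert::TryFrom;

/// Mirror of Arrow's MonthDayNano fields: 32-bit months, 32-bit days, 64-bit nanoseconds.
/// (Hypothetical stand-in type, not the arrow-rs one.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

/// Reasons a value does not fit the Parquet INTERVAL layout.
#[derive(Debug, PartialEq, Eq)]
pub enum IntervalError {
    /// Parquet INTERVAL components are unsigned.
    Negative,
    /// The nanosecond part is not a whole number of milliseconds.
    SubMillisecond,
    /// The millisecond count does not fit in a u32.
    MillisOverflow,
}

const NANOS_PER_MILLI: i64 = 1_000_000;

/// Encode as 12 bytes (months, days, millis; each u32 little-endian),
/// or report why the value cannot be represented without loss.
pub fn to_parquet_interval(v: MonthDayNano) -> Result<[u8; 12], IntervalError> {
    if v.months < 0 || v.days < 0 || v.nanoseconds < 0 {
        return Err(IntervalError::Negative);
    }
    if v.nanoseconds % NANOS_PER_MILLI != 0 {
        return Err(IntervalError::SubMillisecond);
    }
    let millis = u32::try_from(v.nanoseconds / NANOS_PER_MILLI)
        .map_err(|_| IntervalError::MillisOverflow)?;

    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&(v.months as u32).to_le_bytes());
    out[4..8].copy_from_slice(&(v.days as u32).to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Ok(out)
}
```

Under these assumptions, a value such as 1 month, 2 days, 3_000_000 ns encodes fine, while anything with a sub-millisecond remainder or a negative component is rejected, which is the lossy part any writer support would have to decide how to handle.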
I wonder what the guidance should be for people who have IntervalMonthDayNano arrays and want to write the data to Parquet 🤔
Is it "cast the data to an interval type that is supported (IntervalMonthDay)"? If so, I can add a note to the docs.
Another potential option would be to write this type as a FIXED_LEN_BYTE_ARRAY or something (with no parquet logical type), which would permit round-tripping data written by parquet-rs back to an ArrayRef but would not be readable by any other implementation.
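As a rough illustration of the fixed-length byte array idea, the sketch below packs the three MonthDayNano fields into 16 bytes and reads them back. The byte order and field order used here (little-endian months, days, then nanoseconds) are an assumption made for this example, not something parquet-rs or the Parquet spec defines, and `encode_flba16` / `decode_flba16` are hypothetical helper names.

```rust
/// Pack months (i32), days (i32), and nanoseconds (i64) into 16 bytes.
/// The layout is an assumption for this sketch: little-endian, in field order.
pub fn encode_flba16(months: i32, days: i32, nanoseconds: i64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..16].copy_from_slice(&nanoseconds.to_le_bytes());
    out
}

/// Inverse of `encode_flba16`: recover (months, days, nanoseconds).
pub fn decode_flba16(bytes: &[u8; 16]) -> (i32, i32, i64) {
    let mut m = [0u8; 4];
    let mut d = [0u8; 4];
    let mut n = [0u8; 8];
    m.copy_from_slice(&bytes[0..4]);
    d.copy_from_slice(&bytes[4..8]);
    n.copy_from_slice(&bytes[8..16]);
    (
        i32::from_le_bytes(m),
        i32::from_le_bytes(d),
        i64::from_le_bytes(n),
    )
}
```

The round trip is lossless, including negative components, but without a logical type another reader would only see 16 opaque bytes, which is the interoperability concern raised above.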
I dug around in arrow and found some suggestions that Java doesn't support it either: https://github.com/apache/arrow/blob/65974672a356f34889ed7b9bfb8b76230c27c7ee/java/dataset/src/test/java/org/apache/arrow/dataset/TestAllTypes.java#L94-L96
"cast the data to an interval type that is supported"
I think documenting this is the least controversial path forward.
Proposed documentation update: https://github.com/apache/arrow-rs/pull/5875
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on the support for converting parquet statistics into ArrayRefs in DataFusion (see apache/datafusion#10453), I noticed that the ColumnWriter currently does not support writing IntervalUnit::MonthDayNano.
This might be the location: https://github.com/apache/arrow-rs/blob/fa8d3502388d7cfac724f7b9fae92abc3a716b6f/parquet/src/arrow/arrow_writer/mod.rs#L854-L874
Describe the solution you'd like
Support for writing IntervalUnit::MonthDayNano in the ColumnWriter.
Describe alternatives you've considered
Additional context
Related to: apache/arrow-rs#5847