apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.59k stars 786 forks source link

Explicitly don't support leap seconds for Time32/Time64 arrays when parsing #5335

Open Jefffrey opened 9 months ago

Jefffrey commented 9 months ago

Describe the bug

Arrow spec states that leap seconds should not be supported:

https://github.com/apache/arrow/blob/13b22346d36b9952df5c988c9425b9e5bc4f09c4/format/Schema.fbs#L265-L274

The allowed values are between 0 (inclusive) and 86400 (=24*60*60) seconds (exclusive), adjusted for the time unit (for example, up to 86400000 exclusive for the MILLISECOND unit). This definition doesn't allow for leap seconds. Time values from measurements with leap seconds will need to be corrected when ingesting into Arrow (for example by replacing the value 86400 with 86399).

However we introduced valid parsing of leap seconds in this PR: https://github.com/apache/arrow-rs/pull/3101

To Reproduce

See this test:

https://github.com/apache/arrow-rs/blob/8fff5e4a075c20619690c967e217163cd74e0656/arrow-cast/src/parse.rs#L1759-L1760

Expected behavior

According to Arrow spec above, we should convert to 86399 value instead?

Either that, or simply update docs of Time32/Time64 datatype & parsing to indicate the current behaviour.

Additional context

An interesting thing to note is that whilst parsing into the integer representation is successful, when trying to display the value, it doesn't actually display properly anyway. See below example:

    #[test]
    fn test_pretty_format_time_32_second_invalid() {
        let array: Time32SecondArray = vec![
            Some(86_400),
        ]
        .into();
        println!("{:?}", array);

        let schema = Arc::new(Schema::new(vec![Field::new(
            "f",
            array.data_type().clone(),
            true,
        )]));
        let batch = RecordBatch::try_new(schema, vec![Arc::new(array)]).unwrap();

        let table = pretty_format_batches(&[batch]).unwrap().to_string();
        println!("{}", table);
    }

Output:

PrimitiveArray<Time32(Second)>
[
  null,
]
+---------------------------------------------------------------------------+
| f                                                                         |
+---------------------------------------------------------------------------+
| ERROR: Cast error: Failed to convert 86400 to temporal for Time32(Second) |
+---------------------------------------------------------------------------+
Jefffrey commented 9 months ago

Note I've raised https://github.com/apache/arrow-rs/issues/5336 regarding the confusing behaviour of printing null when debug outputting the array