apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Improve the display of interval types #5914

Open alamb opened 2 weeks ago

alamb commented 2 weeks ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Today, if an interval like -3 MONTH is parsed into an IntervalArray and then displayed using https://docs.rs/arrow/latest/arrow/util/display/fn.array_value_to_string.html, the output is correct but overly verbose:

0 YEARS -3 MONS 0 DAYS 0 HOURS 0 MINS 0.000000000 SECS

Describe the solution you'd like

It would be nice if the displayed version were closer to the original input and omitted fields whose value is 0. For the example above, it would be great if the output looked more like

-3 MONS
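A minimal sketch of the requested behaviour follows. It assumes a hypothetical MonthDayNano struct (not arrow's actual type) holding months, days and nanoseconds. Its Display impl skips zero components, keeps the YEARS / MONS / DAYS / HOURS / MINS / SECS labels from the current output, and writes straight into the formatter, so no Vec or intermediate String is allocated. Splitting months into years and printing seconds with a nine-digit fraction are assumptions modelled on the verbose output above. Printing an all-zero interval as 0 SECS is also an assumption, since the issue does not say what that case should look like.

```rust
use std::fmt;

/// Hypothetical stand-in for one month/day/nanosecond interval value.
#[derive(Debug, Clone, Copy)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

const NANOS_PER_SEC: i64 = 1_000_000_000;
const NANOS_PER_MIN: i64 = 60 * NANOS_PER_SEC;
const NANOS_PER_HOUR: i64 = 60 * NANOS_PER_MIN;

/// Writes a single space before every component except the first one.
fn separator(f: &mut fmt::Formatter<'_>, first: &mut bool) -> fmt::Result {
    if !*first {
        f.write_str(" ")?;
    }
    *first = false;
    Ok(())
}

impl fmt::Display for MonthDayNano {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut first = true;
        // Integer components; Rust's `/` and `%` truncate toward zero,
        // so each component keeps the sign of the field it came from.
        let parts: [(i64, &str); 5] = [
            ((self.months / 12) as i64, "YEARS"),
            ((self.months % 12) as i64, "MONS"),
            (self.days as i64, "DAYS"),
            (self.nanoseconds / NANOS_PER_HOUR, "HOURS"),
            (self.nanoseconds % NANOS_PER_HOUR / NANOS_PER_MIN, "MINS"),
        ];
        for (value, unit) in parts {
            if value != 0 {
                separator(f, &mut first)?;
                write!(f, "{value} {unit}")?;
            }
        }
        // The remainder below one minute is printed as seconds with a
        // nine-digit fraction, like the current verbose format.
        let rest = self.nanoseconds % NANOS_PER_MIN;
        if rest != 0 {
            separator(f, &mut first)?;
            let sign = if rest < 0 { "-" } else { "" };
            let abs = rest.unsigned_abs();
            let secs = abs / NANOS_PER_SEC as u64;
            let frac = abs % NANOS_PER_SEC as u64;
            write!(f, "{sign}{secs}.{frac:09} SECS")?;
        }
        // Assumption: an all-zero interval prints as "0 SECS" rather than "".
        if first {
            f.write_str("0 SECS")?;
        }
        Ok(())
    }
}

fn main() {
    let v = MonthDayNano { months: -3, days: 0, nanoseconds: 0 };
    println!("{v}"); // -3 MONS
}
```

Implementing Display (rather than returning a String) lets callers format into whatever buffer they already have, which fits the allocation concern raised below.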

Describe alternatives you've considered

@goldmedal implemented a version of this logic as part of a DataFusion PR here https://github.com/apache/datafusion/pull/10956/commits/9ae676ccdb0866ba302bace4fdb226a5b2ec3b0e, which might be a reasonable starting point:

Note that a real implementation should avoid allocating a vec![] and intermediate strings (via format!).

            ScalarValue::IntervalMonthDayNano(Some(i)) => {
                let mut s = vec![];
                if i.months != 0 {
                    s.push(format!("{} MONTH", i.months));
                }
                if i.days != 0 {
                    s.push(format!("{} DAY", i.days));
                }
                if i.nanoseconds != 0 {
                    s.push(Self::process_interval_nanosecond(i.nanoseconds));
                }

                let interval = Interval {
                    value: Box::new(ast::Expr::Value(SingleQuotedString(s.join(" ")))),
                    leading_field: None,
                    leading_precision: None,
                    last_field: None,
                    fractional_seconds_precision: None,
                };
                Ok(ast::Expr::Interval(interval))
            }

And the process_interval_nanosecond helper it calls:

    fn process_interval_nanosecond(nano: i64) -> String {
        // Split the nanosecond count into components. Rust's / and %
        // truncate toward zero, so each component keeps the sign of nano.
        let mut s = vec![];
        let hour = nano / 3_600_000_000_000;
        let minute = nano / 60_000_000_000 % 60;
        let second = nano / 1_000_000_000 % 60;
        let millisecond = nano / 1_000_000 % 1_000;
        let microsecond = nano / 1_000 % 1_000;
        let nanosecond = nano % 1_000;
        if hour != 0 {
            s.push(format!("{} HOUR", hour));
        }
        if minute != 0 {
            s.push(format!("{} MINUTE", minute));
        }
        if second != 0 {
            s.push(format!("{} SECOND", second));
        }
        if millisecond != 0 {
            s.push(format!("{} MILLISECOND", millisecond));
        }
        if microsecond != 0 {
            s.push(format!("{} MICROSECOND", microsecond));
        }
        if nanosecond != 0 {
            s.push(format!("{} NANOSECOND", nanosecond));
        }
        s.join(" ")
    }
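The two DataFusion snippets can be combined into a single pass that writes into a caller-supplied buffer instead of collecting format! results in a Vec and joining them. This is a sketch only: write_interval_literal and push_part are hypothetical names, and the component arithmetic is copied from the snippets, so every component of a negative interval comes out negative (for example -1 HOUR -30 MINUTE). It removes the per-component Vec and String allocations. The output String may still grow, but clearing and reusing one buffer across values keeps that to a minimum.

```rust
use std::fmt::{self, Write};

/// Writes "value UNIT" if value is non-zero, preceded by a space unless it
/// is the first component written.
fn push_part(out: &mut impl Write, first: &mut bool, value: i64, unit: &str) -> fmt::Result {
    if value == 0 {
        return Ok(());
    }
    if !*first {
        out.write_char(' ')?;
    }
    *first = false;
    write!(out, "{value} {unit}")
}

/// Hypothetical non-allocating variant of the two snippets: writes the
/// interval literal body (e.g. "-3 MONTH") directly into `out`.
fn write_interval_literal(out: &mut impl Write, months: i32, days: i32, nano: i64) -> fmt::Result {
    let mut first = true;
    let parts = [
        (months as i64, "MONTH"),
        (days as i64, "DAY"),
        (nano / 3_600_000_000_000, "HOUR"),
        (nano / 60_000_000_000 % 60, "MINUTE"),
        (nano / 1_000_000_000 % 60, "SECOND"),
        (nano / 1_000_000 % 1_000, "MILLISECOND"),
        (nano / 1_000 % 1_000, "MICROSECOND"),
        (nano % 1_000, "NANOSECOND"),
    ];
    for (value, unit) in parts {
        push_part(out, &mut first, value, unit)?;
    }
    Ok(())
}

fn main() -> fmt::Result {
    // One String is reused for every value; clear() keeps its capacity.
    let mut buf = String::new();
    for (months, days, nano) in [(-3, 0, 0), (0, 1, 5_400_000_000_000_i64)] {
        buf.clear();
        write_interval_literal(&mut buf, months, days, nano)?;
        println!("{buf}");
    }
    Ok(())
}
```

As in the original, an all-zero interval produces an empty string, so a caller would still need to decide what to emit in that case.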

Additional context

Rachelint commented 4 days ago

I have finished the basic code; now adding some tests and fixing compilation errors...