apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.62k stars 802 forks source link

Avoid `from_num_days_from_ce_opt` calls in `timestamp_s_to_datetime` if we don't need #6746

Closed jayzhan211 closed 1 day ago

jayzhan211 commented 3 days ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Describe the solution you'd like

In Arrow-rs, extracting the minute from a timestamp involves converting the timestamp, such as 1599563412, into a NaiveDateTime using the timestamp_s_to_datetime function. This results in a value like 2020-09-08T12:10:12.123456780. The minute() method is then called on the NaiveDateTime to retrieve the minute component, such as 10.

However, in chrono's implementation, even when only the minute is required using NaiveTime::from_num_seconds_from_midnight_opt, the function NaiveTime::from_num_days_from_ce_opt is still called. Ideally, this additional step should be avoided to improve efficiency.

    #[inline]
    #[must_use]
    pub const fn from_timestamp(secs: i64, nsecs: u32) -> Option<Self> {
        let days = secs.div_euclid(86_400) + UNIX_EPOCH_DAY;
        let secs = secs.rem_euclid(86_400);
        if days < i32::MIN as i64 || days > i32::MAX as i64 {
            return None;
        }
        let date = try_opt!(NaiveDate::from_num_days_from_ce_opt(days as i32));
        let time = try_opt!(NaiveTime::from_num_seconds_from_midnight_opt(secs as u32, nsecs));
        Some(date.and_time(time).and_utc())
    }

I propose we upstream chrono crate and find a way to get date and time separately given secs: i64, nsecs: u32. Then we call corresponding smaller function based on what we need to minimum the cost.

Describe alternatives you've considered

Additional context

This is the flamegraph for clickbench Q18 in datafusion, we can see the from_num_days_from_ce_opt spent much time for date_part function. But the query care only about minute so the compute of from_num_days_from_ce_opt is useless.

Screenshot 2024-11-18 at 5 40 45 PM