ArtemGr / gstuff.rs

Small macro and trinkets that make my life easier.
MIT License

implement fast `Local` to ISO 8601 shorthand and back to `Local` helpers #3

Closed ArtemGr closed 3 months ago

ArtemGr commented 1 year ago

We lack fast conversions from Local to ISO 8601 shorthand and back. cf. the iso8601z workaround:

https://github.com/ArtemGr/gstuff.rs/blob/e0f2b008402a3209ca269916059451c04dce249a/gstuff.rs#L357-L366

cf. https://stackoverflow.com/questions/33114386/datetime-iso-8601-without-timezone-component

p.s. Why? To improve serialized readability without sacrificing speed.

In Python:

import datetime
dt = datetime.datetime.fromisoformat ('2022-02-02T02')
dt.isoformat()[:13]

mystixxx commented 1 year ago

Here is an example of how you could implement fast Local to ISO 8601 shorthand and back to Local helpers

use chrono::prelude::*;

#[cfg(feature = "fomat-macros")]
#[macro_export]
macro_rules! iso8601z {
    ($date_or_time: expr) => {{
        let sufff = ($date_or_time.len() as i32 - 10).max(0) as usize;
        ifomat!(($date_or_time)(&"T00:00:00Z"[sufff..]))
    }};
}

fn local_to_iso8601_shorthand(local: DateTime<Local>) -> String {
    local.format("%Y-%m-%dT%H").to_string()
}

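fn iso8601_shorthand_to_local(iso8601_shorthand: &str) -> DateTime<Local> {
    // `DateTime::parse_from_str` requires a UTC offset in the input, and chrono
    // cannot build a time from the hour alone, so pad the minutes and parse a naive value.
    let padded = format!("{}:00", iso8601_shorthand);
    let naive = NaiveDateTime::parse_from_str(&padded, "%Y-%m-%dT%H:%M").unwrap();
    // Interpret the naive value in the local timezone (panics inside a DST gap).
    Local.from_local_datetime(&naive).earliest().unwrap()
}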

fn iso8601_shorthand_to_iso8601(iso8601_shorthand: &str) -> String {
    iso8601z!(iso8601_shorthand).to_string()
}

fn main() {
    // Convert a local date and time to ISO 8601 shorthand
    let local_datetime = Local::now();
    let iso8601_shorthand = local_to_iso8601_shorthand(local_datetime);
    println!("{}", iso8601_shorthand);  // prints a string in the ISO 8601 shorthand format, e.g. "2022-12-26T10"

    // Convert an ISO 8601 shorthand date or time string to a local date and time
    let iso8601_shorthand = "2022-12-26T10";
    let local_datetime = iso8601_shorthand_to_local(iso8601_shorthand);
    println!("{}", local_datetime);  // prints a DateTime object with the local date and time corresponding to the ISO 8601 shorthand date or time string

    // Extend an ISO 8601 shorthand date or time string into a full ISO 8601 timestamp
    let iso8601_timestamp = iso8601_shorthand_to_iso8601(iso8601_shorthand);
    println!("{}", iso8601_timestamp);  // prints a string in the full ISO 8601 timestamp format, e.g. "2022-12-26T10:00:00Z"
}

ArtemGr commented 1 year ago

I'm using a different piece of code already

macro_rules! iso8601toL {($short: expr) => {
  Local.from_local_datetime (&(DateTime::parse_from_rfc3339 (&iso8601z! ($short))?) .naive_utc()) .earliest()?}}

but it is by no means fast. Chrono parsing is slow to begin with (cf. https://crates.io/crates/speedate), and there's a lot of fuss around timezone manipulation, which in this case is completely unnecessary. Specialized parsing of 2022-12-01T00:00:00 in particular can be implemented so as to be comparable in speed with storing the time as an integer. The goal is to improve serialized readability without sacrificing speed.

Specifically, I expect the shorthand parser to convert 2022, 12, 01, 00, 00 and 00 into respective integers, while checking for the presence of "-" and "T" separators, with hour, minute and second places being optional.
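A minimal sketch of the parser described above, using only the standard library. The names `Shorthand`, `pair` and `parse_shorthand` are hypothetical and are not part of gstuff or chrono. It also checks the ":" separators and does simple range validation, which goes slightly beyond what the comment asks for.

```rust
/// Date and time fields parsed from an ISO 8601 shorthand string.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Shorthand {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
}

/// Reads two ASCII digits starting at `at`; `None` if either byte is not a digit.
fn pair(b: &[u8], at: usize) -> Option<u8> {
    let (hi, lo) = (b[at].wrapping_sub(b'0'), b[at + 1].wrapping_sub(b'0'));
    if hi > 9 || lo > 9 {
        return None;
    }
    Some(hi * 10 + lo)
}

/// Parses "YYYY-MM-DD", optionally followed by "THH", "THH:MM" or "THH:MM:SS".
/// Missing time fields default to zero.
pub fn parse_shorthand(s: &str) -> Option<Shorthand> {
    let b = s.as_bytes();
    // Only these four byte lengths are valid shorthand forms.
    if !matches!(b.len(), 10 | 13 | 16 | 19) {
        return None;
    }
    if b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let year = pair(b, 0)? as u16 * 100 + pair(b, 2)? as u16;
    let (month, day) = (pair(b, 5)?, pair(b, 8)?);
    let (mut hour, mut minute, mut second) = (0, 0, 0);
    if b.len() >= 13 {
        if b[10] != b'T' {
            return None;
        }
        hour = pair(b, 11)?;
    }
    if b.len() >= 16 {
        if b[13] != b':' {
            return None;
        }
        minute = pair(b, 14)?;
    }
    if b.len() == 19 {
        if b[16] != b':' {
            return None;
        }
        second = pair(b, 17)?;
    }
    // Range checks only; days-per-month validation is left out of this sketch.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    if hour > 23 || minute > 59 || second > 59 {
        return None;
    }
    Some(Shorthand { year, month, day, hour, minute, second })
}
```

Calling `parse_shorthand("2022-12-26T10")` yields the date fields with `hour` set to 10 and the minute and second left at zero; a wrong separator or a non-digit returns `None`.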

ISO 8601 shorthand can be seen as seven two-digit numbers (with year being made of two numbers). Would be fun to implement a specialized integer parser that would handle that case (of parsing "22" to 22) - without the usual loops!
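One possible shape for such a loop-free two-digit parser, as a sketch only: load both bytes into a single `u16`, subtract the ASCII '0' bias from both at once, then validate and combine. The name `two_digits` is hypothetical, and whether this beats a plain two-step parse would need benchmarking.

```rust
/// Parses exactly two ASCII digits without a loop, e.g. b"22" becomes 22.
pub fn two_digits(pair: [u8; 2]) -> Option<u8> {
    // Strip the ASCII '0' (0x30) bias from both bytes in one subtraction.
    let v = u16::from_be_bytes(pair).wrapping_sub(0x3030);
    let (tens, ones) = ((v >> 8) as u8, (v & 0xFF) as u8);
    // A non-digit in either position leaves a value above 9 in at least one
    // half: a borrow out of the low byte only happens when the low byte is
    // already invalid, and then `ones` ends up at 0xD0 or above.
    if tens > 9 || ones > 9 {
        return None;
    }
    Some(tens * 10 + ones)
}
```

Seven such calls, on the byte pairs between the separators, cover the whole "2022-12-01T00:00:00" form.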

Alternatively, one can collapse "2022-12-01T00:00:00" into "20221201000000" on the stack (with gstuff gstring, SmallVec, InlinableString or just a custom array) and use a single integer parse on it, moving the bits into respective Local fields afterwards.
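The collapse approach could look roughly like this, with a plain stack array standing in for gstring or SmallVec. This is a sketch: `parse_collapsed` is a hypothetical name, it handles only the full 19-byte form, and it returns a tuple of integers instead of filling in Local fields.

```rust
/// Collapses "YYYY-MM-DDTHH:MM:SS" into the 14 digits "YYYYMMDDHHMMSS" on the
/// stack, parses them as one integer and splits the fields out with division.
/// Returns (year, month, day, hour, minute, second) without range validation.
pub fn parse_collapsed(s: &str) -> Option<(u16, u8, u8, u8, u8, u8)> {
    let b = s.as_bytes();
    if b.len() != 19 {
        return None;
    }
    if b[4] != b'-' || b[7] != b'-' || b[10] != b'T' || b[13] != b':' || b[16] != b':' {
        return None;
    }
    let digits: [u8; 14] = [
        b[0], b[1], b[2], b[3], b[5], b[6], b[8], b[9],
        b[11], b[12], b[14], b[15], b[17], b[18],
    ];
    // The integer parser accepts a leading '+', so insist on digits only.
    if !digits.iter().all(u8::is_ascii_digit) {
        return None;
    }
    let n: u64 = std::str::from_utf8(&digits).ok()?.parse().ok()?;
    Some((
        (n / 10_000_000_000) as u16,     // YYYY
        (n / 100_000_000 % 100) as u8,   // MM
        (n / 1_000_000 % 100) as u8,     // DD
        (n / 10_000 % 100) as u8,        // hh
        (n / 100 % 100) as u8,           // mm
        (n % 100) as u8,                 // ss
    ))
}
```

Shorter shorthand forms could be supported by pre-filling the array with b'0' and copying only the digits that are present.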

Stringification could simply use print!-style formatting or manual padding.
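Both stringification options, sketched with the standard library. The function names are hypothetical, and the manual version assumes a year of at most 9999 and two-digit values for the other fields.

```rust
/// Formats the fields as "YYYY-MM-DDTHH:MM:SS" using the zero-padding
/// built into the standard formatting macros.
pub fn to_shorthand_fmt(y: u16, mo: u8, d: u8, h: u8, mi: u8, s: u8) -> String {
    format!("{:04}-{:02}-{:02}T{:02}:{:02}:{:02}", y, mo, d, h, mi, s)
}

/// Same output via manual padding into a fixed stack buffer, with no allocation.
/// Assumes y <= 9999 and every other field <= 99.
pub fn to_shorthand_manual(y: u16, mo: u8, d: u8, h: u8, mi: u8, s: u8) -> [u8; 19] {
    // Start from a template that already holds the separators.
    let mut out = *b"0000-00-00T00:00:00";
    // Writes `v` as two ASCII digits at offset `at`.
    let mut put2 = |at: usize, v: u8| {
        out[at] = b'0' + v / 10;
        out[at + 1] = b'0' + v % 10;
    };
    put2(0, (y / 100) as u8);
    put2(2, (y % 100) as u8);
    put2(5, mo);
    put2(8, d);
    put2(11, h);
    put2(14, mi);
    put2(17, s);
    out
}
```

The byte array returned by the manual version is always ASCII, so `std::str::from_utf8` on it cannot fail for in-range inputs.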

c0d3x27 commented 1 year ago

This macro looks like it is intended to take in a string in the ISO 8601 shorthand format (e.g. "2022-12-12T12") and convert it to the full ISO 8601 timestamp format (e.g. "2022-12-12T12:00:00Z").

Here's how the macro could be implemented:

#[macro_export]
macro_rules! iso8601z {
    ($date_or_time: expr) => {{
        // Skip the part of the "T00:00:00Z" template that the input already covers.
        let covered = $date_or_time.len().saturating_sub(10).min(10);
        format!("{}{}", $date_or_time, &"T00:00:00Z"[covered..])
    }};
}

This macro uses the format! macro to build the full timestamp string by appending whatever part of the "T00:00:00Z" template the input does not already cover: all of it if the input is a bare date, ":00:00Z" if the input is a date with an hour, ":00Z" if it also has minutes. Since format! is part of the standard library, no feature gate is needed.

To use this macro, you can call it with a string literal like this:

let full_timestamp = iso8601z!("2022-12-12T12");
// full_timestamp is now "2022-12-12T12:00:00Z"

Alternatively, you can pass a variable containing the input string to the macro like this:


let input_string = "2022-12-12T12";
let full_timestamp = iso8601z!(input_string);
// full_timestamp is now "2022-12-12T12:00:00Z"

ArtemGr commented 1 year ago

The macro is a workaround for the lack of Local and ISO 8601 parsing in Chrono, for sure.

mystixxx commented 1 year ago

I think it's generally not necessary to use a macro to implement the functions you described, as they can be implemented using regular Rust functions. In fact, using a macro for these functions could actually make the code more difficult to understand and maintain. One option, for example, is to use speedate.

ArtemGr commented 1 year ago

You know what "cf." and a "workaround" are, right?

And if you do, then are you suggesting that even the temporary workarounds should somehow be “perfect”? Do you know that “beauty is in the eye of the beholder”, that design principles work best when they are team-specific, and that you shouldn't waste time on something you're not going to need, such as a perfect workaround?

How come that, rather than contributing some fast ISO 8601 shorthand parsers, you're picking on my exploratory and temporary code? Do you consider that to be an efficient investment of your time?

csicar commented 1 year ago

I think there is room for implementing something like this in chrono itself. I did some initial tests and found it should be possible to speed up the parse step by around 10x without forgoing validation.

bench_datetime_parse_from_rfc3339
                        time:   [413.52 ns 438.71 ns 467.06 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

bench_datetime_parse_from_iso8601_fast
                        time:   [35.340 ns 37.300 ns 39.579 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Of course, this is still slower than basic string interpolation, but still a lot better. Is that what you are looking for?

ArtemGr commented 1 year ago

Yes, a 10x of parse_from_rfc3339 will do. 👍

csicar commented 1 year ago

Nice. I'll be working on getting it into a mergeable state then :)

ArtemGr commented 1 year ago

Need any help?

ArtemGr commented 3 months ago

Benchmarks in https://github.com/ArtemGr/gstuff.rs/commit/8e27d17da7254d22cfb0da215ac9692596bfdfb6 show a fundamental problem with chrono: even if we have the year, month, day, hour and minute as integers, the library offers no fast path to make a DateTime<Local> from them. Local.with_ymd_and_hms is slower than DateTime<Local>::parse_from_rfc3339.

Linux

test time_bench::chrono_from_rfc3339   ... bench:          99 ns/iter (+/- 37)
test time_bench::chrono_from_str       ... bench:         193 ns/iter (+/- 12)
test time_bench::icsᵇ                  ... bench:           4 ns/iter (+/- 0)
test time_bench::iso8601icsᵇ           ... bench:          13 ns/iter (+/- 0)
test time_bench::iso8601tol            ... bench:         401 ns/iter (+/- 19)
test time_bench::iso8601tol_macro      ... bench:         278 ns/iter (+/- 8)
test time_bench::iso8601ton            ... bench:          68 ns/iter (+/- 3)

Windows

test time_bench::chrono_from_rfc3339 ... bench:          84 ns/iter (+/- 3)
test time_bench::chrono_from_str     ... bench:         152 ns/iter (+/- 5)
test time_bench::icsᵇ                ... bench:          19 ns/iter (+/- 1)
test time_bench::iso8601icsᵇ         ... bench:          11 ns/iter (+/- 0)
test time_bench::iso8601tol          ... bench:      14,646 ns/iter (+/- 628)
test time_bench::iso8601tol_macro    ... bench:         603 ns/iter (+/- 19)
test time_bench::iso8601ton          ... bench:          57 ns/iter (+/- 15)

iso8601tol (with_ymd_and_hms + with_nanosecond) is much slower on Windows.

What we can speed up, given the integers, is obtaining the NaiveDateTime.

Here iso8601ton (NaiveDateTime::new) is faster than chrono_from_rfc3339, a proof that we now have a faster way of parsing ISO 8601 shorthand.

p.s. https://github.com/ArtemGr/gstuff.rs/commit/edc63f515b7939579fafea0b4dfe547cda3c692e implements a shortcut which parses ISO 8601 shorthand into UNIX time in 23 nanoseconds (iso8601_ics_ms), hence allowing us to improve time readability without too much overhead.

test time_bench::chrono_from_rfc3339   ... bench:          90 ns/iter (+/- 1)
test time_bench::chrono_from_str       ... bench:         156 ns/iter (+/- 3)
test time_bench::chrono_iso8601        ... bench:         348 ns/iter (+/- 7)
test time_bench::duration              ... bench:          21 ns/iter (+/- 0)
test time_bench::iso8601_ics_ms        ... bench:          23 ns/iter (+/- 0)
test time_bench::iso8601icsᵇ           ... bench:          14 ns/iter (+/- 0)
test time_bench::iso8601tol            ... bench:         396 ns/iter (+/- 8)
test time_bench::iso8601tol_macro      ... bench:         284 ns/iter (+/- 9)
test time_bench::iso8601ton            ... bench:          67 ns/iter (+/- 1)
test time_bench::ms2iso8601ᵇ           ... bench:         173 ns/iter (+/- 2)
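
To illustrate the idea behind the shortcut in the p.s. above, here is a std-only sketch that turns ISO 8601 shorthand into UNIX time. It is not the gstuff implementation of iso8601_ics_ms: the names are hypothetical, the input is assumed to be UTC, the result is in seconds instead of milliseconds, and days-per-month validation is omitted. The day count uses the well-known days_from_civil arithmetic by Howard Hinnant.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 10 and 11 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1; // 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses a short run of ASCII digits; `None` on any non-digit byte.
fn num(b: &[u8]) -> Option<i64> {
    let mut n = 0i64;
    for &c in b {
        if !c.is_ascii_digit() {
            return None;
        }
        n = n * 10 + (c - b'0') as i64;
    }
    Some(n)
}

/// Converts "YYYY-MM-DD[THH[:MM[:SS]]]" (assumed UTC) into UNIX seconds.
pub fn shorthand_to_unix(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    if !matches!(b.len(), 10 | 13 | 16 | 19) {
        return None;
    }
    if b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let (y, m, d) = (num(&b[0..4])?, num(&b[5..7])?, num(&b[8..10])?);
    // Optional hour, minute and second, each introduced by its separator.
    let mut hms = [0i64; 3];
    for (i, &sep) in [b'T', b':', b':'].iter().enumerate() {
        let at = 10 + i * 3;
        if b.len() <= at {
            break;
        }
        if b[at] != sep {
            return None;
        }
        hms[i] = num(&b[at + 1..at + 3])?;
    }
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    if hms[0] > 23 || hms[1] > 59 || hms[2] > 59 {
        return None;
    }
    Some(days_from_civil(y, m, d) * 86_400 + hms[0] * 3_600 + hms[1] * 60 + hms[2])
}
```

For example, `shorthand_to_unix("2022-12-01T00:00:00")` returns `Some(1_669_852_800)`. No timezone database is consulted, which is what keeps this kind of conversion cheap compared with building a DateTime<Local>.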