georust / netcdf

High-level netCDF bindings for Rust
Apache License 2.0

CF Time attribute #94

Closed antscloud closed 11 months ago

antscloud commented 2 years ago

Hello, thank you for the work on this project !

By any chance, is there a plan to add CF time attribute reading/parsing to handle datetime types? I can't find anything in the docs or on crates.io. I am thinking of something like Julia's CFTime or Python's cftime.

I believe it would be a great feature for the geophysical field.

magnusuMET commented 2 years ago

This would be a good addition to this crate, a PR for this feature would be great! I have not made a comprehensive parser, but the following snippet might be of help:

/// Parses CF time + duration strings (seconds/minutes/hours since ISODATE)
// TODO: return Result instead
fn get_time_from_str(timestr: &str) -> Option<(DateTime<offset::FixedOffset>, Duration)> {
    use nom::{
        branch::alt,
        bytes::complete::{tag, take, take_till},
        character::complete::{digit1, one_of},
        combinator::{all_consuming, map, map_opt, opt},
        number::complete::double,
        sequence::{pair, separated_pair, terminated, tuple},
        IResult,
    };

    fn duration(input: &str) -> IResult<&str, Duration> {
        let till_space = take_till(|c| c == ' ');
        let dur = map_opt(till_space, |t: &str| match t {
            "days" | "day" | "d" => Some(Duration::days(1)),
            "hours" | "hour" | "h" => Some(Duration::hours(1)),
            "minutes" | "minute" | "min" => Some(Duration::minutes(1)),
            "seconds" | "second" | "sec" | "s" => Some(Duration::seconds(1)),
            _ => None,
        });
        let since = tag(" since ");

        terminated(dur, since)(input)
    }

    fn ymd_hms(input: &str) -> IResult<&str, chrono::NaiveDateTime> {
        fn u32_num(input: &str) -> IResult<&str, u32> {
            map_opt(digit1, |s: &str| s.parse::<u32>().ok())(input)
        }
        fn ymd(input: &str) -> IResult<&str, chrono::NaiveDate> {
            let i32_parser = map_opt(digit1, |s: &str| s.parse::<i32>().ok());

            map_opt(
                tuple((i32_parser, tag("-"), u32_num, tag("-"), u32_num)),
                |(y, _, m, _, d)| chrono::NaiveDate::from_ymd_opt(y, m, d),
            )(input)
        }
        fn hms(input: &str) -> IResult<&str, chrono::NaiveTime> {
            map_opt(
                tuple((u32_num, tag(":"), u32_num, tag(":"), double)),
                |(hour, _, minute, _, second)| {
                    chrono::NaiveTime::from_hms_nano_opt(
                        hour,
                        minute,
                        second.trunc() as _,
                        (second.fract() * 1_000_000_000.0) as _,
                    )
                },
            )(input)
        }

        map(tuple((ymd, tag(" "), hms)), |(ymd, _, hms)| {
            chrono::NaiveDateTime::new(ymd, hms)
        })(input)
    }
    fn timezone(input: &str) -> IResult<&str, chrono::offset::FixedOffset> {
        fn twonum(input: &str) -> IResult<&str, i32> {
            map_opt(take(2usize), |s: &str| s.parse::<i32>().ok())(input)
        }

        let quad = pair(twonum, twonum);
        let colon_sep = separated_pair(
            map_opt(digit1, |x: &str| x.parse::<i32>().ok()),
            tag(":"),
            twonum,
        );

        let tz = map(
            tuple((tag(" "), opt(one_of("+-")), alt((quad, colon_sep)))),
            |(_, pm, (tz_h, tz_m))| {
                // FixedOffset::east expects seconds: hours * 3600 + minutes * 60
                let offset = tz_h * 3600 + tz_m * 60;
                if let Some('-') = pm {
                    -offset
                } else {
                    offset
                }
            },
        );
        map(opt(tz), |tz| {
            chrono::offset::FixedOffset::east(tz.unwrap_or(0))
        })(input)
    }
    fn parse_line(
        input: &str,
    ) -> IResult<&str, (Duration, chrono::DateTime<chrono::offset::FixedOffset>)> {
        use chrono::offset::TimeZone;

        map_opt(tuple((duration, ymd_hms, timezone)), |(dur, time, tz)| {
            let tz: chrono::offset::FixedOffset = chrono::TimeZone::from_offset(&tz);
            let time = tz.from_local_datetime(&time);
            match time.single() {
                Some(x) => Some((dur, x)),
                _ => None,
            }
        })(input)
    }

    let mut parser = all_consuming(parse_line);

    parser(timestr).ok().map(|(_, x)| (x.1, x.0))
}
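
For illustration, here is a minimal std-only sketch of the offset handling in the `timezone` helper above. chrono's `FixedOffset::east` takes the offset in seconds east of UTC, so hours and minutes both have to be scaled. The names `parse_utc_offset` and `digits` are hypothetical and are not part of the snippet or of any crate; only the `+HHMM`, `-H:MM` style forms are assumed.

```rust
/// Parses one or two ASCII digits into a number (hypothetical helper).
fn digits(s: &str) -> Option<i32> {
    if s.is_empty() || s.len() > 2 || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Converts an offset such as "-6:00", "+0530" or "05:30" into seconds
/// east of UTC. Returns `None` for anything else.
pub fn parse_utc_offset(s: &str) -> Option<i32> {
    // An optional leading sign; no sign means east of UTC.
    let (sign, rest) = match s.strip_prefix('-') {
        Some(rest) => (-1, rest),
        None => (1, s.strip_prefix('+').unwrap_or(s)),
    };
    // Either "H:MM" / "HH:MM", or exactly four digits "HHMM".
    let (hours, minutes) = match rest.split_once(':') {
        Some((h, m)) => (h, m),
        None if rest.len() == 4 && rest.is_ascii() => rest.split_at(2),
        None => return None,
    };
    let hours = digits(hours)?;
    let minutes = digits(minutes)?;
    if minutes >= 60 {
        return None;
    }
    // Hours and minutes are both converted to seconds.
    Some(sign * (hours * 3600 + minutes * 60))
}
```

With this sketch, `parse_utc_offset("-6:00")` yields `Some(-21600)`, which is the value one would pass on to a fixed-offset type.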
antscloud commented 2 years ago

Thank you for your snippet, it'll help :+1:

I am new to Rust, I'll try to write something when I have time. If I do, I'll do a PR

I was thinking, maybe it would be easier at first to add a binding to either the Python package (as it is written in CPython) or the C UDUNITS package. What do you think?

magnusuMET commented 2 years ago

Wrapping Python is not trivial. UDUNITS might be feasible, but it is a big library and could be a pain to use compared to rolling our own parser. The following is how we could implement the parser if the iso8601 crate exposed its nom parser:

pub enum Duration {
    Days,
    Hours,
    Minutes,
    Seconds,
}

fn duration(input: &str) -> IResult<&str, Duration> {
    let days = map(alt((tag("days"), tag("day"), tag("d"))), |_: &str| {
        Duration::Days
    });
    let hours = map(alt((tag("hours"), tag("hour"), tag("h"))), |_: &str| {
        Duration::Hours
    });
    let minutes = map(
        alt((tag("minutes"), tag("minute"), tag("min"))),
        |_: &str| Duration::Minutes,
    );
    let seconds = map(
        alt((tag("seconds"), tag("second"), tag("sec"), tag("s"))),
        |_: &str| Duration::Seconds,
    );

    alt((days, hours, minutes, seconds))(input)
}

fn cf_parser(input: &str) -> IResult<&str, (Duration, DateTime)> {
    let since = tuple((space1, tag("since"), space1));

    all_consuming(separated_pair(duration, since, iso8601::datetime))(input)
}
magnusuMET commented 2 years ago

Comparing times is, however, the hard part. It depends on the calendar and might add quite a lot of complexity. I am not sure how much we need to leave to the user there.
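
To make the calendar dependence concrete, here is a minimal sketch of how month lengths differ between CF calendars. The `Calendar` enum and the two function names are hypothetical and chosen only for this illustration; the leap rules are the usual ones for the proleptic Gregorian, Julian, no-leap, all-leap and 360-day calendars.

```rust
/// CF calendar kinds covered by this sketch (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Calendar {
    ProlepticGregorian,
    Julian,
    NoLeap,
    AllLeap,
    Day360,
}

/// Whether `year` contains a 29 February in the given calendar.
pub fn is_leap_year(calendar: Calendar, year: i64) -> bool {
    match calendar {
        Calendar::ProlepticGregorian => {
            (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
        }
        Calendar::Julian => year % 4 == 0,
        Calendar::NoLeap => false,
        Calendar::AllLeap => true,
        // Every month has 30 days, so leap years do not apply.
        Calendar::Day360 => false,
    }
}

/// Number of days in `month` (1..=12), or `None` for an invalid month.
pub fn days_in_month(calendar: Calendar, year: i64, month: u32) -> Option<u32> {
    if !(1..=12).contains(&month) {
        return None;
    }
    if calendar == Calendar::Day360 {
        return Some(30);
    }
    Some(match month {
        2 => {
            if is_leap_year(calendar, year) {
                29
            } else {
                28
            }
        }
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    })
}
```

The same (year, month) pair can therefore have 28, 29 or 30 days depending on the calendar, which is why two timestamps can only be compared once both are interpreted in the same calendar.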

antscloud commented 2 years ago

Your snippets work great for parsing!

The major difficulty will be handling the different calendars (in addition to comparing them).

Since both the chrono and time crates use the proleptic Gregorian calendar, it seems that we can't use them to handle the other calendars. For example, with the all_leap calendar we can't define 29 February 2022, because the functions panic.

Even if we found a workaround, one would still need to reimplement some traits, such as the Add trait, for this specific case.

Maybe something like this:


pub struct Date {
    year: u32,
    month: i8,
    day: i8,
}

pub struct Time {
    hour: i8,
    minute: i8,
    second: i8,
}

pub struct DateTime {
    date: Date,
    time: Time,
    offset: time::Duration,
}
pub struct DatetimeNoLeap {
    datetime: DateTime,
}

impl std::ops::Add for DatetimeNoLeap {
    type Output = DatetimeNoLeap;

    fn add(self, other: DatetimeNoLeap) -> DatetimeNoLeap {
        // Implementation
        todo!()
    }
}
pub struct Datetime360Days {
    datetime: DateTime,
}
pub struct DatetimeJulian {
    datetime: DateTime,
}
pub struct CFDatetime {
    from: DateTime,
    duration: time::Duration,
}
magnusuMET commented 2 years ago

We will have to define the calendars ourselves, with the correct impl for Add<Duration> and friends. This might be a lot of work. Is there a MWP we could strive for?
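
As a rough idea of what such an impl could look like, here is a std-only sketch for a no-leap (365-day) calendar. The type stores seconds since year 0 of that calendar, so adding a duration is plain integer addition and the calendar logic lives in the conversions. The internal representation, the method names `from_ymd_hms` and `to_ymd_hms`, and the use of `std::time::Duration` are assumptions made for this example only.

```rust
use std::ops::Add;
use std::time::Duration;

const SECS_PER_DAY: u64 = 86_400;
const DAYS_PER_YEAR: u64 = 365;
const DAYS_IN_MONTH: [u64; 12] = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

/// A datetime in the no-leap calendar (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DatetimeNoLeap {
    /// Seconds since 0000-01-01 00:00:00 in the no-leap calendar.
    secs: u64,
}

impl DatetimeNoLeap {
    /// Builds a datetime, rejecting dates that do not exist (e.g. 29 February).
    pub fn from_ymd_hms(
        year: u64, month: u64, day: u64, hour: u64, minute: u64, second: u64,
    ) -> Option<Self> {
        if !(1..=12).contains(&month) || hour > 23 || minute > 59 || second > 59 {
            return None;
        }
        let month_idx = (month - 1) as usize;
        if day < 1 || day > DAYS_IN_MONTH[month_idx] {
            return None;
        }
        let days_before_month: u64 = DAYS_IN_MONTH[..month_idx].iter().sum();
        let days = year * DAYS_PER_YEAR + days_before_month + (day - 1);
        Some(Self { secs: days * SECS_PER_DAY + hour * 3600 + minute * 60 + second })
    }

    /// Splits the value back into (year, month, day, hour, minute, second).
    pub fn to_ymd_hms(self) -> (u64, u64, u64, u64, u64, u64) {
        let days = self.secs / SECS_PER_DAY;
        let secs_of_day = self.secs % SECS_PER_DAY;
        let year = days / DAYS_PER_YEAR;
        let mut day_of_year = days % DAYS_PER_YEAR;
        let mut month = 1;
        // Walk through the months until the remaining days fit.
        for len in DAYS_IN_MONTH.iter() {
            if day_of_year < *len {
                break;
            }
            day_of_year -= *len;
            month += 1;
        }
        (year, month, day_of_year + 1, secs_of_day / 3600, (secs_of_day % 3600) / 60, secs_of_day % 60)
    }
}

impl Add<Duration> for DatetimeNoLeap {
    type Output = DatetimeNoLeap;

    /// Sub-second parts of `rhs` are dropped in this sketch.
    fn add(self, rhs: Duration) -> Self::Output {
        DatetimeNoLeap { secs: self.secs + rhs.as_secs() }
    }
}
```

Adding one day to 2024-02-28 in this sketch gives 2024-03-01, whereas a proleptic Gregorian type would give 2024-02-29. Deriving `Ord` on the seconds field also gives comparison within the same calendar for free.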

antscloud commented 2 years ago

Definitely feasible, but not easy. I'm sorry, what is an MWP?


magnusuMET commented 2 years ago

Sorry, MWP should have been MVP (minimum viable product). It would be great to see how the user could interpret a time array in a file using CF conventions to get the correct times.
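
One possible shape for such an MVP, restricted to the proleptic Gregorian calendar and to integer offsets, is sketched below using only the standard library. It relies on the well-known days-from-civil arithmetic to convert between dates and day counts. The function name `decode_time`, the tuple return type, and the restriction to a bare `YYYY-MM-DD` reference date are assumptions for this example, not a proposed API.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12;
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day count.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Decodes integer time values against a units string such as
/// "days since 1900-01-01" into (year, month, day, hour, minute, second).
/// Returns `None` for units or reference dates this sketch does not handle.
pub fn decode_time(units: &str, values: &[i64]) -> Option<Vec<(i64, i64, i64, i64, i64, i64)>> {
    let (unit, reference) = units.split_once(" since ")?;
    let secs_per_unit: i64 = match unit.trim() {
        "days" | "day" => 86_400,
        "hours" | "hour" => 3_600,
        "minutes" | "minute" => 60,
        "seconds" | "second" => 1,
        _ => return None,
    };
    // Only a bare YYYY-MM-DD reference is accepted; a time of day or
    // timezone suffix makes the day field fail to parse.
    let mut parts = reference.trim().splitn(3, '-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    let origin = days_from_civil(year, month, day) * 86_400;
    Some(
        values
            .iter()
            .map(|v| {
                let t = origin + v * secs_per_unit;
                let (y, m, d) = civil_from_days(t.div_euclid(86_400));
                let s = t.rem_euclid(86_400);
                (y, m, d, s / 3_600, (s % 3_600) / 60, s % 60)
            })
            .collect(),
    )
}
```

A caller would read the `units` attribute and the integer time values from the file and pass both in; supporting other calendars would mean swapping the two date conversion helpers.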

antscloud commented 2 years ago

Thank you :+1:

I started trying the parser snippet and came up with a small working piece of code that converts an array of integers to datetimes. I wrote it with the Julia API in mind, i.e. with the high-level API functions decode_cftime and encode_cf_time. Some of the code is unused because I realized that neither chrono nor time can handle the other calendars.

use chrono::*;
use nom::{
    branch::alt,
    bytes::complete::{tag, take, take_till},
    character::complete::{digit1, one_of},
    combinator::{all_consuming, map, map_opt, opt},
    number::complete::double,
    sequence::{pair, separated_pair, terminated, tuple},
    IResult,
};

fn get_time_from_str(timestr: &str) -> Option<(DateTime<chrono::FixedOffset>, Duration)> {
    fn duration(input: &str) -> IResult<&str, Duration> {
        let till_space = take_till(|c| c == ' ');
        let dur = map_opt(till_space, |t: &str| match t {
            "days" | "day" | "d" => Some(Duration::days(1)),
            "hours" | "hour" | "h" => Some(Duration::hours(1)),
            "minutes" | "minute" | "min" => Some(Duration::minutes(1)),
            "seconds" | "second" | "sec" | "s" => Some(Duration::seconds(1)),
            _ => None,
        });
        let since = tag(" since ");

        terminated(dur, since)(input)
    }

    fn ymd_hms(input: &str) -> IResult<&str, chrono::NaiveDateTime> {
        fn u32_num(input: &str) -> IResult<&str, u32> {
            map_opt(digit1, |s: &str| s.parse::<u32>().ok())(input)
        }
        fn ymd(input: &str) -> IResult<&str, chrono::NaiveDate> {
            let i32_parser = map_opt(digit1, |s: &str| s.parse::<i32>().ok());

            let result = map_opt(
                tuple((i32_parser, tag("-"), u32_num, tag("-"), u32_num)),
                |(y, _, m, _, d)| chrono::NaiveDate::from_ymd_opt(y, m, d),
            )(input);

            result
        }
        fn hms(input: &str) -> IResult<&str, chrono::NaiveTime> {
            let result = map_opt(
                tuple((u32_num, tag(":"), u32_num, tag(":"), double)),
                |(hour, _, minute, _, second)| {
                    chrono::NaiveTime::from_hms_nano_opt(
                        hour,
                        minute,
                        second.trunc() as _,
                        (second.fract() * 1_000_000_000.0) as _,
                    )
                },
            )(input);
            result
        }

        map(tuple((ymd, tag(" "), hms)), |(ymd, _, hms)| {
            chrono::NaiveDateTime::new(ymd, hms)
        })(input)
    }
    fn timezone(input: &str) -> IResult<&str, chrono::FixedOffset> {
        fn twonum(input: &str) -> IResult<&str, i32> {
            map_opt(take(2usize), |s: &str| s.parse::<i32>().ok())(input)
        }

        let quad = pair(twonum, twonum);
        let colon_sep = separated_pair(
            map_opt(digit1, |x: &str| x.parse::<i32>().ok()),
            tag(":"),
            twonum,
        );

        let tz = map(
            tuple((tag(" "), opt(one_of("+-")), alt((quad, colon_sep)))),
            |(_, pm, (tz_h, tz_m))| {
                // FixedOffset::east expects seconds: hours * 3600 + minutes * 60
                let offset = tz_h * 3600 + tz_m * 60;
                if let Some('-') = pm {
                    -offset
                } else {
                    offset
                }
            },
        );
        map(opt(tz), |tz| chrono::FixedOffset::east(tz.unwrap_or(0)))(input)
    }
    fn parse_line(input: &str) -> IResult<&str, (Duration, chrono::DateTime<chrono::FixedOffset>)> {
        map_opt(tuple((duration, ymd_hms, timezone)), |(dur, time, tz)| {
            let tz: chrono::FixedOffset = chrono::TimeZone::from_offset(&tz);
            let time = tz.from_local_datetime(&time);
            match time.single() {
                Some(x) => Some((dur, x)),
                _ => None,
            }
        })(input)
    }

    let mut parser = all_consuming(parse_line);
    parser(timestr).ok().map(|(_, x)| (x.1, x.0))
}

fn dispatch_calendar(calendar: &str) -> Calendars {
    match calendar {
        "standard" | "gregorian" => Calendars::CalendarStandard,
        "proleptic_gregorian" => Calendars::CalendarProlepticGregorian,
        "360_day" => Calendars::Calendar360Day,
        "julian" => Calendars::CalendarJulian,
        // The CF conventions spell this calendar "noleap"
        "noleap" | "no_leap" => Calendars::CalendarNoLeap,
        "365_day" => Calendars::Calendar365Day,
        "all_leap" => Calendars::CalendarAllLeap,
        "366_day" => Calendars::Calendar366Day,
        _ => Calendars::CalendarStandard,
    }
}
enum Calendars {
    CalendarStandard,
    CalendarProlepticGregorian,
    Calendar360Day,
    CalendarJulian,
    CalendarNoLeap,
    Calendar365Day,
    CalendarAllLeap,
    Calendar366Day,
}

struct CFUnitsDateTime {
    from: DateTime<chrono::FixedOffset>,
    duration: chrono::Duration,
    calendar: Calendars,
}

trait CFDateTimeEncoder {
    fn encode(self: &Self, datetime: CFUnitsDateTime);
}

trait CFDateTimeDecoder {
    fn decode(self: &Self, value: i32) -> DateTime<FixedOffset>;
}

impl CFDateTimeDecoder for CFUnitsDateTime {
    fn decode(self: &Self, value: i32) -> DateTime<FixedOffset> {
        let ms: f64 = (self.duration.num_milliseconds() as f64) * (value as f64);
        let datetime = self.from + chrono::Duration::milliseconds(ms as i64);
        datetime
    }
}

fn decode_cftime(
    input_str: &str,
    time_values: Vec<i32>,
    calendar: Calendars,
) -> Vec<DateTime<FixedOffset>> {
    let (date, dur) = get_time_from_str(input_str).unwrap();
    let cfdatetime = CFUnitsDateTime {
        from: date,
        duration: dur,
        calendar,
    };

    time_values
        .into_iter()
        .map(|v| cfdatetime.decode(v))
        .collect()
}

fn main() {
    decode_cftime(
        "days since 1900-01-01 00:00:00",
        (1..10_000).collect(),
        Calendars::CalendarStandard,
    );
}

I guess we need to implement all the datetime structures for the different calendars (DatetimeNoLeap, Datetime360Day and so on) with the basic traits (Add, Sub, ...).

We may need three traits on top of the datetime structures:

What do you think ?

magnusuMET commented 2 years ago

I am starting to like this design. Maybe we can start creating a PR for this and iterate some designs? Could make it easier to do a code review. I refined the snippet I wrote above, adding tests and removing chrono in the process:

netcdf/src/cftime.rs
```rust
//! Time handling according to CF conventions
#![allow(missing_docs)]
use nom::{
    branch::alt,
    bytes::complete::{tag, take, take_till},
    character::complete::{digit1, one_of, space1, u32, u8, i32, i8},
    combinator::{all_consuming, map, map_opt, opt, peek, rest, value},
    number::complete::double,
    sequence::{pair, preceded, separated_pair, terminated, tuple},
    IResult,
};

/// Parsing error
#[derive(Debug)]
pub struct ParseError(String);

impl std::fmt::Display for ParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl std::error::Error for ParseError {}

/// Base duration between time points
#[allow(missing_docs)]
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Copy, Clone)]
pub enum Duration {
    Years,
    Months,
    Days,
    Hours,
    Minutes,
    Seconds,
    Milliseconds,
    Microseconds,
}

#[derive(Debug, Clone, Copy, Default)]
pub struct Date {
    pub year: u32,
    pub month: u32,
    pub day: u32,
}

#[derive(Debug, Default, Clone, Copy)]
pub struct Time {
    pub hour: i32,
    pub minute: u32,
    pub second: u32,
    pub millisecond: u32,
    pub microsecond: u32,
}

#[derive(Debug, Copy, Clone, Default)]
pub struct DateTime {
    pub date: Date,
    pub time: Time,
    pub tz: Tz,
}

fn duration(input: &str) -> IResult<&str, Duration> {
    #[rustfmt::skip]
    let years = value(
        Duration::Years,
        alt(( tag("common_years"), tag("common_year") )),
    );
    #[rustfmt::skip]
    let months = value(
        Duration::Months,
        alt(( tag("months"), tag("month") ))
    );
    #[rustfmt::skip]
    let days = value(
        Duration::Days,
        alt(( tag("days"), tag("day"), tag("d") ))
    );
    #[rustfmt::skip]
    let hours = value(
        Duration::Hours,
        alt(( tag("hours"), tag("hour"), tag("hrs"), tag("hr"), tag("h") )),
    );
    #[rustfmt::skip]
    let minutes = value(
        Duration::Minutes,
        alt(( tag("minutes"), tag("minute"), tag("mins"), tag("min") )),
    );
    let seconds = value(
        Duration::Seconds,
        alt((
            tag("seconds"),
            tag("second"),
            tag("secs"),
            tag("sec"),
            tag("s"),
        )),
    );
    let milliseconds = value(
        Duration::Milliseconds,
        alt((
            tag("milliseconds"),
            tag("millisecond"),
            tag("millisecs"),
            tag("millisec"),
            tag("msecs"),
            tag("msec"),
            tag("ms"),
        )),
    );
    let microseconds = value(
        Duration::Microseconds,
        alt((
            tag("microseconds"),
            tag("microsecond"),
            tag("microsecs"),
            tag("microsec"),
        )),
    );

    alt((
        years,
        months,
        days,
        hours,
        minutes,
        seconds,
        milliseconds,
        microseconds,
    ))(input)
}

fn date(input: &str) -> IResult<&str, Date> {
    let ymd = map(
        tuple((u32, tag("-"), u32, tag("-"), u32)),
        |(year, _, month, _, day)| Date { year, month, day },
    );

    alt((ymd,))(input)
}

fn time(input: &str) -> IResult<&str, Time> {
    let hms = map(
        tuple((i32, tag(":"), u32, tag(":"), double)),
        |(hour, _, minute, _, second)| {
            let (second, rest) = (second.trunc(), second.fract());
            let millisecond = rest * 1000.0;
            let (millisecond, rest) = (millisecond.trunc(), millisecond.fract());
            let microsecond = rest * 1000.0;
            Time {
                hour,
                minute,
                second: second as _,
                millisecond: millisecond as _,
                microsecond: microsecond as _,
            }
        },
    );
    let hm = map(
        separated_pair(i32, tag(":"), u32),
        |(hour, minute)| Time {
            hour,
            minute,
            ..Time::default()
        },
    );

    alt((hms, hm))(input)
}

fn timezone(input: &str) -> IResult<&str, Tz> {
    println!("{input}");
    let hm = map(
        preceded(opt(tag("+")), separated_pair(i8, tag(":"), u8)),
        |(hour, minute)| Tz {
            hour: hour,
            minute: minute,
        },
    );
    let z = value(Tz::default(), tag("Z"));
    let utc = value(Tz::default(), tag("UTC"));

    alt((hm, z, utc))(input)
}

fn datetime(input: &str) -> IResult<&str, DateTime> {
    fn space1_or_t(input: &str) -> IResult<&str, ()> {
        alt((value((), space1), value((), tag("T"))))(input)
    }

    let tz = map(
        separated_pair(separated_pair(date, space1_or_t, time), space1, timezone),
        |((date, time), tz)| DateTime { date, time, tz },
    );
    let no_tz = map(separated_pair(date, space1_or_t, time), |(date, time)| {
        DateTime {
            date,
            time,
            ..DateTime::default()
        }
    });
    let date_with_tz = map(separated_pair(date, space1, timezone), |(date, tz)| {
        DateTime {
            date,
            tz,
            ..DateTime::default()
        }
    });
    let date_time_no_space_tz = map(
        separated_pair(
            separated_pair(date, space1_or_t, time),
            peek(one_of("+-Z")),
            timezone,
        ),
        |((date, time), tz)| DateTime { date, time, tz },
    );
    let only_date = map(date, |date| DateTime {
        date,
        ..DateTime::default()
    });

    alt((
        tz,
        date_time_no_space_tz,
        no_tz,
        date_with_tz,
        only_date,
    ))(input)
}

#[derive(Copy, Clone, Debug, PartialEq, Eq, Default)]
pub struct Tz {
    hour: i8,
    minute: u8,
}

/// Parse a CF compatible string into two components
pub fn cf_parser(input: &str) -> Result<(Duration, DateTime), ParseError> {
    let since = tuple((space1, tag("since"), space1));

    all_consuming(separated_pair(duration, since, datetime))(input)
        .map(|(_, o)| o)
        .map_err(|e| ParseError(format!("{}", e)))
}

#[cfg(test)]
mod test {
    use super::*;

    fn parse(input: &str) {
        println!("{:?}", cf_parser(input).unwrap())
    }

    #[test]
    fn cf_conventions_document() {
        parse("days since 1990-1-1 0:0:0");
        parse("seconds since 1992-10-8 15:15:42.5 -6:00");
        parse("days since 1-7-15 0:0:0");
        parse("days since 1-1-1 0:0:0");
    }

    #[test]
    fn cftime_py_setup() {
        parse("hours since 0001-01-01 00:00:00");
        parse("hours since 0001-01-01 00:00:00");
        parse("hours since 0001-01-01 00:00:00 -06:00");
        parse("seconds since 0001-01-01 00:00:00");
        parse("days since 1600-02-28 00:00:00");
        parse("days since 1600-02-29 00:00:00");
        parse("days since 1600-02-30 00:00:00");
        parse("hours since 1000-01-01 00:00:00");
        parse("seconds since 1970-01-01T00:00:00Z");
        parse("days since 850-01-01 00:00:00");
        parse("hours since 0001-01-01 00:00:00");
        parse("days since 1600-02-28 00:00:00");
    }

    #[test]
    fn cftime_py_tz_naive() {
        let d_check = ["1582-10-15 00:00:00", "1582-10-15 12:00:00"];
        for d in d_check {
            parse(&format!("day since {}", d));
        }
    }

    #[test]
    fn cftime_py() {
        parse("days since 1000-01-01");
        parse("seconds since 1970-01-01T00:00:00Z");
        parse("hours since 2013-12-12T12:00:00");
        parse("hours since 1682-10-15 -07:00");
        parse("hours since 1682-10-15 -07:00:12");
        parse("hours since 1682-10-15T-07:00:12");
        parse("hours since 1682-10-15 -07:00 UTC");
        parse("hours since 2000-01-01 22:30+04:00");
        parse("hours since 2000-01-01 11:30-07:00");
        parse("hours since 2000-01-01 15:00-03:30");
    }

    #[test]
    fn etc() {
        parse("seconds since 1992-10-8 15:15:42.5Z");
        parse("seconds since 1992-10-8 15:15:42Z");
    }
}
```
antscloud commented 2 years ago

Wow, such a snippet in no time :exploding_head: I can make a PR with this code if you want. What about creating a library (subfolder) with cargo new --lib cftime in the root, in case the code becomes too big and needs a separate crate?

magnusuMET commented 2 years ago

I had it laying around, but never got to doing anything with it :D Feel free to create a new crate and make a PR. This makes sense as a separate crate as the utilities are orthogonal to reading and writing netCDF

magnusuMET commented 11 months ago

An implementation of cftime can be found in this crate: https://github.com/antscloud/cftime-rs