add a new "friendly" duration format

BurntSushi commented 3 months ago

This issue is partially motivated by #60, but the bigger picture here is that the ISO 8601 duration kinda sucks. We should obviously still support it, and it will continue to be the "default" used when serializing a Span/SignedDuration, but I think we can do better than that. The wide use of crates like humantime demonstrates this.

I hunted around for a specification or seeming standard that Jiff could use, but one just doesn't exist. However, there's definitely a "common sense" sort of format that has organically developed in at least the Go and Rust ecosystems (and probably others). For example, the following are all valid time.Duration values from Go's standard library time package:

300ms
-1.5h
2h45m

humantime almost supports these, but since it parses into a std::time::Duration (which is unsigned), negative durations aren't supported. It doesn't support fractional units either. Without the sign and the fraction, though, the same style of value parses fine:

fn main() -> anyhow::Result<()> {
    let durations = ["300ms", "15h", "2h45m"];
    for d in durations {
        let parsed = humantime::parse_duration(d)?;
        println!("{parsed:?}");
    }
    Ok(())
}

But humantime is a bit more flexible than what Go supports. For example, this program runs fine:

fn main() -> anyhow::Result<()> {
    let durations = ["300 millis", "15 hours", "2 hours 45 mins"];
    for d in durations {
        let parsed = humantime::parse_duration(d)?;
        println!("{parsed:?}");
    }
    Ok(())
}

What are the advantages of the above kinds of durations? I think there are two:

I think they are much easier to read.
They lend themselves to serializing each unit individually. Since we're building our own format, we don't have to require fractional seconds. For example, 1 second 1000ms should be perfectly valid. But you can't express that in ISO 8601 durations.

So here's my first stab at a grammar for this new duration format:

format =
    format-hms
    | format-designator

format-hms =
    sign? hours ':' minutes ':' seconds fractional?

format-designator =
    sign? format-designator-units
    | format-designator-units direction?
format-designator-units =
    years
    | months
    | weeks
    | days
    | hours
    | minutes
    | seconds
    | milliseconds
    | microseconds
    | nanoseconds

# This dance below is basically to ensure two things:
# First, that at least one unit appears. That is, that
# we don't accept the empty string. Secondly, when a
# fractional component appears in a time value, we don't
# allow any subsequent units to appear.
years =
    unit-value unit-years ws* months
    | unit-value unit-years ws* weeks
    | unit-value unit-years ws* days
    | unit-value unit-years ws* hours
    | unit-value unit-years ws* minutes
    | unit-value unit-years ws* seconds
    | unit-value unit-years ws* milliseconds
    | unit-value unit-years ws* microseconds
    | unit-value unit-years ws* nanoseconds
    | unit-value unit-years
months =
    unit-value unit-months ws* weeks
    | unit-value unit-months ws* days
    | unit-value unit-months ws* hours
    | unit-value unit-months ws* minutes
    | unit-value unit-months ws* seconds
    | unit-value unit-months ws* milliseconds
    | unit-value unit-months ws* microseconds
    | unit-value unit-months ws* nanoseconds
    | unit-value unit-months
weeks =
    unit-value unit-weeks ws* days
    | unit-value unit-weeks ws* hours
    | unit-value unit-weeks ws* minutes
    | unit-value unit-weeks ws* seconds
    | unit-value unit-weeks ws* milliseconds
    | unit-value unit-weeks ws* microseconds
    | unit-value unit-weeks ws* nanoseconds
    | unit-value unit-weeks
days =
    unit-value unit-days ws* hours
    | unit-value unit-days ws* minutes
    | unit-value unit-days ws* seconds
    | unit-value unit-days ws* milliseconds
    | unit-value unit-days ws* microseconds
    | unit-value unit-days ws* nanoseconds
    | unit-value unit-days
hours =
    unit-value unit-hours ws* minutes
    | unit-value unit-hours ws* seconds
    | unit-value unit-hours ws* milliseconds
    | unit-value unit-hours ws* microseconds
    | unit-value unit-hours ws* nanoseconds
    | unit-value fractional? ws* unit-hours
minutes =
    unit-value unit-minutes ws* seconds
    | unit-value unit-minutes ws* milliseconds
    | unit-value unit-minutes ws* microseconds
    | unit-value unit-minutes ws* nanoseconds
    | unit-value fractional? ws* unit-minutes
seconds =
    unit-value unit-seconds ws* milliseconds
    | unit-value unit-seconds ws* microseconds
    | unit-value unit-seconds ws* nanoseconds
    | unit-value fractional? ws* unit-seconds
milliseconds =
    unit-value unit-milliseconds ws* microseconds
    | unit-value unit-milliseconds ws* nanoseconds
    | unit-value fractional? ws* unit-milliseconds
microseconds =
    unit-value unit-microseconds ws* nanoseconds
    | unit-value fractional? ws* unit-microseconds
nanoseconds =
    unit-value fractional? ws* unit-nanoseconds

unit-value = [0-9]+ [ws*]
unit-years = 'years' | 'year' | 'y'
unit-months = 'months' | 'month' | 'M'
unit-weeks = 'weeks' | 'week' | 'w'
unit-days = 'days' | 'day' | 'd'
unit-hours = 'hours' | 'hour' | 'hrs' | 'hr' | 'h'
unit-minutes = 'minutes' | 'minute' | 'mins' | 'min' | 'm'
unit-seconds = 'seconds' | 'second' | 'secs' | 'sec' | 's'
unit-milliseconds =
    'milliseconds' | 'millisecond' | 'millis' | 'milli' | 'msec' | 'ms'
unit-microseconds =
    'microseconds'
    | 'microsecond'
    | 'micros'
    | 'micro'
    | 'usec'
    | 'us'
    | 'µ' (U+00B5 MICRO SIGN) 's'
unit-nanoseconds =
    'nanoseconds' | 'nanosecond' | 'nanos' | 'nano' | 'nsec' | 'ns'

fractional = decimal-separator decimal-fraction
decimal-separator = '.' | ','
decimal-fraction = [0-9]{1,9}

sign = '+' | '-'
direction = 'ago'
ws =
    U+0020 SPACE
    | U+0009 HORIZONTAL TAB
    | U+000A LINE FEED
    | U+000C FORM FEED
    | U+000D CARRIAGE RETURN

I also need help picking a name for this format. "human" is not a bad name, but the above is specifically English and doesn't take localization into account, so calling it something as broadly applicable as "human" seems a bit narrow minded. Right now, my pick is "friendly." The "friendly" duration format. I don't love it though.

fdncred commented 3 months ago

Loving this! Nushell does this type of thing and just calls them durations in the repl.

I've also written something kind of like what nushell does as an output for jiff of a date diff command that I'm writing for nushell.

❯ '2019-05-10T09:59:12-07:00[-07:00]' | dt diff (dt now)
P5y3m12dT26m1.5374272s
5yrs 3mths 1wks 5days 26mins 1secs 537ms 427µs 200ns

The first output line is just for debugging.

Personally, I'd like to see jiff have more synonyms for datetime. Not saying this is perfect but this is where I've landed so far. (maybe I should use some upper case letters too to differentiate things like months and minutes)

pub fn get_unit_from_unit_string(unit_name: String) -> Result<Unit, LabeledError> {
    let unit = match unit_name.as_ref() {
        "year" | "years" | "yyyy" | "yy" | "yr" | "yrs" => Ok(Unit::Year),
        "month" | "months" | "mth" | "mths" | "mm" | "m" | "mon" => Ok(Unit::Month),
        "day" | "days" | "dd" | "d" => Ok(Unit::Day),
        "week" | "weeks" | "ww" | "wk" | "wks" | "iso_week" | "isowk" | "isoww" => Ok(Unit::Week),
        "hour" | "hours" | "hh" | "hr" | "hrs" => Ok(Unit::Hour),
        "minute" | "minutes" | "mi" | "n" | "min" | "mins" => Ok(Unit::Minute),
        "second" | "seconds" | "ss" | "s" | "sec" | "secs" => Ok(Unit::Second),
        "millisecond" | "ms" | "millis" => Ok(Unit::Millisecond),
        "microsecond" | "mcs" | "us" | "micros" => Ok(Unit::Microsecond),
        "nanosecond" | "ns" | "nano" | "nanos" => Ok(Unit::Nanosecond),
        _ => {
            return Err(LabeledError::new(
                "please supply a valid unit name to extract from a date/datetime. see dt part --list for list of abbreviations.",
            ))
        }
    };

    unit
}

Also note that I'm forcing weeks here, but some may find it odd, as we've discussed prior. For my code here, I'm dictating a standard abbreviation, but when parsing (above) I try to be more forgiving and allow anything that could reasonably be considered as an abbreviation.

fn create_nushelly_duration_string(span: jiff::Span) -> String {
    let mut span_vec = vec![];
    if span.get_years() > 0 {
        span_vec.push(format!("{}yrs", span.get_years()));
    }
    if span.get_months() > 0 {
        span_vec.push(format!("{}mths", span.get_months()));
    }
    // if we have more than 6 days, show weeks
    let days_span = span.get_days();
    if days_span > 6 {
        let weeks = span.get_weeks();
        if weeks == 0 {
            let (weeks, days) = (days_span / 7, days_span % 7);
            span_vec.push(format!("{}wks", weeks));
            if days > 0 {
                span_vec.push(format!("{}days", days));
            }
        } else if span.get_days() > 0 {
            span_vec.push(format!("{}days", span.get_days()));
        }
    } else if span.get_days() > 0 {
        span_vec.push(format!("{}days", span.get_days()));
    }
    if span.get_hours() > 0 {
        span_vec.push(format!("{}hrs", span.get_hours()));
    }
    if span.get_minutes() > 0 {
        span_vec.push(format!("{}mins", span.get_minutes()));
    }
    if span.get_seconds() > 0 {
        span_vec.push(format!("{}secs", span.get_seconds()));
    }
    if span.get_milliseconds() > 0 {
        span_vec.push(format!("{}ms", span.get_milliseconds()));
    }
    if span.get_microseconds() > 0 {
        span_vec.push(format!("{}µs", span.get_microseconds()));
    }
    if span.get_nanoseconds() > 0 {
        span_vec.push(format!("{}ns", span.get_nanoseconds()));
    }

    span_vec.join(" ").trim().to_string()
}

We "borrowed" liberally from chrono-humanize-rs as inspiration and kind of rolled our own but we also support a date humanize command.

❯ '2019-05-10T09:59:12-07:00' | date humanize
5 years ago
❯ (date now) - 2019-05-10T09:59:12-07:00
275wk 5day 22hr 34min 20sec 879ms 101µs

So, any duration automatically is expressed in a nushell-humanized nomenclature. I'm not satisfied with it, or our other datetime handling, which is why I'm here in this repo. 😄

I think friendly or human, humantime, humanize all express viable naming options.

BurntSushi commented 3 months ago

@fdncred Thanks for the feedback! I'm not sure about some of those unit designators. Something like 5 yyyy reads very weird to me hah. But there are some in there that probably make sense to add, like wks.

Out of curiosity, how does nushell deal with locale? That is honestly my biggest hesitation with something like this. ISO 8601's duration format and Temporal's ISO 8601 datetime format are interchange formats. They are "human readable," but their primary purpose is in the exchange of data in an agreed upon format between different systems. Otherwise, Temporal punts internationalization to another TC39 group. This is also why, AIUI, Temporal doesn't support something like strptime. And AIUI, strftime is insufficient for correct internationalization. But I added strptime and strftime because they are just so incredibly useful. That's also why I want to add this new "friendly" format as well, because in practice, folks just want an easy way of accepting and printing durations that humans can easily interact with. Assuming you're fine assuming English and a Western/Gregorian understanding of date-keeping.

On the flip side, I don't necessarily want to tie one hand behind my back and resist offering useful APIs for English speaking folks using the Gregorian calendar just because they aren't universally applicable. And on the other flip side, I don't want to go down the enormous rabbit hole of internationalization either. It's just not a problem I want to spend the next ~months/years of my life working on. And still yet, I don't want to do a "half baked" solution where you can just configure the strings used for "year" or whatever.

One thing I tend to fall back on here is that ISO 8601 itself encodes the notion of a Gregorian calendar and Western time keeping. So by doubling down on English unit names, we aren't really doing anything more than what ISO 8601 already assumes. But this "friendly" format is stretching it a little bit by allowing spelled out words like "year" instead of just the Y designator.

fdncred commented 3 months ago

Something like 5 yyyy reads very weird to me hah

lolz! I mainly like just covering the sql abbreviations, singular and plurals, and then some really abbreviated variations

Out of curiosity, how does nushell deal with locale?

As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.

internationalization

Big ugh! Lots of work there. For nushell, rightly or wrongly, we've standardized on English. We have contributors across the world, and no one really complains about it. We do have some strange issues sometimes from Chinese folks, from time to time, but that's usually due to bytes/chars/grapheme stuff.

I think your ISO 8601 argument is valid and I personally see "friendly" easily falling into that without issue or complaint.

I have two pieces of unsolicited friendly advice to you on your "half baked" comments. (not that your comments are half baked but the comments where you mention half-baked, lol) They have kind of become my mantra with nushell.

Don't let good be the enemy of great. I'm sure you've seen that before but what I specifically mean by it here is that you've already taken the first step in releasing your crate even though it's not 100% perfect. People can use it and rely on it and continue to make it better. It's better to run with something and check for community response than have a stressful inward dialog with yourself about what you need to do to make it "great". It's software. We can always change it.
Perfection is not available at any cost. On top of that, what you consider perfect is never what everyone considers perfect.

BurntSushi commented 3 months ago

As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.

Oh, sorry, I meant locale as in internationalization. In Temporal's case, there is nothing in its API that lets you print or parse strings like July or Saturday. That's where Jiff diverges from Temporal, although only a very small corner of Jiff does this. The vast majority of Jiff is "independent" of internationalization problems and this was very much intentional. But the strtime APIs are an example of where we start heading into areas that internationalization might have a role to play. And this "friendly" format doubles down and extends it more.

And oh yeah, I've been uttering "don't let perfection be the enemy of the good" for a long time now. I'm confident I won't fall into that trap. But I also want to be sensitive to bias and implicit power structures. If I'm building a datetime library and that datetime library gets popular, but it specifically encourages English in places and eschews all other languages and calendar systems, then how big of a problem is that? Anyway, I don't mean to say that this must prevent me from moving forward with formats like this, but that it's something that's on my mind that I weigh as a trade-off.

I'd love advice from an internationalization expert to be honest. In particular about the trade-offs involved. Like, Jiff won't be the first library to offer a "human friendly" duration format like this. Others have done it before in different ways. How big of an issue has it been if it doesn't support internationalization? And if we did want to support internationalization, is it sufficient to "just" provide a way to override the unit designators with different strings? (I think it isn't, but I don't really know for sure.) Or is that something that helps a lot and doesn't hurt? Or is it a half-baked thing that actually ends up hurting more than it helps?

fdncred commented 3 months ago

Oh, sorry, I meant locale as in internationalization.

Oops. My fault. I have a fixation on datetime atm. With nushell, I think the only thing we do with localization is determine what type of separator to use for thousands and decimals. Everything else is just English, iirc. If crates we use support strftime things with localization we don't interrupt it.

How hard is it? I'm no expert, but just figuring out the locale and changing the minimal things we did for decimals and thousands separators was a pain. Edge cases just drive me crazy sometimes. You think you have it right, but some funky thing breaks your software. Ugh!

BurntSushi commented 3 months ago

How hard is it?

That's part of the problem. My understanding is that full and correct support (in accordance with relevant Unicode specs) is an enormous undertaking. And so it's best left to crates like icu to do it. But there are in theory things I could do that aren't as complicated, like allow the caller to substitute in their own unit designator strings. But I don't know if that's helpful or actively harmful.

keltia commented 2 months ago

I was just going to ask about humantime support but I see you are already on it :) Thanks.

PS: love the round(Unit) BTW.

ChristopherRabotin commented 2 months ago

Andrew, you may wish to take inspiration from hifitime here because our parser of durations is no-std (using the lexical crate), and supports rounding based on the unit: https://github.com/nyx-space/hifitime/blob/master/src/duration/parse.rs#L15 .

Note that I'm currently working to support Durations down to the zeptosecond (1e-21) for hifitime, so this functionality is bound to be expanded upon.

BurntSushi commented 2 months ago

@ChristopherRabotin That appears to be just about a subset of the grammar I wrote above. I see only a few main differences making it not a proper subset:

In the absence of unit designators or things like :, your format interprets an integer, I think, as a number of seconds. I intentionally decided to not support that in the grammar above due to the ambiguity it can cause. The [-+]H:M:S(.[0-9]+)? is supported in the grammar above though.
In my grammar, fractional units are only allowed for units of hours or lower.
In my grammar, fractional units, when present, terminate the duration. So, for example, 1.5 hours 30.5 minutes isn't allowed by the grammar above.

Otherwise, my grammar above also supports units of years, months and weeks.

The closest analog to hifitime's Duration type is probably Jiff's SignedDuration, and in that context, Jiff specifically forbids the concept of "days" because days are non-uniform units in the context of time zones. To express "days," Jiff requires you to use a Span, which is more like a bag of units instead of one single absolute duration.

tisonkun commented 2 months ago

Ref - https://github.com/piperRyan/rust-postgres-interval/

I wonder if other parse / format options are in the scope of this library. That we can parse from string literal in other format like PostgreSQL's, and print in that format.

BurntSushi commented 2 months ago

Well from that README, that format at least is supported by the grammar I posted above.

If one needs a parser specific to something like PostgreSQL, I feel like that's niche enough that you should write your own or publish a crate dedicated to it.

tisonkun commented 2 months ago

humantime has format_duration also.

The SignedDuration default print format is ISO 8601, which is a bit hard to read for most users.

I can work around it by converting SignedDuration to a std Duration and then calling humantime's format_duration. But since this issue is about supporting a new duration format inspired by humantime, I wonder whether we will add both parsing and formatting, or only parsing. And if both, is there an existing method to invoke that logic now?

UPDATE - I noticed that the debug format of SignedDuration seems "human readable", while the Display format isn't (looks like reverse?).

    println!("{}", jiff::SignedDuration::from_str("PT1m").unwrap()); // PT1m
    println!("{:?}", jiff::SignedDuration::from_str("PT1m").unwrap()); // 60s

Well, I can see that Debug is meant for read-only output, while the Display impl is meant to uphold the invariant parse(format(v)) == v, and we don't support parsing things like 60s right now.

BurntSushi commented 2 months ago

@tisonkun You marked your comment as resolved, but to elaborate...

Yes, Display and FromStr are indeed intended to be duals of one another for both SignedDuration and Span. My intent is that those will continue to use ISO 8601 durations (technically, Temporal ISO 8601 extension) for the purposes of interoperability. While I agree it is not easy to read, it is not just a de facto standard but the standard interchange format for durations. That elevates it to a point where it really ought to be the default I think.

The "friendly" format outlined here is just something I made up. Well, I mean, I didn't invent the idea from whole cloth. But I did my best to formalize a grammar that should parse most English human readable durations, including those emitted by humantime. And yes, it is intended for jiff::fmt::friendly to have both a printer and a parser. In fact, I'm just about done implementing the printer.

My loose plan at this point is:

Add a new jiff::fmt::friendly module that largely follows the structure of other jiff::fmt modules. That is, a bit more verbose to use, but it's the lower level API that provides the most configuration knobs and flexibility.
Make the Debug impls for SignedDuration and Span use jiff::fmt::friendly. Indeed, making Span's Debug impl not use ISO 8601 is an explicit goal in order to resolve #60. The status quo is that multiple distinct Span values in memory can have the same Debug output, and this is really bad for programmer ergonomics when debugging.
The FromStr and Display impls for SignedDuration and Span remain as they are today, for the reasons stated above.
Likely add new "friendly" methods to SignedDuration and Span. For example, Span::parse_friendly and Span::to_friendly, or something like that.
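
To make the Debug problem from the second point concrete, here is a minimal sketch using a hypothetical MiniSpan type (it is not Jiff's Span and only has two units). It assumes non-negative values. Because ISO 8601 has no millisecond designator, the hypothetical to_iso function has to fold milliseconds into fractional seconds, so two distinct values collapse to the same string. A per-unit Debug impl keeps them apart.

```rust
use std::fmt;

/// Hypothetical stand-in for a span: each unit is stored separately.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct MiniSpan {
    pub seconds: u64,
    pub milliseconds: u64,
}

/// ISO 8601 style output: milliseconds are folded into fractional seconds
/// because the format has no designator for them.
pub fn to_iso(span: MiniSpan) -> String {
    let total_ms = span.seconds * 1000 + span.milliseconds;
    let (secs, ms) = (total_ms / 1000, total_ms % 1000);
    if ms == 0 {
        format!("PT{secs}S")
    } else {
        format!("PT{secs}.{ms:03}S")
    }
}

/// "Friendly" style output via Debug: every non-zero unit is printed as stored.
impl fmt::Debug for MiniSpan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match (self.seconds, self.milliseconds) {
            (0, 0) => write!(f, "0s"),
            (s, 0) => write!(f, "{s}s"),
            (0, ms) => write!(f, "{ms}ms"),
            (s, ms) => write!(f, "{s}s {ms}ms"),
        }
    }
}
```

Here a MiniSpan of 2 seconds and a MiniSpan of 1 second plus 1000 milliseconds are unequal in memory, yet both render as PT2S through to_iso. Their Debug output differs: 2s versus 1s 1000ms.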

Another thing I'm thinking about is:

Make FromStr on SignedDuration and Span accept either an ISO 8601 duration or a "friendly" duration. I think this is possible because ISO 8601 durations have to start with a P. A "friendly" duration has to start with either a sign or a digit. So there should be no ambiguity.
Make "friendly" an "alternative" Display impl, e.g., {:#}.

If I did those two things, then we wouldn't need any new "friendly" APIs on Span and SignedDuration. But I'm not fully sold on that approach yet. It does feel a bit subtle.
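
A minimal sketch of how those two ideas could fit together, under stated assumptions. DurationFormat, detect_format and Minutes are hypothetical names, not Jiff APIs. The detection step skips one optional leading sign before looking for P, on the assumption that the Temporal flavor of ISO 8601 durations permits a sign there; the first character after the sign is still enough to tell the formats apart. The second half uses the standard library's Formatter::alternate to switch output when {:#} is used.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum DurationFormat {
    Iso8601,
    Friendly,
}

/// Decide which parser to dispatch to by looking at the first character
/// after an optional sign. Returns None when neither format can match.
pub fn detect_format(input: &str) -> Option<DurationFormat> {
    let body = input
        .strip_prefix(|c: char| c == '+' || c == '-')
        .unwrap_or(input);
    match body.chars().next()? {
        'P' => Some(DurationFormat::Iso8601),
        c if c.is_ascii_digit() => Some(DurationFormat::Friendly),
        _ => None,
    }
}

/// Hypothetical duration type with a single unit, to keep the example short.
pub struct Minutes(pub u32);

impl fmt::Display for Minutes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}` selects the "friendly" rendering.
            write!(f, "{}m", self.0)
        } else {
            // Plain `{}` keeps the ISO 8601 rendering.
            write!(f, "PT{}M", self.0)
        }
    }
}
```

With this sketch, format!("{}", Minutes(90)) gives PT90M and format!("{:#}", Minutes(90)) gives 90m. The subtlety mentioned above is visible here too: nothing at the call site says "friendly" other than the # flag.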

scotow commented 1 month ago

Looking forward to this new feature and to being able to remove conversion functions and humantime from our dependencies. Will this format be added to the serde module too?

BurntSushi commented 1 month ago

Yeah servicing the use cases that humantime does (but with more correctness) is definitely the intent.

I haven't quite figured out how I'm going to do serde integration yet. But yeah, I do expect we should do something there. The Deserialize case will be easy because there isn't a ton of configuration there. But the Serialize case is harder because there's tons of different ways to express a duration given the above grammar. I suppose we should at least provide a way to use the default format (whatever that ends up being, I guess probably something that humantime can parse).

BurntSushi / jiff

add a new "friendly" duration format #111